I used to work for a public cloud company called SoftLayer. Before S3’s API became the de facto standard for object storage services, several object storage APIs looked like they could claim that crown. SoftLayer had recently come out with an Object Storage service based on the OpenStack Swift project. It came with its own API, which is great. However, at the time, it was hard to get buy-in from customers to use under-supported APIs. They’d have to use unfamiliar tooling or even develop tooling themselves just to transfer files around… And if they already had a product that didn’t support OpenStack Swift, they might just be stuck. So I was charged with coming up with a solution for these customers.

After testing out a few different ideas, I ended up deciding on the SFTP protocol. Object Storage doesn’t translate 1:1 to a “normal” filesystem, but it was close enough. This might seem like a strange decision nowadays, when everything has S3 integration, but it was a way to integrate with older products and give customers a much more familiar way of interacting with their data. I decided to call the project SwFTP.

graph LR
    cyberduck((Cyberduck)) -- SFTP --> swftp((SwFTP))
    filezilla((FileZilla)) -- SFTP --> swftp((SwFTP))
    client((SFTP/FTP Client)) -- SFTP or FTP --> swftp((SwFTP))
    swftp((SwFTP)) -- HTTPS --> swift((Swift))
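
To make the “not quite 1:1” point concrete: Swift stores objects in flat containers, while an SFTP client expects a directory tree. Below is a minimal sketch of one way a path could be mapped, assuming the first path segment names the container and the rest becomes the object name. The `split_path` helper is hypothetical and is not taken from SwFTP’s code.

```python
def split_path(path):
    """Map an SFTP-style path to a (container, object_name) pair.

    Assumption for illustration: the first path segment is the Swift
    container and everything after it is the object name, slashes included.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        # "/" has no container: a server could list all containers here.
        return None, None
    container = parts[0]
    # No remaining segments means the path points at the container itself.
    object_name = "/".join(parts[1:]) or None
    return container, object_name


print(split_path("/photos/2013/cat.jpg"))
```

Under this mapping, “directories” below the container level are only a naming convention on object names, which is one reason operations like renaming a directory do not translate directly.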
  

I was all set. I just needed to… implement an SFTP server. Oof. That’s actually very challenging. You see, the SFTP protocol had been around for over a decade at that point, so it had gone through several IETF draft revisions and picked up various extensions… And it’s built on top of SSH, which is also fairly complex to implement. So I needed something more proven: a library that I could plug OpenStack Swift integration into without implementing the wire protocol myself. Again, at the time this kind of ability was fairly rare.

I was very experienced with Python and had a bit of experience with Go (nowadays I would totally write this project in Go, using this SFTP library and implementing the fs.Walker interface; that would have made my life so much simpler). But no, I was essentially stuck with Python’s Twisted. Twisted has a library called “Conch” which implements the SSH protocol and lets you hook into the SFTP subsystem, which is exactly what I needed. Twisted seemed to be the best option at the time, but it was (and still is) very hard to work with. The failure modes can be very complex.

Plus, SFTP clients behave in vastly different ways. For instance, some clients, in order to maximize throughput, will send several write requests at once when uploading a file, without waiting for each acknowledgment. An upload to an object storage API can’t work that way, so I needed to force the concurrent write requests to queue up until it was their turn to be sent down the wire to Swift. This alone is complex, but I was also using an unfamiliar framework with completely different synchronization primitives than I had used in the past… So I found this work very challenging, but in the end it was very rewarding.
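
The queueing idea can be sketched in a few lines. This is a simplified illustration, not SwFTP’s actual Twisted code: it assumes each SFTP write request carries a byte offset (as the protocol’s write packet does) and that the upload to Swift is a single sequential stream. `OrderedWriteBuffer` and `sink` are hypothetical names.

```python
class OrderedWriteBuffer:
    """Hold out-of-order writes until they are contiguous with what was sent."""

    def __init__(self, sink):
        self._sink = sink        # callable taking bytes, e.g. an HTTP body writer
        self._next_offset = 0    # next byte offset the sink is waiting for
        self._pending = {}       # offset -> data that arrived too early

    def write(self, offset, data):
        if offset < self._next_offset:
            # A sequential upload cannot go back and overwrite sent bytes.
            raise ValueError("offset %d was already sent" % offset)
        self._pending[offset] = data
        # Flush every chunk that is now contiguous with the sent data.
        while self._next_offset in self._pending:
            chunk = self._pending.pop(self._next_offset)
            self._sink(chunk)
            self._next_offset += len(chunk)

    def close(self):
        if self._pending:
            # Something never arrived, so the upload has a hole in it.
            raise IOError("missing data at offset %d" % self._next_offset)


sent = []
buf = OrderedWriteBuffer(sent.append)
buf.write(4, b"efgh")   # arrives early, so it is held back
buf.write(0, b"abcd")   # fills the gap, so both chunks are flushed in order
buf.close()
```

A real server also has to bound how much data it is willing to hold in memory and decide when to acknowledge each write, which is where most of the complexity lives.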

SwFTP was never the only project I was working on, so my attention was split multiple ways. Despite that, after a year of working on this project, it was finally good enough to deploy to production as a supported service. Testing took a long time because, as I said, SFTP clients behave very differently from one another. I am happy with where the functional test framework ended up, since I was able to easily write code that reproduced errors we saw during testing, including some super complicated race conditions. From this experience, I learned some very important lessons about functional and manual testing.

All in all, SwFTP ended up being a success and a lot of data was transferred using this service. Thanks to my manager and support from others in the company, I was able to do all of this iteration and development as an open source project. There were no SoftLayer-specific implementation details included, so others could (and did) deploy the project for their own OpenStack Swift clusters.

By the way, if you’re having issues pronouncing “SwFTP” in your head then you aren’t alone. I used to call it something like “Swefteepee”.

References: