Swift for New Contributors
As a developer, jumping into a mature codebase can be somewhat daunting. How is the code structured? What is the request flow? What’s the process for getting my changes contributed upstream? These are great questions, and I’ll answer them all (and more!) in this post.
What do I work on?
Swift is a mature codebase with an established developer community. Finding the right things to work on can be challenging. If you are looking to contribute, scratch your own itch. What are you interested in working on, and what have you seen that needs improving? You may be interested in improving our docs. You may be interested in optimizing how Swift’s data placement algorithm works. You may be interested in adding higher-level features to the storage engine. Solving the problems that you have, your real-world struggles, is never an inappropriate thing to work on.
If you don’t have an immediate pain point but you would still like to contribute, take a look at the bugs and ideas that others have talked about. All OpenStack projects use Launchpad for bug reports and feature tracking. Swift’s bug tracker is at https://bugs.launchpad.net/swift. Swift’s blueprints (short descriptions of a small piece of work to do) are on Launchpad at https://blueprints.launchpad.net/swift.
Where do I get the code?
Now that you’ve figured out what you want to work on, you need to get a copy of the code. Swift’s code is hosted on GitHub. Clone the code, and start hacking! If you need an intro or refresher on the git version control system, you can find one here.
How is the code organized?
Swift’s code has a well-organized structure. Once you get a copy of the code, you will see that there are several files and directories in the top-level directory. The files include the AUTHORS file (where Swift’s contributors are listed), the CHANGELOG (the curated record of what is in each release), test runners, and various other files.
Inside the bin directory, there are the executable scripts for each of the Swift processes and helper tools. Most of these files are extremely short and simply call out to a Swift code library. A Swift deployment will run the *-replicator processes on the cluster’s machines. The other scripts are helper tools that can help you monitor and diagnose problems in your Swift cluster.
The doc directory contains both Swift’s man pages and the auto-generated developer documentation. This developer documentation can be found at http://swift.openstack.org/ and is updated at every commit.
The etc directory contains all of Swift’s sample config files. These files have all of Swift’s config options with the defaults listed. If an option is commented out in the sample config file, it is optional and the default is the commented-out value. If an option isn’t commented out, it is required and you should use the given recommended value or one that is more appropriate to your deployment. Swift uses pastedeploy config files. You can find documentation on this format here.
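The commented-out-means-default convention can be sketched in a few lines of Python. This is only an illustration: the config fragment, its option names, and its values are assumptions rather than a copy of a real Swift sample file, and the standard library's configparser stands in for pastedeploy's own loader.

```python
import configparser

# Illustrative fragment only. The option names and values are assumptions,
# not a copy of any real Swift sample config.
SAMPLE_CONF = """\
[DEFAULT]
# bind_port = 8080
user = swift

[app:proxy-server]
use = egg:swift#proxy
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)
defaults = parser["DEFAULT"]

# bind_port is commented out, so the parser never sees it and the code
# falls back to the default that the sample file documents.
bind_port = defaults.getint("bind_port", fallback=8080)

# user is not commented out, so it is read straight from the file.
user = defaults["user"]

# Options in [DEFAULT] are also visible from every other section.
app_section = parser["app:proxy-server"]
print(bind_port, user, app_section["use"], app_section["user"])
```

Uncommenting the bind_port line and changing its value would override the fallback without touching any code.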
Skipping ahead just a bit, the test directory contains all of Swift’s unit, functional, and probe tests. The unit tests test small isolated parts of the Swift codebase. The functional tests ensure that Swift as a whole system is still working. The probe tests are somewhat in between: they ensure that the various internal parts of Swift are properly coordinated, much like functional tests for the internal components of Swift.
The swift directory is where the Swift storage engine lives. The proxy directory contains the proxy server process code and the various controllers for how the Swift API handles high-level features and coordination with the storage node processes.
Also inside the swift directory, the account, container, and obj directories each have the code for their respective server processes and their respective auditors, replicators, and other consistency processes. Together, the code in these directories implements the processes that run on a storage node in a Swift cluster.
The swift directory also has a common directory that includes code shared between different parts of the rest of the codebase. This directory also includes the middleware directory that has all of the included-by-default middleware that ships with Swift. These are things like auth integration, monitoring, rate limiting, and caching.
Now, as a review, there is a utility method used throughout Swift called split_path. Where is this method implemented? If you answered, “in swift/common/utils.py”, give yourself a gold star. If you are troubleshooting an issue with how Swift detects filesystem corruption, where would you look? If you answered, “in swift/obj/auditor.py”, you’re on a roll! Finally, if you wanted to add some more information to the logs for each request, where would you start editing? If you answered, “in swift/common/middleware/proxy_logging.py”, move to the front of the class and pat yourself on the back. You know how the Swift codebase is organized.
What’s the data flow?
Simply knowing how high-level concepts are organized in the codebase is good, but knowing the actual request flow is even better. What is the entry point for a request, and where does it go?
Swift is written entirely in Python and uses Python’s standard WSGI model for implementing its REST API. This means that if you already know how WSGI works, you are well on your way to understanding how requests move through Swift.
On process startup, each server calls its respective __init__ method. This sets up internal state from the config file and gets the server ready to handle requests.
When a request comes to the server, it flows through the WSGI “pipeline” through any middleware to the server’s __call__ method. Once a response is generated, the response flows back through the pipeline and to the client.
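The startup and request steps above follow the generic WSGI pattern, which can be sketched with the standard library alone. ToyServer, ToyMiddleware, call_app, the greeting option, and the X-Toy-Seen header are all hypothetical names made up for this illustration. None of this is Swift’s actual code.

```python
from wsgiref.util import setup_testing_defaults


class ToyServer:
    """Hypothetical WSGI app standing in for a server process."""

    def __init__(self, conf):
        # Startup: set up internal state from the (already parsed) config.
        self.greeting = conf.get("greeting", "hello")

    def __call__(self, environ, start_response):
        # Per request: build the response and hand it back up the pipeline.
        body = self.greeting.encode("utf-8")
        headers = [("Content-Type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        start_response("200 OK", headers)
        return [body]


class ToyMiddleware:
    """Hypothetical middleware that tags every response on its way out."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def tagging_start_response(status, headers, exc_info=None):
            headers = list(headers) + [("X-Toy-Seen", "yes")]
            return start_response(status, headers, exc_info)

        # The request flows inward; the response flows back out through us.
        return self.app(environ, tagging_start_response)


def call_app(app, path="/"):
    """Drive a WSGI app once without starting a real HTTP server."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


pipeline = ToyMiddleware(ToyServer({"greeting": "hello"}))
status, headers, body = call_app(pipeline)
print(status, headers["X-Toy-Seen"], body)
```

Each extra piece of middleware is one more wrapper around the app, so the order of wrapping determines the order in which a request is seen.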
As an example, an object GET first will go to the proxy server’s __call__ method. This will call handle_request, which then chooses the appropriate controller. Then the request is given to the object controller’s GET method. That method will call GETorHEAD, which in turn creates a new HTTP request to send to the object server. In the object server process, a very similar path is taken. Once the object server returns a response to the proxy server, the proxy ensures that it was successful (e.g. not a 404 or server error) and creates a response to pass back up the pipeline and out to the client.
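To make that call chain concrete, here is a toy model of it. Only the names __call__, handle_request, GET, and GETorHEAD come from the description above. The classes, the in-memory object store, and the (status, body) return shape are assumptions made for illustration, and a plain method call stands in for the HTTP request that the proxy would send to the object server.

```python
class ToyObjectServer:
    """Hypothetical stand-in for an object server; stores objects in a dict."""

    def __init__(self, objects):
        self.objects = objects

    def handle(self, method, path):
        if path in self.objects:
            return 200, self.objects[path]
        return 404, b""


class ToyObjectController:
    """Hypothetical controller holding the per-verb logic for objects."""

    def __init__(self, backend):
        self.backend = backend

    def GETorHEAD(self, method, path):
        # Shared path for both verbs: ask the backend, then check the result.
        status, body = self.backend.handle(method, path)
        if status != 200:
            return status, b""
        # A HEAD response carries no body.
        return status, body if method == "GET" else b""

    def GET(self, path):
        return self.GETorHEAD("GET", path)

    def HEAD(self, path):
        return self.GETorHEAD("HEAD", path)


class ToyProxy:
    """Hypothetical proxy: entry point, then controller dispatch."""

    allowed_methods = {"GET", "HEAD"}

    def __init__(self, backend):
        self.backend = backend

    def handle_request(self, method, path):
        if method not in self.allowed_methods:
            return 405, b""
        controller = ToyObjectController(self.backend)
        return getattr(controller, method)(path)

    def __call__(self, method, path):
        return self.handle_request(method, path)


proxy = ToyProxy(ToyObjectServer({"/v1/acct/cont/obj": b"data"}))
print(proxy("GET", "/v1/acct/cont/obj"))
print(proxy("HEAD", "/v1/acct/cont/obj"))
print(proxy("GET", "/v1/acct/cont/missing"))
```

Routing GET and HEAD through one shared method keeps the backend request and the success check in a single place, which is the design the description above suggests.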
How do I get my patch included upstream?
Now that you understand how Swift’s codebase works and you’ve written a patch, it would be great to share that patch with others. Swift, as part of OpenStack, requires that you sign a CLA. This doesn’t require copyright assignment, but is required before you can submit patches. You can find an explanation of the whole process on the OpenStack wiki.
Once you’ve submitted your patch, it will go to the rest of the Swift developer community to review. Once two core developers have approved it, it will be merged into the master branch and included in the next Swift release.
The core developers will look for a few things when reviewing your code. First, and most importantly, does it work? Does the code do what it says it does without any errors? Second, the reviewers will see if anything in your implementation would prevent both large and small clusters from using it. Your code must be able to scale, and since Swift is deployed in production all over the world, your code must have a migration path. For example, you cannot change config defaults or on-disk formats without a viable migration path. Next, the reviewers will check to make sure your code provides tests and doesn’t break any existing tests. Finally, the reviewers will check your code to see that it matches Swift’s stylistic guidelines. Your code must pass pep8 v1.3.3.
I strongly recommend that you set up your own Swift-All-In-One (SAIO) so that you can easily develop and test your changes. This is what all of the reviewers will use to evaluate your code. You can find docs on how to set up your own SAIO on http://swift.openstack.org/.
Where do I go for help?
You should now have a good understanding of how Swift’s code is organized, the way data flows through Swift, and how to contribute your patches upstream. But you will still have questions. SwiftStack provides a ton of great docs and videos about how Swift works. You can find docs at http://docs.openstack.org and http://swift.openstack.org. If you’d like to chat with other Swift contributors about how Swift works or new features you’d like to add, drop by our IRC channel. We are on freenode in #openstack-swift. I’m “notmyname” and online most days.
Swift is a great piece of code that reliably solves real-world data storage problems. This is due to the hard work and careful dedication of the entire Swift developer community. I’m looking forward to seeing your patch.