How to Upgrade an OpenStack Swift Cluster With No Downtime

OpenStack Swift deployers can upgrade from one version of Swift to the next with zero downtime for end users. This has been supported since the initial release of OpenStack Swift back in 2010.

An HA Swift Cluster

Swift has a modular design that allows you to match your cluster exactly to your use case. Client requests go through Swift’s proxy servers to Swift’s storage nodes. The proxy abstraction means that the storage nodes are naturally HA in Swift—the proxy server detects and automatically works around storage node failure.

This leaves the proxy nodes. For this post, I’m assuming the Swift proxy nodes are behind a load balancer that is checking Swift’s /healthcheck endpoint.
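
As a purely illustrative example, an HTTP health check against /healthcheck in HAProxy might look like the following. HAProxy is just one choice here, and the backend name, addresses, and port 8080 are placeholders for your own proxy tier; any load balancer with HTTP health checks works the same way.

    # Hypothetical HAProxy backend for the Swift proxy tier
    backend swift_proxies
        balance roundrobin
        option httpchk GET /healthcheck
        server proxy01 192.168.1.11:8080 check inter 2s fall 3 rise 2
        server proxy02 192.168.1.12:8080 check inter 2s fall 3 rise 2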


Time to Upgrade!

So it’s time to upgrade your production Swift cluster. You’ve got users actively connected to it, and you can’t have any downtime. What’s the process? There are three easy steps to upgrade any Swift cluster.

Step 0: Take a Look at the Swift Release Notes

Every Swift release includes curated release notes in the CHANGELOG file. This file lists the major changes, with explicit callouts for any change to default config options and for any new functionality that will affect existing clusters.

Before starting your upgrade, be sure to look at the CHANGELOG to see if there are any changes that may affect your upgrade process. Although it’s rare, we do sometimes need to add or change things that affect existing clusters. When new things are added, sane defaults are chosen; when existing defaults change, we provide a migration path.
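
As a quick sanity check before diving into the CHANGELOG, it helps to confirm which version the cluster is currently running. One way to do that (assuming your cluster exposes the /info endpoint, which newer Swift releases do by default, and with lb.example.com standing in for your load-balanced proxy endpoint) is:

    # Print the Swift version the cluster reports via its /info endpoint
    curl -s http://lb.example.com/info | \
        python -c 'import sys, json; print(json.load(sys.stdin)["swift"]["version"])'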

Step 1: Upgrade a single Storage Node

First, upgrade a single storage node as a canary: upgrade one server, monitor it for problems, and then move on if everything is ok.

To upgrade a Swift storage node, perform the following steps:

  • Stop all background Swift jobs with swift-init rest stop
  • Shut down all Swift storage processes with swift-init {account|container|object} shutdown. This will do a graceful stop, allowing current requests to complete.
  • Upgrade system packages and install the new Swift code
  • Update the Swift configs with any needed changes
  • If necessary (e.g., for kernel upgrades), reboot the server
  • Start the storage services with swift-init {account|container|object} start
  • Start the background processes with swift-init rest start
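
Tied together, a minimal shell sketch of that sequence might look like the following. It assumes a node where system packages (and Swift itself) come from apt and configs live under /etc/swift; swap in your own package and config management.

    swift-init rest stop              # stop replicators, auditors, updaters, etc.
    swift-init account shutdown       # graceful stop: in-flight requests finish
    swift-init container shutdown
    swift-init object shutdown
    apt-get update && apt-get -y dist-upgrade   # new system packages and Swift code
    # ...apply any needed changes under /etc/swift/ here...
    # ...reboot now if a kernel update requires it...
    swift-init account start
    swift-init container start
    swift-init object start
    swift-init rest start             # restart the background jobs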

After you’ve performed these steps, monitor the Swift logs for any errors or other anomalous behavior. If everything looks ok, let’s move on!
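
Swift services log through syslog, so where the messages land depends on your syslog configuration. On a node that sends them to /var/log/syslog, something along these lines surfaces problems quickly:

    tail -f /var/log/syslog | grep -iE 'error|traceback'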

Step 2: Upgrade all of the other Storage Nodes

Once you’ve upgraded one storage node successfully, you’re ready to upgrade all of the other ones. Going zone-by-zone, upgrade the storage nodes by performing the same tasks as above. Doing one zone at a time will allow you to take advantage of Swift’s ability to work around an entire zone of data disappearing during the upgrade. Since Swift places data across all of your zones, this means that you’ll still have both high availability and high durability for your data during this process.

If you have a smaller Swift cluster with just one zone, then you can still upgrade seamlessly. Go server-by-server instead of zone-by-zone.
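
If you need a reminder of which devices live in which zone, the ring builder files have that information. Running swift-ring-builder with just a builder file prints every device along with its region and zone, which is handy for planning a zone-by-zone rollout (this assumes the builder files are kept in /etc/swift on the node where you manage your rings):

    swift-ring-builder /etc/swift/object.builder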

Step 3: Upgrade your Proxy Servers

The Swift proxy servers support a /healthcheck endpoint. By monitoring this endpoint, a load balancer can know when a proxy is available and automatically add and remove it from the load balancer pool.

One nice feature of the /healthcheck endpoint is that a server admin can drop a file onto the local drive that will cause the /healthcheck endpoint to return with a 503 response code. You can find documentation of how to configure this feature in the sample proxy config file provided with Swift.
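
For reference, the relevant section of the sample proxy config looks roughly like this; the comments and exact wording vary by release, and the disable_path value shown here is just an example you would pick yourself:

    [filter:healthcheck]
    use = egg:swift#healthcheck
    # If the file named here exists, GET /healthcheck returns 503 instead of 200
    disable_path = /etc/swift/healthcheck_disabled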

Like the storage nodes, first upgrade one proxy server, monitor it, and then upgrade the rest. Here are the steps to upgrade a proxy server.

  • Create the disable_path file so that the /healthcheck endpoint returns an error to the load balancer. This causes the load balancer to remove the proxy server from its pool and stops new client requests from reaching it.
  • Once the load balancer has taken the proxy out of rotation, shut down the proxy server with swift-init proxy shutdown. This will gracefully stop the process so that existing connections can finish.
  • Upgrade any system packages and Swift code
  • Update the proxy configs with any needed changes
  • If necessary, reboot the server
  • Start the proxy with swift-init proxy start
  • Remove the disable_path file so that the load balancer can add the proxy back into the pool.
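
As with the storage nodes, here is a minimal shell sketch of that sequence. It assumes disable_path is set to /etc/swift/healthcheck_disabled in your proxy config (match whatever value you actually use) and that packages come from apt.

    touch /etc/swift/healthcheck_disabled       # /healthcheck now returns 503; the LB drains this proxy
    swift-init proxy shutdown                   # graceful stop once existing connections finish
    apt-get update && apt-get -y dist-upgrade   # new system packages and Swift code
    # ...apply any needed changes to /etc/swift/proxy-server.conf here...
    # ...reboot now if required...
    swift-init proxy start
    rm /etc/swift/healthcheck_disabled          # the LB adds the proxy back into the pool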

Enjoy!

And that’s it! With those three steps, you can upgrade your existing, production Swift cluster with zero downtime for your end users.

Taking it a step further

The steps described above are available in the open-source Swift codebase. SwiftStack has automated this entire process down to a single “Upgrade” button-click for your entire cluster. Check out the SwiftStack documentation for rolling OpenStack Swift upgrades or watch Joe’s video below: