Extending OpenStack Swift
As application storage needs grow, you quickly run into the limitations of traditional storage. Start with a hard drive; run out of space. Buy a bigger hard drive; run out of space again. Get a set of hard drives and use RAID; run out of space. Use a bigger RAID set, and now you have painfully long RAID rebuild times and you still run into scale limits. You’ve built a storage system that is a set of siloed storage domains, and you’ve got to figure out where to place your data, how to expand the storage capacity, and how to handle hardware failures. These are exactly the problems that OpenStack Swift was designed to solve.
With the growing popularity of the OpenStack project, and of Swift in particular, the question sometimes comes up whether a different object storage system can be used to implement Swift. This question, however, rests on a fundamental misunderstanding of what Swift does. Swift is not an abstraction layer on top of an object storage system. OpenStack Swift provides clients a unified namespace that abstracts away the underlying storage volumes.
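To make the unified namespace concrete: Swift’s ring hashes an object’s full path and uses the top bits of that hash to pick a partition, so a client only ever names /account/container/object and never a particular disk. The Python sketch below roughly mirrors that step under stated assumptions. The partition power of 10 is arbitrary, the cluster-specific hash prefix and suffix that Swift mixes into the hash are omitted, and object_partition is a hypothetical helper rather than Swift’s own API.

```python
import hashlib
import struct

# Assumption: 2**PART_POWER partitions. Real clusters choose this value when
# the ring is first built; 10 is only an illustrative choice.
PART_POWER = 10
PART_SHIFT = 32 - PART_POWER


def object_partition(account: str, container: str, obj: str) -> int:
    """Map a namespace path to a partition (simplified, hypothetical helper).

    Real Swift also salts the hash with a secret prefix/suffix from its
    configuration; that salt is left out here.
    """
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path).digest()
    # Read the first 4 bytes as a big-endian integer and keep its top
    # PART_POWER bits: that number is the partition.
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT


if __name__ == "__main__":
    for name in ("photo.jpg", "report.pdf", "backup.tar"):
        print(name, object_partition("AUTH_demo", "files", name))
```

Because the partition depends only on the name, any proxy can locate an object without a central lookup table of individual objects.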
With this simple concept, Swift gives operators seamless capacity management and provides storage for applications that just works, even when hardware fails. More importantly, it gives operators a simple storage service that can be tuned and extended to their exact needs: functionally, financially, and over the lifetime of their applications.
In this blog post, I’ll cover the two main ways Swift can be extended: the DiskFile abstraction and middleware.
Extending Swift with Storage Volumes
Storage volumes are the fundamental thing that Swift uses for data placement and failure handling. In most deployments, storage volumes are mapped one-to-one to hard drives. This makes a lot of sense when managing hardware failure domains, improving performance, and reducing costs. Still, there is a ton of functionality that can be implemented by the storage volume itself.
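The sketch below illustrates how volumes serve both placement and failure handling: each partition has one primary volume per replica, and when a primary fails, a “handoff” volume stands in for it. Everything here is a hypothetical toy. The placement table is hand-written (Swift’s ring builder computes and balances a comparable table by device weight), the volume names are invented, and the only failure-domain rule is a simple preference for handoffs on a node that holds no other replica; Swift’s real handoff ordering is more thorough.

```python
from typing import Dict, List, Set

# Hypothetical placement table: partition -> primary volumes, one per replica.
# Each volume is named "<device>@<node>" for illustration.
PRIMARIES: Dict[int, List[str]] = {
    0: ["sda@node1", "sdb@node2", "sdc@node3"],
    1: ["sdb@node1", "sdc@node2", "sda@node3"],
}

# Every volume in the toy cluster, in the order handoffs are considered.
ALL_VOLUMES: List[str] = [
    "sda@node1", "sdb@node1", "sda@node2",
    "sdb@node2", "sdc@node2", "sda@node3", "sdc@node3",
]


def node_of(volume: str) -> str:
    return volume.split("@", 1)[1]


def write_targets(partition: int, failed: Set[str]) -> List[str]:
    """Return one healthy volume per replica, substituting handoffs for failures."""
    primaries = PRIMARIES[partition]
    # Handoff candidates: volumes that are neither primaries nor failed.
    handoffs = [v for v in ALL_VOLUMES if v not in primaries and v not in failed]
    # Nodes that already hold a healthy replica of this partition.
    used_nodes = {node_of(v) for v in primaries if v not in failed}
    targets = []
    for volume in primaries:
        if volume not in failed:
            targets.append(volume)
            continue
        if not handoffs:
            raise RuntimeError("no healthy handoff volume available")
        # Prefer a handoff on a node without another replica; else take the first.
        choice = next(
            (h for h in handoffs if node_of(h) not in used_nodes), handoffs[0]
        )
        handoffs.remove(choice)
        used_nodes.add(node_of(choice))
        targets.append(choice)
    return targets


if __name__ == "__main__":
    print(write_targets(0, set()))
    print(write_targets(0, {"sdb@node2"}))
```

With sdb@node2 failed, partition 0 is written to sda@node2 instead, keeping the three replicas on three different nodes.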
In the Swift code, objects are represented by a class called DiskFile. Although there are more parts to a volume abstraction than just this class, we’ve taken to calling a particular volume abstraction a DiskFile. This volume abstraction is a fundamental part of storage policies in Swift. Swift’s out-of-the-box DiskFile simply assumes locally mounted volumes, each with a standard filesystem. This implementation works very well for most use cases, especially when deployers are looking for simple storage at the lowest cost per gigabyte.
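The idea behind a pluggable volume abstraction can be sketched as an interface with interchangeable implementations. The classes below are hypothetical and far simpler than Swift’s real DiskFile and DiskFileManager API, which also handles timestamps, streaming readers and writers, and quarantining. LocalFSBackend stands in for the default “local filesystem” behavior; unlike Swift, it keeps metadata in a sidecar JSON file instead of extended attributes, and it skips the fsync and atomic rename a real implementation needs.

```python
import abc
import hashlib
import json
import os
import tempfile
from typing import Dict, Tuple


class VolumeBackend(abc.ABC):
    """Hypothetical, much-simplified stand-in for a DiskFile implementation."""

    @abc.abstractmethod
    def put(self, name: str, data: bytes, metadata: Dict[str, str]) -> None:
        ...

    @abc.abstractmethod
    def get(self, name: str) -> Tuple[bytes, Dict[str, str]]:
        ...


class LocalFSBackend(VolumeBackend):
    """Stores each object as a plain file on a locally mounted volume."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _paths(self, name: str) -> Tuple[str, str]:
        # Hash the object name so arbitrary names become safe filenames.
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        # Group files by the last three hex characters of the hash, similar
        # in spirit to the suffix directories Swift uses on disk.
        base = os.path.join(self.root, digest[-3:], digest)
        return base + ".data", base + ".meta.json"

    def put(self, name: str, data: bytes, metadata: Dict[str, str]) -> None:
        data_path, meta_path = self._paths(name)
        os.makedirs(os.path.dirname(data_path), exist_ok=True)
        with open(data_path, "wb") as f:
            f.write(data)
        # Sidecar metadata file: a portability shortcut, not Swift's xattr scheme.
        with open(meta_path, "w", encoding="utf-8") as f:
            json.dump(metadata, f)

    def get(self, name: str) -> Tuple[bytes, Dict[str, str]]:
        data_path, meta_path = self._paths(name)
        with open(data_path, "rb") as f:
            data = f.read()
        with open(meta_path, encoding="utf-8") as f:
            metadata = json.load(f)
        return data, metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        volume = LocalFSBackend(root)
        volume.put("AUTH_demo/photos/cat.jpg", b"meow", {"Content-Type": "image/jpeg"})
        print(volume.get("AUTH_demo/photos/cat.jpg"))
```

The rest of the system only talks to put and get, so a different backend can change how and where bytes land without the layers above noticing.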
More functionality in storage volumes
Although Swift’s default DiskFile implementation works very well, there are some great examples and ideas for using different DiskFile implementations to provide different functionality to both deployers and end-users.
The main advances in the DiskFile abstraction in the last year are the result of Red Hat’s work to use OpenStack Swift as the object storage interface to GlusterFS storage volumes. Instead of reimplementing the Swift API, Red Hat is fully participating in the OpenStack Swift community to ensure that Gluster can take full advantage of the latest Swift code and features. This is absolutely the right way to pair Swift with another storage system: use the existing functionality in Swift and contribute back to the community where additional functionality is missing.
Other functionality can be implemented in a Swift storage cluster by using the DiskFile abstraction. Data-at-rest encryption can be provided simply by using encrypted storage volumes, but a new EncryptingDiskFile could also include more advanced functionality, such as key management. Similarly, compression or de-duplication could be implemented by a DiskFile. Even more advanced functionality could be added, as ZeroVM is doing, by placing compute capability at the storage location.
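A feature like compression can be layered on as a wrapper around another volume implementation. The sketch below is hypothetical: the in-memory inner backend, the class names, and the X-Volume-Encoding metadata key are all invented for illustration. It uses zlib from the standard library; an encrypting variant would have the same wrapper shape, with a cipher and a key-manager lookup in place of the zlib calls.

```python
import zlib
from typing import Dict, Tuple


class InMemoryBackend:
    """Hypothetical stand-in for an underlying volume such as a local disk."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, name: str, data: bytes, metadata: Dict[str, str]) -> None:
        self._objects[name] = (data, dict(metadata))

    def get(self, name: str) -> Tuple[bytes, Dict[str, str]]:
        data, metadata = self._objects[name]
        # Return a copy so callers cannot mutate stored metadata.
        return data, dict(metadata)


class CompressingBackend:
    """Wraps another backend and compresses object bodies at rest."""

    # Hypothetical metadata key recording how the stored bytes were encoded.
    ENCODING_KEY = "X-Volume-Encoding"

    def __init__(self, inner, level: int = 6) -> None:
        self.inner = inner
        self.level = level

    def put(self, name: str, data: bytes, metadata: Dict[str, str]) -> None:
        stored = dict(metadata)
        stored[self.ENCODING_KEY] = "zlib"
        self.inner.put(name, zlib.compress(data, self.level), stored)

    def get(self, name: str) -> Tuple[bytes, Dict[str, str]]:
        data, metadata = self.inner.get(name)
        # Strip the internal marker so callers see only their own metadata.
        if metadata.pop(self.ENCODING_KEY, None) == "zlib":
            data = zlib.decompress(data)
        return data, metadata


if __name__ == "__main__":
    inner = InMemoryBackend()
    volume = CompressingBackend(inner)
    body = b"swift " * 1000
    volume.put("AUTH_demo/logs/app.log", body, {"Content-Type": "text/plain"})
    print(len(body), "bytes in,", len(inner.get("AUTH_demo/logs/app.log")[0]), "bytes stored")
```

Callers of the wrapper see exactly the bytes and metadata they wrote; only the inner volume ever holds the compressed form.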
Better knowledge sharing between Swift and storage volumes
Additional functionality is not the only benefit possible with DiskFiles. When applications and the storage volume can share information, tremendous efficiency can be built into the system as a whole. For example, if the storage volume knows the logical structure of the data being written, then the volume itself can help in data integrity and efficient storage. This is demonstrated in Seagate’s Kinetic drive platform. By implementing a key/value API, the drive itself knows how data will be accessed and can make efficient decisions on how the data is written to disk. Also, since the drive now knows the logical keys associated with a piece of data, the drive itself can report to Swift actual objects that are in danger of being lost due to hardware issues.
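The toy model below shows why a key/value drive can report at-risk objects: because it records which logical key owns each sector, a failing sector translates directly into a list of object names that Swift could re-replicate. This is a hypothetical simulation, not the Kinetic API; the class name, sector size, and method names are invented for illustration.

```python
from typing import Dict, List, Set


class KeyValueDriveSim:
    """Toy drive that stores values by key and tracks sector ownership."""

    def __init__(self, sector_size: int = 4) -> None:
        self.sector_size = sector_size
        self._values: Dict[str, bytes] = {}
        self._key_sectors: Dict[str, List[int]] = {}
        self._sector_owner: Dict[int, str] = {}
        self._next_sector = 0
        self._bad_sectors: Set[int] = set()

    def put(self, key: str, value: bytes) -> None:
        # Release sectors held by any previous version of this key.
        for sector in self._key_sectors.pop(key, []):
            del self._sector_owner[sector]
        count = max(1, -(-len(value) // self.sector_size))  # ceiling division
        sectors = list(range(self._next_sector, self._next_sector + count))
        self._next_sector += count
        for sector in sectors:
            self._sector_owner[sector] = key
        self._key_sectors[key] = sectors
        self._values[key] = value

    def get(self, key: str) -> bytes:
        return self._values[key]

    def mark_sector_failing(self, sector: int) -> None:
        self._bad_sectors.add(sector)

    def keys_at_risk(self) -> List[str]:
        """Logical keys with data on failing sectors, in sorted order."""
        return sorted(
            {self._sector_owner[s] for s in self._bad_sectors if s in self._sector_owner}
        )


if __name__ == "__main__":
    drive = KeyValueDriveSim(sector_size=4)
    drive.put("AUTH_demo/c/obj-a", b"12345678")  # occupies sectors 0 and 1
    drive.put("AUTH_demo/c/obj-b", b"xyz")       # occupies sector 2
    drive.mark_sector_failing(1)
    print(drive.keys_at_risk())
```

A block device, by contrast, could only report that sector 1 is failing; mapping that back to an object name would be left to the layers above it.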
No matter your preferred hardware, the DiskFile abstraction within Swift can be used to provide more efficiency in the system. Today, the default DiskFile simply assumes a storage volume is a POSIX-compliant filesystem that supports extended attributes. While we recommend XFS as a deployment choice, there is little in Swift that requires that particular filesystem. It is possible to write volume interfaces targeted at a particular filesystem that use its low-level semantics to improve Swift’s mean time to error detection and mean time to error recovery. One can also imagine DiskFile implementations that take advantage of the particular read and write characteristics of SSDs.
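Because the default DiskFile depends on extended attributes, a deployer might want to confirm that a volume supports them before putting it into service. The helper below is a hypothetical pre-flight check, not part of Swift, and the probe attribute name is invented. It relies on Python’s os.setxattr and os.getxattr, which exist only on Linux, so it reports False elsewhere. A real check would be pointed at a mount such as one under /srv/node; the usage line uses the system temp directory so it runs anywhere.

```python
import os
import tempfile


def volume_supports_xattrs(mount_point: str) -> bool:
    """Return True if files in mount_point accept user extended attributes."""
    if not hasattr(os, "setxattr"):  # os.setxattr/os.getxattr are Linux-only
        return False
    # Create a throwaway file on the volume being checked; it is deleted on close.
    with tempfile.NamedTemporaryFile(dir=mount_point) as probe:
        try:
            os.setxattr(probe.name, "user.swift.probe", b"ok")
            return os.getxattr(probe.name, "user.swift.probe") == b"ok"
        except OSError:
            # Raised, for example, when the filesystem does not support xattrs.
            return False


if __name__ == "__main__":
    print(volume_supports_xattrs(tempfile.gettempdir()))
```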
Extending Swift with Middleware
But DiskFile abstractions aren’t the only way to extend Swift’s functionality. Swift supports middleware that can intercept and modify requests and responses to the server. Many of Swift’s core features are implemented using middleware, including large objects, auth integration, and static website hosting.
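Swift middleware follows the WSGI and Paste Deploy conventions: a module exposes a filter_factory that wraps the next application in the pipeline, and the operator enables it with a [filter:...] section listed in the proxy server’s pipeline configuration. The sketch below uses plain WSGI rather than Swift’s own request and response helpers so that it runs on its own. The class name and the X-Served-By header are hypothetical; the example simply adds a header to every response on its way back to the client.

```python
from typing import Callable, Iterable


class ServedByMiddleware:
    """Minimal WSGI middleware that adds a response header (hypothetical)."""

    def __init__(self, app: Callable, conf: dict) -> None:
        self.app = app
        self.label = conf.get("label", "swift-proxy")

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        def custom_start_response(status, headers, exc_info=None):
            # Modify the response headers on the way back to the client.
            headers = list(headers) + [("X-Served-By", self.label)]
            return start_response(status, headers, exc_info)

        # Hand the request to the next application in the pipeline.
        return self.app(environ, custom_start_response)


def filter_factory(global_conf: dict, **local_conf) -> Callable:
    """Paste Deploy entry point: merge config, then return a wrapping factory."""
    conf = global_conf.copy()
    conf.update(local_conf)

    def factory(app):
        return ServedByMiddleware(app, conf)

    return factory
```

A request-side change works the same way: the middleware edits environ before calling self.app.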
Middleware is one of the best ways to add functionality to Swift. The OpenStack community has written middleware to add an S3 API translation layer, integrate with the OpenStack Ceilometer project, add third-party search, and provide CDN integration. Others have created middleware to automatically generate thumbnails when an image is requested, generate profiling metrics, and facilitate transparent data migration from other storage systems.
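As an example of the metrics use case, the hypothetical middleware below counts requests by method and status code and records a rough latency per request. It keeps the numbers in memory for simplicity; a real deployment would ship them to a metrics system (Swift itself can emit StatsD metrics). The latency it records covers only the time until the wrapped application returns its body iterable, not the time spent streaming the body to the client.

```python
import time
from collections import Counter
from typing import Callable, List


class RequestMetricsMiddleware:
    """Counts requests per (method, status) and records per-request latency."""

    def __init__(self, app: Callable) -> None:
        self.app = app
        self.counts: Counter = Counter()
        self.latencies: List[float] = []

    def __call__(self, environ: dict, start_response: Callable):
        started = time.monotonic()
        method = environ.get("REQUEST_METHOD", "UNKNOWN")

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "200 OK"; keep only the numeric code.
            self.counts[(method, status.split(" ", 1)[0])] += 1
            return start_response(status, headers, exc_info)

        body = self.app(environ, recording_start_response)
        self.latencies.append(time.monotonic() - started)
        return body
```

Wrapping the proxy application this way leaves every request and response unchanged while the counters accumulate alongside them.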
If new extensions to the API need to be made, Swift’s middleware capability is the best place to start.
OpenStack Swift, the Extensible Object Storage System
OpenStack Swift powers some of the world’s largest storage clouds, and its flexible, modular design allows new functionality to be added easily. Volume abstractions and middleware both give deployers and storage vendors the ability to integrate with Swift and work in the community to build a storage system used by everyone, every day.