As part of the Swift technical track last week at the OpenStack design summit, we had several topics on the Swift API. Swift has a remarkably stable API. We’ve added to the API, but we haven’t removed anything or changed any existing behavior beyond some minor conformance-to-spec fixes. This means that clients written years ago still work even when talking to Swift clusters deployed just yesterday.
Although it is stable, the Swift API is not without some minor warts. There are a few inconsistencies in the API and a few awkward parts that would break clients if they changed. Cleaning up these parts of the API will help client developers write cleaner Swift applications and will allow end users to more easily use cross-cloud Swift clusters.
Figuring out what will be changed in the Swift API will be a long process, but there are a few important baseline things that need to happen before any changes to the API can be made. First, we need a formal definition of what the Swift API is. We have never had a formal API spec. Swift has always relied on the careful attention of its contributors to ensure that existing clients don’t break. A spec won’t lessen the need for careful attention by contributors and reviewers, but it will allow client developers to know exactly what they can expect from a particular deployment. A formal API spec also allows deployers to know what must be supported to ensure support for data migration between Swift clusters. As a side benefit, formally defining the API will expose gaps in our current docs and help us keep our docs more up-to-date.
The second thing we must do as a community is define an API for discovering what Swift API a particular cluster supports. Users need to be able to determine a cluster’s capabilities in order to know how to talk to it. There has been a lot of work on API discoverability in other OpenStack projects, so I hope that we can use some of their techniques and lessons learned in Swift.
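As a strawman, discoverability could be as simple as an unauthenticated endpoint that returns the cluster’s supported features. The endpoint path and response keys in this sketch are illustrative assumptions, not a settled design:

import requests

# Hypothetical capabilities request; the '/info' path and the response
# keys are assumptions for illustration, not a settled part of the API.
resp = requests.get('http://swift.example.com/info')
capabilities = resp.json()

# A client could then adapt its behavior to what the cluster supports.
if 'bulk_delete' in capabilities:
    print('max deletes per bulk request:',
          capabilities['bulk_delete'].get('max_deletes_per_request'))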
Once we have these two things, an API spec and API discoverability, we can start the discussions around what needs to change in the Swift API and go about implementing the changes in the code.
I expect that all of these questions will create quite a bit of discussion in the community. As a group, we need to get feedback from deployers (of all sizes), developers, and end users. Together, we’ll be able to make improvements and find the path that is best for everyone.
A Swift cluster is a set of cooperating processes running on many servers, which implies that there is an internal API too. This internal API is how the nodes communicate with one another and how the storage nodes talk to the underlying storage volumes.
While this internal API isn’t nearly as formal or rigid as Swift’s external API, there are opportunities to improve it too. Parts of Swift’s code can be refactored to allow cleaner abstractions so that specific optimizations or alternatives can be implemented.
A while back, the concept of a Local File System (LFS) was proposed to Swift. Ultimately, the proposed patch was not merged, but the idea is a good one. The concept allows for filesystem-specific optimizations to be made. For example, an XFS module could optimize the way it walks over inodes or a ZFS module could take advantage of its ZFS-specific self-healing properties.
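As a rough illustration, an LFS-style abstraction might define a narrow interface that the object server codes against, with filesystem-specific modules behind it. This is a minimal sketch under assumed names, not the proposed patch:

import abc

# A minimal sketch (assumed names, not the proposed patch) of a pluggable
# on-disk backend: the object server codes against this interface, and an
# XFS or ZFS module supplies the optimized implementation.
class DiskBackend(abc.ABC):

    @abc.abstractmethod
    def put(self, name, body, metadata):
        """Durably store an object's data and metadata."""

    @abc.abstractmethod
    def get(self, name):
        """Return (metadata, data iterator) for a stored object."""

    @abc.abstractmethod
    def walk(self):
        """Iterate over stored objects for auditing/replication; an XFS
        module could optimize this inode walk, while a ZFS module could
        lean on ZFS's own checksumming and self-healing."""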
Other interested parties have recently started working on the LFS concept, specifically with the goal of better integrating Swift and GlusterFS. I’m hopeful that the patches will be successfully merged this time around, and I’m looking forward to the additional functionality the LFS feature will allow.
To move forward on improving both the internal and external APIs for Swift, we need community involvement for a few things:
- Formally defining the current Swift API
- Implementing API version discoverability into Swift
- Completion of the LFS patch for talking to storage volumes
- Refactoring the proxy code to abstract communication with storage servers
I’m looking for people in the Swift community to help complete these tasks. If you’re interested, drop by #openstack-swift on freenode and let’s talk!
On Monday, Swift developers got together at the OpenStack Havana Design Summit to talk about where Swift is going over the next six months. One of the things we discussed was improving Swift’s throughput under high concurrency. Now, Swift already has pretty good throughput even under concurrent load, but as networks get faster, the throughput demands of clients keep increasing.
In the design session, several alternatives were discussed, but the most-liked option was a thread pool for blocking disk I/O in the Swift object server. (Linux AIO was the runner-up, but it’s a pretty quirky interface and hard to use correctly.)
All that discussion has begun to bear fruit. This morning, a patch to add thread pools was submitted to Gerrit for Swift core developers to consider.
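The pattern itself is straightforward: the server’s request-handling loop must never block on a disk, so blocking reads and writes are handed off to a pool of worker threads. Here is a minimal sketch of the idea using the standard library; Swift’s eventlet-based servers would use an eventlet-compatible pool rather than this exact code, and the paths and sizes below are placeholders:

from concurrent.futures import ThreadPoolExecutor

# One pool of I/O threads; the request-handling thread submits blocking
# disk work here and stays free to service other requests meanwhile.
io_pool = ThreadPoolExecutor(max_workers=16)

def read_object(path, offset, length):
    # Runs in a worker thread, so blocking on the disk is fine here.
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.read(length)

# From the request handler: schedule the blocking read, then wait on the
# future instead of stalling the whole server on the disk itself.
future = io_pool.submit(read_object, '/srv/node/d1/objects/o', 0, 65536)
data = future.result()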
At the OpenStack Summit last week, we had conversations with many users about how to best benchmark Swift. In the developer summit sessions, benchmarking and performance were recurring topics with lots of great input from both developers and users. We also heard from HP, Intel, and Seagate about how they conduct benchmarking of Swift and what they learned in the process. This post provides an overview of why benchmarking is important for any Swift cluster, how to approach it, and some of the key takeaways from the summit in this area. It also provides an overview of SwiftStack Bench (ssbench), the Swift benchmarking tool we recently open sourced.
Benchmarking OpenStack Swift
Depending on your goal, you may want a Realistic Benchmark or a Targeted Benchmark. Both approaches require benchmarking tools that scale to avoid any bottlenecks in the benchmarking code during load generation. Because of Swift’s fantastic horizontal scalability, avoiding bottlenecks in benchmarking code can be very challenging. Benchmarking Swift means generating tens of thousands of concurrent requests and utilizing many benchmarking servers to allow hundreds of gigabits per second of available client throughput. Both approaches to benchmarking also benefit from fine-grained collection of total request latency, time-to-first-byte latency, and Swift transaction IDs for every request. But they do have different goals, and that should inform load generation and results analysis.
Realistic Benchmarking asks, “What happens when the cluster sees a particular client load?” or “How many clients, ops-per-second, or throughput can my cluster really support?” You are more interested in simulating a production workload than in isolating a particular action. This kind of benchmarking can benefit from simulating parametric mixed client workloads (proportions of object sizes, operation types, etc.) or replaying a workload based on some kind of capture or “trace” from another cluster.
With Targeted Benchmarking, you want to generate a very specific, controlled load on the cluster to identify problems and test potential improvements. Data collected during a synthetic workload will be less noisy than a more realistic, mixed workload. This is useful for testing the effectiveness of tweaks to networking, node hardware, tuning/configuration, and Swift code.
SwiftStack Bench (ssbench)
At SwiftStack, our first customer benchmarking requirements were realistic, so we wrote a scalable benchmarking tool we named SwiftStack Bench (ssbench). At its heart, ssbench either manages the run of a mixed-workload benchmark “scenario” or it generates a report from the results. The data collected for every request is quite rich and includes the start time, total duration, time-to-first-byte if it was a GET, and the Swift transaction ID for the request. Because there are many different ways to slice and dice the results, reporting has a lot of room for improvement. But the rich, raw results are saved so that previously-run benchmarks may benefit from future reporting improvements.
You can perform targeted benchmarking with ssbench as well by using a narrowly defined scenario. For example, you could target small-object PUTs with a scenario containing only small files and only PUT operations. Similarly, you could use only large files and only GET operations. Sam was able to demonstrate the benefit of per-disk I/O thread-pooling in the object-server with a GET workload using ssbench. We will soon extend the available operation types in ssbench to cover metadata POST operations as well. For folks with a metadata-intensive workload, these operations will enable investigation of how Swift handles metadata when the size of XFS inodes is adjusted, among other metadata optimizations.
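For example, a small-object PUT scenario might look roughly like this; the field names follow ssbench’s scenario format as I understand it, so treat the exact schema as illustrative:

import json

# Approximate shape of an ssbench scenario targeting small-object PUTs;
# field names and values here are illustrative, not authoritative.
scenario = {
    "name": "small PUT torture test",
    "sizes": [{"name": "small", "size_min": 99000, "size_max": 200000}],
    "initial_files": {"small": 100},
    "operation_count": 50000,
    # CRUD weights: all create (PUT), no read/update/delete.
    "crud_profile": [100, 0, 0, 0],
    "user_count": 200,
}
with open('small_puts.scenario', 'w') as f:
    json.dump(scenario, f, indent=2)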
The ssbench project is open-source and we look forward to developing it cooperatively with the Swift community. To that end, I led a discussion session at this month’s Design Summit to gather requirements and suggestions for benchmarking Swift. We had a lot of great feedback captured in the Etherpad from some current users of ssbench, other tool authors, and various Swift users.
The session generated many new feature requests for ssbench as well as other points/questions:
- Perhaps use Tsung for load-generation?
- Enable replaying a past load based on “trace” data from a live cluster.
- Generate a parametric benchmark scenario from live cluster “trace” data to develop more accurate loads.
- Peter Portante from Red Hat mentioned successfully using Performance Co-Pilot to monitor a cluster during benchmarking.
- In his presentations, Mark Seger from HP demonstrated using collectl to monitor a cluster during benchmarking.
Here’s an example ssbench report. Note that I used small objects since I only had a single 12-core server and that the cluster in question had a down node during the benchmark. I also cut out the “Worst latency TX ID” column so it would look better in this blog post.
Medium test scenario
Worker count: 10   Concurrency: 800
Ran 2013-04-21 18:06:16 UTC to 2013-04-21 18:06:39 UTC (22s)

% Ops      C   R   U   D       Size Range       Size Name
 77%   %  26  60   7   7       1 kB -  16 kB    tiny
 23%   %  26  60   7   7     100 kB - 200 kB    small
---------------------------------------------------------------------
          26  60   7   7     CRUD weighted average

TOTAL
Count: 99997   Average requests per second: 4475.8
                       min       max      avg    std_dev   95%-ile
First-byte latency:  0.007  -  1.509    0.053  ( 0.036)     0.082   (all obj sizes)
Last-byte latency:   0.009  -  1.941    0.160  ( 0.127)     0.400   (all obj sizes)
First-byte latency:  0.007  -  1.509    0.051  ( 0.034)     0.070   ( tiny objs)
Last-byte latency:   0.009  -  1.559    0.133  ( 0.109)     0.312   ( tiny objs)
First-byte latency:  0.010  -  1.494    0.061  ( 0.041)     0.100   ( small objs)
Last-byte latency:   0.017  -  1.941    0.248  ( 0.140)     0.494   ( small objs)

CREATE
Count: 25889   Average requests per second: 1158.8
                       min       max      avg    std_dev   95%-ile
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   (all obj sizes)
Last-byte latency:   0.097  -  1.941    0.286  ( 0.128)     0.520   (all obj sizes)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( tiny objs)
Last-byte latency:   0.097  -  1.559    0.253  ( 0.110)     0.442   ( tiny objs)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( small objs)
Last-byte latency:   0.146  -  1.941    0.397  ( 0.121)     0.589   ( small objs)

READ
Count: 60191   Average requests per second: 2722.9
                       min       max      avg    std_dev   95%-ile
First-byte latency:  0.007  -  1.509    0.053  ( 0.036)     0.082   (all obj sizes)
Last-byte latency:   0.009  -  1.613    0.096  ( 0.071)     0.231   (all obj sizes)
First-byte latency:  0.007  -  1.509    0.051  ( 0.034)     0.070   ( tiny objs)
Last-byte latency:   0.009  -  1.521    0.070  ( 0.039)     0.103   ( tiny objs)
First-byte latency:  0.010  -  1.494    0.061  ( 0.041)     0.100   ( small objs)
Last-byte latency:   0.017  -  1.613    0.183  ( 0.082)     0.296   ( small objs)

UPDATE
Count: 6915   Average requests per second: 310.5
                       min       max      avg    std_dev   95%-ile
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   (all obj sizes)
Last-byte latency:   0.088  -  1.516    0.252  ( 0.125)     0.483   (all obj sizes)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( tiny objs)
Last-byte latency:   0.088  -  1.516    0.218  ( 0.102)     0.394   ( tiny objs)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( small objs)
Last-byte latency:   0.121  -  1.409    0.367  ( 0.124)     0.568   ( small objs)

DELETE
Count: 7002   Average requests per second: 316.4
                       min       max      avg    std_dev   95%-ile
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   (all obj sizes)
Last-byte latency:   0.041  -  1.522    0.144  ( 0.094)     0.275   (all obj sizes)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( tiny objs)
Last-byte latency:   0.041  -  1.522    0.145  ( 0.093)     0.277   ( tiny objs)
First-byte latency:    N/A  -    N/A      N/A  (  N/A )       N/A   ( small objs)
Last-byte latency:   0.045  -  1.502    0.143  ( 0.094)     0.271   ( small objs)
What Does Swift Look Like To a Drive?
Tim Feldman from Seagate shared some interesting results of targeted benchmarking, from the hard-drives’ perspective. For the most part, the drives saw the expected load. Volume of reads was lower than predicted, but operating system buffer cache is the likely culprit, which is to be expected if the volume of benchmark data isn’t large enough to cause buffer cache thrashing.
Mr. Feldman pointed out that “runt writes” will be a problem for drives with native 4k sector sizes. A “runt write” occurs when fewer than all eight 512-byte sections of a single 4k sector are written; the drive must then perform a read-merge-write operation instead of just a write. When drives used in Swift clusters move to a 4k sector size (and this will happen soon), we’ll need to make sure the filesystem and OS correctly operate on 4k sectors and not legacy 512-byte sectors.
A relatively small number of disk sectors were accessed many more times than the majority. This would make sense for filesystem and/or container metadata, but it warrants further investigation and possible optimization.
What Can We Learn From Some Benchmarking?
Jiangang Duan from Intel detailed his team’s results from targeted benchmarking with their open-source tool, COSBench. They found that on storage nodes, when buffer cache pressure caused filesystem metadata to get evicted from memory, read performance suffered by 83%. The average read operation was 34 KB instead of averaging 122 KB, and read requests per second and throughput were both worse. Mr. Duan then showed that setting the Linux kernel tunable, vfs_cache_pressure, to a very low number almost entirely mitigated this performance drop by keeping inode data cached when under memory pressure.
Mr. Duan noted that their servers had four bonded 1-Gb/s NICs and seemed to utilize the bonding when transmitting data, but not when receiving it. He said this could use some further investigation and potential optimization.
Finally, a slow disk which isn’t actually dead can not only impact the average latency of incoming requests to that disk, but worsen the latency of all requests to the node by up to 25 percent. I don’t want to steal his thunder, but Sam will be writing a brief blog post about his talk, which addressed this very problem.
The Power of Fine-Grained Benchmark Metrics
Mark Seger from HP drove home the point that fine-grained tracking of each benchmark client request’s results is critical. Like ssbench, his closed-source benchmarking tool suite, “getput”, tracks response latencies and Swift transaction IDs for each request.
Being able to report on latencies over time allows you to spot odd things that happened briefly during a run. Average numbers for the whole run can’t give you that. Generating a latency histogram can show you the distribution of latencies, allowing you to see a long tail if you have one.
Mr. Seger noted that Swift’s scaling is excellent: with multiple clients, performance grows close to linearly. With small objects, benchmarking scales well, but with larger objects, CPU or bandwidth on the benchmark node becomes a bottleneck. This highlights my point earlier that your benchmarking tool needs to scale out so it doesn’t hit a bottleneck before your cluster does.
When comparing targeted benchmark results for GETs of 1k, 10k, and 100k objects, Mr. Seger found that the requests per second for 10k were noticeably lower. Further investigation revealed that only object sizes of 7,888 and 22,469 bytes were affected. It turned out that Nagle’s algorithm was interfering: the maximum segment size (MSS) over the physical NIC between the client and Pound was much smaller than the MSS of the loopback device between Pound and the Swift proxy-server, which arbitrarily added latency to requests in a certain size range. Disabling Nagle’s algorithm with TCP_NODELAY on internal sockets within Swift may therefore be a worthwhile optimization.
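For reference, disabling Nagle’s algorithm on a socket is a one-line setsockopt; a minimal sketch in Python, with a hypothetical internal endpoint:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# TCP_NODELAY turns off Nagle's algorithm, so small writes go out
# immediately instead of waiting to be coalesced with later writes.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(('swift-proxy.example.com', 8080))  # hypothetical internal endpoint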
A particular 6-second PUT had two of three writes return in under one second, but the third object-server held up the response to the client. Mr. Seger suggested optimizing client latency by returning success to the client as soon as the PUT quorum is satisfied.
Benchmarking is an important part of any Swift deployment. With many tools to choose from and best practices just emerging, it can be a daunting project. This post provided an overview of available tools, best practices and some lessons learned from the OpenStack summit. If you have questions or would like to discuss benchmarking Swift, feel free to reach out to us here at SwiftStack.
Every year the OpenStack conference gets bigger. Every conference is at a bigger venue with more people and more things going on. But for OpenStack developers, hashing out the plan for the next set of work is the most fun part of the collaborative environment at the conference. And I’m having a blast.
This year the conference opened with a full day of Swift design sessions on Monday. It was great to get down to brass tacks with operators and deployers using Swift, as well as a good number of other Active Technical Contributors. There is a TON of focus right now on some specific core topics, but then on Tuesday Swift almost overran the unconference sessions. With so many people at the conference using Swift, there was just too much to fit into one packed room for one day. The unconference sessions tend to be where a bunch of smaller ideas can come together. And some of these ideas can have a big impact.
In particular, David Hadas (another ATC for Swift, who has been contributing since last year and is currently working at IBM) led a session at the end of the day on extending ACLs and metadata.
He introduced a couple of simple ideas to incrementally enhance Swift in small ways.
Swift supports metadata at every level: on objects, on containers, and even directly on the account. Users can set metadata through the Swift API. Middleware and other core features will set metadata, retrieve it later, and change the behavior of the API when acting on the entity based on that metadata. A straightforward example is the X-Delete-At metadata that is central to the expiring object feature in Swift. Once an object has expired, even before it’s been reaped by the consistency processes, the object server will not serve it after the X-Delete-At timestamp. This is metadata that is set by the user, but it changes the way the storage system behaves.
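For instance, a user can set X-Delete-At with an ordinary metadata POST. A quick sketch against a hypothetical cluster and token:

import time
import requests

# Set X-Delete-At so the object expires an hour from now; the cluster
# URL and token here are hypothetical placeholders.
url = 'http://swift.example.com/v1/AUTH_example/c/o'
headers = {
    'X-Auth-Token': 'exampletoken',
    'X-Delete-At': str(int(time.time()) + 3600),
}
resp = requests.post(url, headers=headers)
print(resp.status_code)  # 202 once the metadata update is accepted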
Adding new capabilities often necessitates creating new metadata. But today there are a number of places in the Swift code that have to distinguish between metadata that the user can format to their own needs and metadata that must be validated as recognized and correctly formatted before the system can consume it. As more features and metadata are added, it becomes harder to verify that the handling of each piece of metadata performs the proper validation in all cases.
Every time new metadata is added to support a feature, you have to write validators. That’s not going to change; we have to protect the system from invalid input (garbage in, garbage out). But deciding where to apply the validation in Swift can require some relatively arcane knowledge of Swift internals. And still, most of the time you just piggyback on the processing for a “similar type” of metadata anyway!
By classifying metadata we can make it easier to add new metadata (and therefore new features that depend on metadata!)
To get us started David highlighted some high-level metadata classes:
- Storage System MD: Created by the storage system and consumed by the user (e.g. counters)
- System MD: Created by the user and consumed by the storage system (e.g. ACLs, Quotas)
- User MD: Created by the user and consumed by the user (e.g. any regular user MD)
He gave a nod to the CDMI standards definition for influencing these classes, and got some good laughs poking fun that it might still be a good idea anyway. Ha! I totally agree, this is a good idea!
The work to do now is to identify and consolidate metadata validation and build a system that simplifies the introduction of new metadata. Working out the details will require identifying all of the metadata that Swift currently supports, plus the known use cases where we want to extend it, and verifying that they all fit into these groups. Then we can align the places where Swift processes and validates metadata under these buckets.
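As a strawman, classification could be as simple as dispatching on a header prefix. The prefixes below are assumptions for illustration, not an agreed-upon convention:

# A strawman classifier (assumed prefixes, not an agreed convention)
# for the three metadata classes discussed above.
def classify_metadata(header):
    h = header.lower()
    if h.startswith('x-object-sysmeta-'):
        # Created by the user, consumed by the storage system (ACLs,
        # quotas); must pass system validation before being stored.
        return 'system'
    if h.startswith('x-object-meta-'):
        # Created and consumed by the user; free-form apart from size limits.
        return 'user'
    # Everything else is generated by the storage system itself
    # (e.g. counters) and only read by users.
    return 'storage-system'

print(classify_metadata('X-Object-Meta-Favorite-Color'))  # 'user'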
Access Control Lists on Accounts
I think Swift’s container ACLs do a great job of balancing simplicity and functionality. On a container you can individually grant (or revoke) read, write, or listing access based on individual users or groups identified by your auth system, or by the Referer (sic).
This is very awesome.
Whether your use case is simply sharing some static content in your container with the world, or something more complex like temporarily granting another authorized user the ability to upload data under your account, Swift ACLs allow YOU to describe access to your data.
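As a quick example, granting the world read access (with listings) to a container is a single metadata POST; the cluster URL and token below are hypothetical placeholders:

import requests

# Make a container publicly readable and listable: '.r:*' grants read
# to any referrer, '.rlistings' additionally allows container listings.
url = 'http://swift.example.com/v1/AUTH_example/photos'
headers = {
    'X-Auth-Token': 'exampletoken',
    'X-Container-Read': '.r:*,.rlistings',
}
resp = requests.post(url, headers=headers)
print(resp.status_code)  # 204 once the ACL is set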
However, at the account level, access is granted by the auth system. If you want a user to create containers in a Swift account, you would typically grant them the admin role for that account in your auth system.
This works well with most of the auth systems that were built with cloud systems like Swift in mind, certainly Keystone and Swauth.
But as Swift integrates with more businesses and existing auth systems, it becomes apparent that it may not always be easy to update the structure of a pre-existing auth system for every new account, especially in a highly scalable storage system that separates projects into thousands or even hundreds of thousands of Swift accounts!
However, with the approach outlined by David, we can add the ability to describe within Swift itself which pre-existing users or groups in the auth system have access. This puts control of access to your data completely in the hands of the account owner.
There are still tons of issues to work through. Can I remove my own access to an account? Can I transfer complete ownership? As a service provider, do I want to allow users to have public accounts? But I’m really excited about this work.
Elegance is Powerful
I think both of these ideas are taking existing concepts that are already in Swift and expanding them incrementally. But, personally, I’m blown away by the implications. Swift has always taken the approach of solving problems in the simplest way that solves the broadest use cases. I might even go as far as to say that style of simplification is a tenet of elegance. And who doesn’t need a little more elegance in their software-defined storage system?
On Monday, the Spring 2013 OpenStack summit is getting started in Portland, Oregon with a sold-out crowd. This year, there is a record attendance of more than 2,500 OpenStack users, developers, vendors and fans. Wow, what a change from the first summit in Austin, June 2010, which had a total of 150-ish attendees.
At this summit, a dozen SwiftStack’ers are attending and we will have a record number of sessions on OpenStack Swift: from the design sessions, to an introduction to Swift, to workshops on how to deploy Swift and SwiftStack, and a panel on how to build a business on OpenStack, including Swift. To help you in your OpenStack summit planning, here is a schedule and summary of the sessions and workshops that the SwiftStack team will be participating in and presenting:
Join the OpenStack Swift core team, Swift developers and SwiftStack Director of Technology / Swift Project Technical Lead (PTL), John Dickinson, to discuss the ongoing development of Swift, including plans for the Havana release.
Swift extensions for real world (operator’s view) Monday, April 15, 9:50 am @ meeting room B116
Swift API Cleanup Monday, April 15, 11:00 am @ meeting room B116
Local File System Monday, April 15, 11:50 am @ meeting room B116
Swift with OpenStack: what’s next Monday, April 15, 1:50 pm @ meeting room B116
Swift drive workloads Monday, April 15, 2:40 pm @ meeting room B116
Speeding up the object server Monday, April 15, 3:40 pm @ meeting room B116
Swift performance analysis Monday, April 15, 4:30 pm @ meeting room B116
Benchmarking Swift Monday, April 15, 5:30 pm @ meeting room B116
Tuesday, April 16, 3:40 pm - 4:20 pm @ meeting room A106
Joe Arnold, CEO of SwiftStack, will provide an overview of Swift’s architecture and its components. The session will also cover real-world use cases, illustrating how high-volume websites use Swift and how the technology enables storage infrastructure-as-a-service.
The OpenStack Swift introduction is aimed at attendees who want to understand the design goals of Swift and how they can best make use of this OpenStack component. It will be an informative introduction for those interested in running Swift or contributing to the Swift project. Learn more here.
Thursday, April 18, 1:30pm - 2:10pm @ meeting rooms C123 + C124
In this workshop, Joe Arnold, John Dickinson, Martin Lanner and Hugo Kuo of SwiftStack will teach you how to deploy OpenStack Swift from the ground up. It will be a hands-on training where the audience will learn by doing rather than listening. Come with a laptop, or feel free to watch and learn.
You will be guided through a deployment and configuration of OpenStack Swift. We will walk you through the architecture of Swift while demonstrating a step-by-step installation from the ground up, including Swift’s architecture (The Ring, Zones, Partitions, Accounts & Containers), how to bootstrap a basic Swift installation, the guts of how OpenStack Swift works and Swift’s failure recovery mechanisms. Learn more here.
Thursday, April 18, 2:30 pm - 3:00 pm @ meeting rooms C123 + C124
Join John Dickinson, Joe Arnold, Martin Lanner and Hugo Kuo in an interactive workshop, which will cover automation and management of OpenStack Swift with SwiftStack. In this hands-on workshop, you will learn about the automation required to run OpenStack Swift in production, runtime stacks for load-balancing, ssl-termination and authentication, networking architecture for Swift, monitoring Swift-specific metrics, tuning a Swift cluster and best practices for cluster expansion and failure handling. Learn more here.
Wednesday, April 17, 1:50 pm - 2:30 pm @ meeting room A105
If you are interested in hearing a discussion on how to build a business on OpenStack, this is the session to attend. Join Jonathan Bryce (OpenStack Executive Director), Ryan Floyd (Storm Ventures Managing Director), Anders Tjernlund (COO and co-founder at SwiftStack) and other panelists in a discussion on lessons learned and best practices in building a business on OpenStack. While this is a more business-oriented session, Swift is expected to be prominently featured in the discussion. Learn more here.
Overflow Unconference Sessions
The community submitted session proposals for 18 Swift technical talks, and since we were not able to schedule them all on Monday, we’ll be adding as many as we can to the Unconference track on Tuesday. These overflow sessions include talks on multi-cluster federation, archiving, RAID, and more. Keep an eye on the Unconference schedule, and join us for more technical discussions.
SwiftStack Gift for Contributors
OpenStack Swift is the result of over 100 contributors coming together to solve real-world problems. We’ve got a special gift for everyone who’s contributed to Swift. If you’re in Swift’s AUTHORS file, stop by our booth and pick up your thank-you gift.
Giveaways for Everyone
Finally, make sure to stop by our booth at the Summit. We have a giveaway there that you do not want to miss.
Looking forward to seeing everyone next week. And if you are not able to join us in Portland, we will make much of what we presented available here at swiftstack.com.
OpenStack Grizzly was released today. As Swift’s Project Technical Lead, the most fun part of my job is to put together the release notes at the end of the OpenStack release cycle. Seeing what the community has come together to build, the new use cases that are enabled, and the improvements in existing features is tremendously exciting. I’m honored to be a part of it. I’d like to share with you a few of the key features that have been added to Swift over the last six months.
During the OpenStack Grizzly release cycle, Swift released versions 1.7.5, 1.7.6, and 1.8.0. The full notes for these releases are available in Swift’s changelog.
As always, deployers can upgrade to the latest version of Swift with no downtime on their existing clusters.
Key New Features
Global cluster building blocks
Allow the rings to have an adjustable replica count: Deployers can now adjust the replica count on existing clusters
Allow rings to have different replica counts: Deployers can choose different replica counts for account, container, and object rings
Added support for a region tier above zones: Deployers can group zones into regions.
Added timing-based sorting of object servers on read requests: This allows the fastest-responding server to serve the most requests instead of a random choice of the replicas. This can be especially useful when replicas are in different regions separated by a WAN.
Added support for large objects with static manifests: Static large object manifests allow Swift users to specifically designate the individual segments which will make up a large object (see the sketch after this list). Full docs are on the OpenStack site.
Added support for CORS requests: CORS allows web application developers to get around same-origin restrictions in web browsers. With this feature, web developers can use a Swift cluster directly instead of needing to proxy content through a separate server.
Bulk requests: Users can now ask a Swift cluster to upload or delete many objects with just one request.
Added support for auto-extracting archive uploads: A client can upload an archive file (i.e., a .tar file) and the contents will be stored individually in the cluster
Added support for bulk deletes: A client can delete many objects with one delete request
Added user-managed container quotas
Added support for account-level quotas (managed by an auth reseller)
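To make the static large object feature concrete, here is a rough sketch of uploading a manifest; the container names, etags, and sizes below are hypothetical placeholders:

import json
import requests

# Upload a static large object manifest, assuming the segments were
# already uploaded to the 'segments' container; etags/sizes are
# hypothetical placeholders.
manifest = [
    {"path": "/segments/part-000",
     "etag": "d41d8cd98f00b204e9800998ecf8427e", "size_bytes": 1048576},
    {"path": "/segments/part-001",
     "etag": "d41d8cd98f00b204e9800998ecf8427e", "size_bytes": 1048576},
]
resp = requests.put(
    'http://swift.example.com/v1/AUTH_example/c/bigobject',
    params={'multipart-manifest': 'put'},
    headers={'X-Auth-Token': 'exampletoken'},
    data=json.dumps(manifest),
)
print(resp.status_code)  # 201 once the manifest is validated and stored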
I’m excited about what’s been added to Swift and the growing community that has contributed to its development. I hope to see you all in Portland ten days from now at the OpenStack summit.
What is CORS?
Cross-Origin Resource Sharing, or CORS, is a draft standard that enables web applications to make client-side requests to resources on another domain.
Browsers enforce a same-origin security policy for good reason: http://evil.example.com/ will not be able to load your bank’s web content and masquerade as a legitimate site. However, the same-origin security policy has the side effect of limiting some web application developers, too. The same-origin policy prevents a web application hosted at http://app.example.com from uploading images directly to http://images.example.com. CORS is the standard way to tell a browser that both sites are trusted and requests between the two should be allowed.
CORS and Swift
OpenStack Swift allows users to set CORS headers on data stored in Swift. This gives application developers the flexibility to upload data to Swift or host web content directly from Swift without having to build and maintain a separate proxying layer to get around the same-origin security model.
CORS headers in Swift are implemented on a per-container basis. To use CORS headers on data in your Swift cluster, set the appropriate metadata headers on your containers. Setting this container metadata causes all requests for objects in that container to return the CORS headers and to respond appropriately to CORS pre-flight requests.
The headers you can set on your containers are:
X-Container-Meta-Access-Control-Allow-Origin
X-Container-Meta-Access-Control-Max-Age
X-Container-Meta-Access-Control-Allow-Headers
X-Container-Meta-Access-Control-Expose-Headers
These are standard container metadata headers, but when a CORS request is made to the container or to an object in the container, these metadata entries are set on the response.
A Simple Demo
Let’s play with the CORS functionality in Swift.
First, let’s create a container with the CORS headers:
curl -i -XPUT -H "X-Auth-Token: exampletoken" \
  -H "X-Container-Meta-Access-Control-Allow-Origin: http://webapp.example.com" \
  http://swift.example.com/v1/AUTH_example/c
You can, of course, set headers on existing containers with a POST request.
Next let’s create an object in that container:
curl -i -XPUT --data-binary 1234 -H "X-Auth-Token: exampletoken" \
  -H "X-Container-Meta-Access-Control-Allow-Origin: http://webapp.example.com" \
  http://swift.example.com/v1/AUTH_example/c/o
And now we can make CORS requests and see what happens. The first request is the CORS pre-flight request. The draft spec defines a successful response as having a 200 status code and anything else as a CORS pre-flight request failure.
curl -i -XOPTIONS -H "X-Auth-Token: exampletoken" \
  -H "Origin: http://webapp.example.com" \
  -H "Access-Control-Request-Method: POST" \
  http://swift.example.com/v1/AUTH_example/c/o

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://webapp.example.com
Access-Control-Allow-Methods: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Access-Control-Allow-Headers: x-auth-token
Allow: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Content-Length: 0
X-Trans-Id: txcfd8e244793046fcacbc7df4200e53c3
Date: Sat, 02 Feb 2013 07:35:23 GMT
Since we got a successful response, we can make the actual request:
curl -i -XPOST -H "X-Auth-Token: exampletoken" \
  -H "Content-Type: text/plain" \
  -H "Origin: http://webapp.example.com" \
  http://swift.example.com/v1/AUTH_example/c/o

HTTP/1.1 202 Accepted
Access-Control-Allow-Origin: http://webapp.example.com
Content-Type: text/html; charset=UTF-8
Content-Length: 76
Access-Control-Expose-Headers: cache-control, content-language, content-type, expires, last-modified, pragma, etag, x-timestamp, x-trans-id
X-Trans-Id: txa73ddbc322e84484b542c9f1d39ed9d1
Date: Sat, 02 Feb 2013 07:38:07 GMT

<html><h1>Accepted</h1><p>The request is accepted for processing.</p></html>
PyCon 2013 just wrapped up, and we had a great time. This was my fifth PyCon to attend, but this year was my first time attending as a sponsor. I’m amazed by the variety of companies supporting the Python community and the range of ideas that the speakers and attendees bring. I didn’t get to attend as many sessions this year as I have in the past (as a sponsor, I had to pull booth duty in the expo hall), but the “hallway track” was great. Perhaps spurred on by Guido’s keynote, the hot topic of this conference for me seemed to be dealing with async IO. I hope to see many of the ideas discussed worked into Swift.
One of the highlights of the conference was the hackathon sprint we sponsored with Red Hat. This was also my first time to attend the PyCon sprints, and I can certainly say I hope it’s not my last. There’s a great energy among the attendees after the speaking part of the conference ends.
For our sprint, we decided to make it into a one day hackathon. We challenged the attendees to spend the day building apps against the OpenStack Swift API. At the end of the day, we were able to give a Nexus 7 to the attendees who made the best apps.
Throughout the day we had several people hacking on Swift apps and we even had some SwiftStack devs hacking on improvements in Swift’s core code. At the end of the day, we had three apps built.
John Hampton integrated a Windows backup service with Swift. His code is at https://github.com/pacopablo/anagogic-backup-swift.
Sunil Nayak wrote a similar app that offered automatic syncing of local data and data stored in a Swift cluster (similar to what Box or Dropbox would provide). His code is at https://github.com/suniln/SwiftBox.
Sergey Lupersolsky made a great start on a web dashboard to explore data stored in Swift based on the metadata stored in the container listings. His code is at https://github.com/slupers/swift_browser.
From left to right: Sergey, John, and Sunil.
Thanks to all who participated. Thanks to Red Hat and the OpenStack foundation for helping with the sponsorship and organization.
Software-defined storage is one of the industry’s newest buzzwords, so I thought I would take a moment to explain what the term means.
Why We Started SwiftStack
The genesis for starting SwiftStack in 2011 was the pain we experienced first hand from deploying, managing and using vendor-specific storage systems. As users, what we wanted was more flexibility, less lock-in, better control and lower costs than what traditional storage systems could provide. We also heard the same thing from other organizations. And with data growing dramatically (but not IT budgets), there was absolute certainty that the pain would only worsen over time, not just for us but for pretty much everyone who stored and served data at scale.
So how did we end up here? Most existing storage systems are single, integrated systems. Over the years, they have gotten easier to deploy and, frankly, quite reliable. They have evolved in this manner because the problem domain is constrained, and also because these systems only support specific hardware and software combinations. The problem, however, is that these storage systems are confined to a vendor-defined operating system and vendor-provided hardware. This means that system capacity growth lags behind the general hardware market and you are locked out of decreasing hardware prices. A self-contained storage system also lacks flexibility; it cannot take advantage of new hardware capabilities and platforms, and expansion is limited to the head unit offerings of the vendor.
Enter Software Defined Storage
When we started SwiftStack, our big idea was to provide an object storage system - OpenStack Swift - with a de-coupled management system so customers could achieve (1) amazing flexibility on how - and where - they deployed their storage, (2) control of their data without being locked-in to a vendor and (3) private storage at public cloud prices. At the time, we didn’t call it software defined storage (frankly, no one did) but we think the term perfectly illustrates the fundamental change this model represents.
Software-defined storage (SDS), which decouples software from the hardware, has many benefits. Customers can instantly take advantage of the changing hardware landscape and add newer components to expand performance and capacity as needed. But where I think our view of SDS differs from others’ is that it needs to be more than running storage software on industry-standard hardware. To be truly software-defined, we believe the control (and the data access) also needs to be decoupled from the underlying storage hardware. Through this decoupling, customers can make choices on how their storage is scaled and managed and how users store and access data, all driven programmatically for the entire storage tier, independent of where the storage resources are deployed. In addition, this allows organizations to deploy a storage infrastructure that is not just API-compatible with public cloud storage, but architecturally identical and at lower cost.
So what are some of the main characteristics of SDS?
SDS leverages existing infrastructure
Typically, IT operations teams have already built up a tremendous amount of tooling around a particular hardware vendor and operating system for their computing infrastructure. This includes everything from the procurement process, racking/stacking, OS provisioning, operational support systems, etc. Software-defined storage systems enable IT teams to utilize infrastructure components that are already in place and apply that to their storage tier. In this sense, it is just an extension of what IT operations teams are already doing. By leveraging the existing OS and familiar hardware, SDS greatly reduces complexity.
SDS shifts reliability to the software
With reliability shifted to the software, any single component can fail but the data will remain available and durable. SDS allows a storage system to span from a tightly-integrated storage stack out to multiple nodes and racks, even multiple data centers.
Decoupled controller enables SDS
SDS can be centrally managed with no limits on cluster size. A decoupled controller coordinates all the nodes that form a complete storage platform. This is done by orchestrating data placement and using a single pane of glass for management of roles and health of each node and cluster. Capacity can be added by adding nodes, and likewise, old equipment can be seamlessly decommissioned. Rolling upgrades for both hardware and software are seamless.
So Who Needs SDS?
With all the buzz going on about SDS, there is no doubt some level of confusion in the market. The organizations who would benefit most from an SDS solution have a growing data set, an increasing number of users (internal or external) and probably a flat IT budget. From our experience here at SwiftStack, a LOT of organizations fall into this category, but the early adopters are typically web, SaaS and mobile application companies, along with enterprise organizations moving to a storage-as-a-service model for their applications and users. What all these users want, however, as we did before starting SwiftStack, is more flexibility, less lock-in, better control and lower costs than what traditional storage systems can provide. With SwiftStack’s SDS solution, they can achieve that.
One of the hard problems that needs to be solved in a distributed storage system is to figure out how to effectively place the data within the storage cluster. Swift has a “unique-as-possible” placement algorithm which ensures that the data is placed efficiently and with as much protection from hardware failure as possible.
Swift places data into distinct availability zones to ensure both high durability and high availability. An availability zone is a distinct set of physical hardware with unique failure mode isolation. In a large deployment, availability zones may be defined as unique facilities in a large data center campus. In a single-DC deployment, the availability zones may be unique rooms, separated by firewalls and powered with different utility providers. A multi-rack cluster may choose to define availability zones as a rack and everything behind a single top-of-rack switch. Swift allows a deployer to choose how to define availability zones based on the particular details of the available infrastructure.
When Swift was first released, deployers were required to have at least as many availability zones as replicas of their data. This data placement method did not work well for most deployments. Deployers were forced into convoluted deployment patterns that did not match their underlying hardware. Regardless of the actual details of the deployment, clusters were required to have at least three availability zones, and ideally four or five for handoff purposes. (When data cannot be immediately placed in one of its primary locations, Swift will choose a handoff node, if available, to ensure that data is fully replicated.) Oftentimes this lack of flexibility in the system caused deployers to do odd things. For example, a small cluster on two servers would be required to carve out some drives from each server to serve as a third availability zone.
Something better was needed, and so a better method was created. Commit bb509dd8 last April updated Swift’s data placement method to use a “unique-as-possible” placement. With this new method, deployments are not required to force Swift’s semantics onto a deployment that doesn’t exactly match.
Swift’s unique-as-possible placement works like this: data is placed into tiers–first the availability zone, next the server, and finally the storage volume itself. Replicas of the data are placed so that each replica has as much separation as the deployment allows.
When Swift chooses how to place each replica, it first will choose an availability zone that hasn’t been used. If all availability zones have been chosen, the data will be placed on a unique server in the least used availability zone. Finally, if all servers in all availability zones have been used, then Swift will place replicas on unique drives on the servers.
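A toy illustration of this tiered selection (this is not Swift’s ring code, just the greedy idea in miniature, with made-up drive records):

from collections import Counter

def place_replicas(drives, replica_count):
    """drives: list of dicts like {'zone': 1, 'server': 'A', 'drive': 'd1'}."""
    placements = []
    zone_use = Counter()
    server_use = Counter()
    for _ in range(replica_count):
        # Prefer a drive in the least-used zone, breaking ties by the
        # least-used server, so each replica spreads as widely as possible.
        choice = min(
            (d for d in drives if d not in placements),
            key=lambda d: (zone_use[d['zone']],
                           server_use[(d['zone'], d['server'])]),
        )
        placements.append(choice)
        zone_use[choice['zone']] += 1
        server_use[(choice['zone'], choice['server'])] += 1
    return placements

drives = [
    {'zone': 1, 'server': 'A', 'drive': 'd1'},
    {'zone': 1, 'server': 'B', 'drive': 'd2'},
    {'zone': 2, 'server': 'C', 'drive': 'd3'},
    {'zone': 2, 'server': 'D', 'drive': 'd4'},
]
print(place_replicas(drives, 3))  # spans both zones, three distinct servers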
As an example, suppose you are storing three replicas, and you have two availability zones, each with two servers:

Zone 1: Server A (replica)    Server B
Zone 2: Server C (replica)    Server D (replica)
In this example, you can see that there is at least one copy in each availability zone, but no two replicas are on the same server. If, for example, Server C became unavailable for some reason, new writes would use Server B as a handoff node (rather than reusing Server A or Server D), thus keeping a good separation of the data and protecting data durability.
The unique-as-possible placement in Swift gives deployers the flexibility to organize their infrastructure as they choose. Swift can be configured to take advantage of what has been deployed, without requiring that the deployer conform the hardware to the application running on that hardware.