Cassandra at Go Daddy: Distributed Session Store

This is the first in a planned series of articles about our use of Cassandra at Go Daddy. We use Cassandra in a number of production applications and it is slated for further development in the near future. In this article I describe some of the perspectives and lessons learned deploying our first application: a Web session store.

Context

This deployment was part of an effort to implement a non-sticky session store. Typical out-of-the-box Web session storage is local to the Web server, meaning subsequent requests from a given client must return to the same server. Thus, the session is said to be ‘sticky’ because the client is stuck with a single server. Decoupling the session from a server is commonly accomplished by moving session storage to a second-tier or clustered resource. In this configuration the session is said to be ‘non-sticky’ because a client session may be loaded from the clustered resource by any of the front-end Web servers. Among other things, non-sticky sessions enable maximum use of distributed Web acceleration caching products for non-static content.

Before my arrival at Go Daddy, Cassandra had been identified as a target for a distributed session store after a third-party commercial solution failed to meet expectations. Cassandra was a viable candidate because of its widely heralded features: fault tolerance, tunable consistency, scalability, and durability. I was fortunate to be involved with deploying our first clusters and integrating them for session storage for a number of our customer-facing Web sites.

Back in the Day…

We started integrating with Cassandra at version 0.4.2, which was somewhat early days for Cassandra — before it graduated from the Apache Incubator. We kept abreast of new releases during development and first went into production on a 0.5.x version, eventually upgrading and stabilizing on 0.6.x in front of the breaking Thrift change in 0.7.x. Now, that version sounds like ancient history to current Cassandra users, but we found it reliable. We do, however, have an upgrade to 1.0x planned for the near future.

Even then, it was already fairly easy to get up and running with a Cassandra cluster. But, there were still plenty of lessons to be learned in terms of integrating and operationalizing. For example, it was easy to misconfigure a node so that it would die after a time (for example due to inappropriate threshold settings), but considerably difficult to understand the symptoms.

At the time, the Cassandra wiki was underway, but did not have the same depth of content and clarity as it does now. Not to mention, the community was yet-to-be blessed with the raft of documentation now provided by Datastax. Thankfully, there was already a vibrant community surrounding the platform and it was usually possible to find answers or get set straight on the user mailing list.

Enough reminiscing! Now, some notes on the application…

Application Technical Notes

Architecture Overview
The session store system is implemented by two dedicated Cassandra clusters, a custom session client integrated into the front-end Web application, and cleanup maintenance services running on other middle tier infrastructure.

The figure below shows an overview of the architecture.

In practice, this pattern is repeated in multiple environments serving different applications. The components are discussed further below.

Client Integration
In order to integrate Cassandra as a session store, it would be necessary to implement a Session-State Store Provider for the target platform. We originally targeted two Web application engines: ASP.NET and Classic ASP.

For ASP.NET, integrating a new session store was as simple as specializing the SessionStateStoreProviderBase class and calling the Cassandra Thrift API in the required methods. The original design did not implement session locking.

The Classic ASP implementation was a bit more challenging since the Thrift trunk does not compile natively on Windows. Since we couldn’t impose the performance impact of COM-interop to use the .NET implementation, we instead based our solution on a patch graciously published by Rush Manbert on the Thrift JIRA. This patch modifies the Thrift underpinnings to use boost::asio, a cross-platform library. We additionally patched it to handle connect and I/O timeout configuration. This implementation reached end-of-life as all remaining sites were migrated away from Classic ASP.

Since the Cassandra Thrift RPC proxy is automatically generated, perhaps the most interesting part of the session client was cluster configuration and connection management. At the time we started, there was not the wide array of third-party client libraries there is now. Thus, we elected to implement a custom library providing the following features:

  • Connection Pooling – Thread-safe connection pooling to support concurrency in the Wb framework (with hard limits to avoid overuse and soft limits to return resources after a spike).
  • Compression – Session data compression can be engaged based on a configurable threshold.
  • Configuration Management – Cluster status and node membership, as well as client settings (such as timeouts and thresholds), are loaded dynamically from a centralized resource. Settings are defaulted and cached locally to avoid direct dependence.
  • Dynamic Node Fail-out – If connections or I/O operations to a specific node continually fail, that node is automatically removed from rotation in the pool temporarily.
  • Performance instrumentation – Clients may be configured to record statistics on connection status, pool levels, and Cassandra I/O operations. Stats may be gathered and monitored over time to monitor client-side health and performance over time.

Cluster Configuration
Typical applications deploy Cassandra in a single unified cluster since its scale is essentially linear with nodes, and it is usually handy to have all data in a single datastore. However, recognizing that we were integrating early with such a fast-moving project as Cassandra, we decided to deploy multiple peer clusters which could be switched in and out of rotation in a classic A/B, or C ‘all in’ configuration. This would allow us to upgrade cluster software through breaking changes by decommissioning and upgrading one, then the other.

The session client cluster configuration encodes the status of each cluster: ‘up,’ ‘draining,’ or ‘down.’ Live sessions are attached to a specific ‘up’ cluster. To drain a cluster does not imply that all data currently held by it will be copied to another. During a drain, live sessions read from a ‘draining’ cluster are written to another ‘up’ cluster. Much of the other data is allowed to pass from expiration and need not be transferred.

Data Model
The data model for the session store is decidedly vanilla compared to some other Cassandra use cases. We specify a single keyspace with ReplicationFactor of three. Writes and reads are done with ConsistencyLevel QUORUM to minimize latency while retaining a consistent view.

Two column families are defined. The most obvious one for this application is ‘session,’ which simply contains the session data keyed by its session ID:

session: {
   <Session ID> : {
      data: [Session data BLOB]
   }
}

The Session ID is an implementation-specific custom session key provider to ensure that keys are sufficiently unique so they will not collide. To prevent expired sessions from being returned, the session client tests the timestamp on the data column at the time of read.

A second column family was introduced to track session expiration times to allow removal of expired sessions:

expiration: {
   hash(<Session ID>) : {
      <Expire Timestamp><Session A>:
      <Expire Timestamp + n><Session B>:
   }	
}

The ‘expiration’ dynamic column family is used as a time-ordered index to expiring sessions in the datastore. A low priority ‘Reaper Service’ crawls this index and removes sessions as they expire.

Expiration entries are written to this index by hashing the session ID into a smaller range of numeric keys. This allows the index to be somewhat distributed (by nature of the random keyspace partitioner), but still crawled with the known keys. The dynamic columns in each row are time-ordered by concatenating the expiration time with the session ID. The index is queried using multiget_slice to obtain all expiring session IDs, which are then used as keys to remove data from the ‘session’ keyspace.

The expiring column feature introduced in Cassandra 0.7 would obviate the need for this column family altogether, allowing us to specify a TTL for the ‘data’ column in the ‘session’ column family. However, as we designed this application, that was not yet available.

Conclusion

Overall, we found Cassandra to be a reliable, effective, and ultimately a usable datastore. As I wrote this post, it was interesting to reflect back on the state of Cassandra then, and how far it has come in the short time since. At the time, we were designing around deficiencies that would soon be addressed elegantly by the datastore. We were also making architectural decisions to accommodate rapid upgrade cycles as the platform matured and stabilized. In the time since we started, the community has exploded and so too has the feature-set provided by Cassandra. Finally, the resources: more online material comes naturally with expanding use of the product, but I can’t say enough about the quality content being created at Datastax. The well-organized, version-specific, Cassandra documentation and complimentary technical blog articles have been instrumental for me in keeping abreast of the technology.

The next article in this series will depict our more recent deployment of Cassandra 1.0 for transaction logging and reporting in our distributed object store service.

Adam joined Go Daddy in 2009 as part of the eCommerce team and subsequently took a position with the Research and Development group where he is primarily focused on distributed data stores. Connect with Adam on Google+

9 Comments on "Cassandra at Go Daddy: Distributed Session Store"

  1. Kirit Parmar says:

    If you are using tomcat as application server, memcache is the best option to get the same result : Not sticky session. There is open source library available that can be easily integrate with tomcat. All the advantage you have mention (Scalability, fault tolerances etc) are very well handled by that too.

  2. Thanks for mentioning that, Kirit. For this particular application our front end web server platform is Windows/ASP.NET.
    I am aware of the memcached session store solution. One benefit I believe the Cassandra data store provides over a memcached cluster in this case is intrinsic data replication. This allows us to sustain multiple concurrent node failures without loss of session availability. The business values session state highly because it is tied to providing a consistent customer experience.

  3. How did you achieve strong consistency with Cassandra but still keep it performant? Was your client aware of the server topology so that you did not have to have read/write replication values that added up to greater than the number of nodes in the cluster, e.g. quorum reads and quorum writes.

    Also, was your hardware specifically tuned for Cassandra, e.g. using SSDs for storage?

  4. Thanks for sharing your implementation. I’ve got a few performance realted questions: What kind of volume is your cluster capable of supporting? What is your read/write ratio? How many nodes are you using?

    Thanks =)

  5. Also, what total data size are you aiming to support? Is there a specific reason why you didn’t go with an in-memory datastore like VoltDB, Gigaspaces, Oracle Coherence, etc.?

  6. @Martin, The clients are not partition-aware. We do reads and writes at QUORUM to achieve a consistent view (there has been a correction in the article since your inquiry was posted). To be clear, reads plus writes must be greater than the replication factor (not the number of nodes in the cluster).

    The cluster supporting this app uses traditional 15K SAS drives.

  7. @Ari
    I should note that this article was written as a historical perspective. As with many things in this field assumptions change rapidly.

    Regarding your capacity questions:
    I regret to say I don’t have easy access to current cluster performance capabilities. I may see if someone closer to the matter can provide current numbers.

    I can say our early testing demonstrated transactional rates (read+write at QUORUM) in excess of 5000/s on an unoptimized four-node cluster (note that this applies to our integrated stack and is much less than current raw Cassandra benchmarks). In this lab configuration we actually saturated the front end web servers before we reached the edge of the Cassandra cluster capability. In production there are far more front-end servers than Cassandra nodes.

    The read/write ratio is dependent on the site but an average of 65/35 was used for testing.

    There are multiple clusters in production presently, supporting different sites in varying capacities. The largest is eight nodes.

    Data is transient and total size related to how many sessions are active over a given window.

    And The question about alternatives:
    I was not actually part of the trade study that selected Cassandra for this project. I can tell you that some contributing factors were built-in replication, durability, and no-cost. I don’t know a lot about the alternatives you listed but it appears that GigaSpaces and Oracle Coherence are commercial software products. VoltDB Community also looks interesting but it was not available at the time this effort began.

  8. @Adam

    Thanks for sharing those stats. We use a similar 6 node setup and chose it for the same reasons: replication, durability, multi-master with an easy TTL setup. What surprised us is that even with a read+write>total nodes we would still see latency issues at p95. The balance of read/writes at our scale (we get >110,000 requests per second across the entire estate but obviously not all these end up hitting cassandra) seems to have an effect. Shoot me an email if you want to talk about it :-)

  9. mohammad says:

    in the overview of the architecture image you have add a repair service to remove all the expired sessions, Cassandra doesn’t handle this issue??

Got something to say? Go for it!

 
Traffic Log Image