Cassandra Notes from Bits and Bytes
Dominic Williams has an excellent post at Bits and Bytes on why he (his company, really) moved to Cassandra. This post is a simple digest of the salient points he makes that speak to me (plus a few points I add as commentary). For any who might care: because of the heavy analytical work I’m involved in, we’ll be sticking with Hadoop and perhaps HBase for the time being, but it’s good to keep an eye on this technology. I’ve actually got a few projects queued up in my mind that Cassandra would be a great fit for… at least… I think it will be… until I get a chance to play with it!
- Blood Line
- Gross generalization of breed strengths.
- Who made this technology and why?
- Brutally pragmatic.
- HDFS is for analytics.
- Cassandra is for distributed access.
- Code culture – what can you expect from the community.
- Generally, who is still moving forwards and at what rate?
- Consider that HBase is older.
- Consider that Cassandra is younger, yet quite mature. Big momentum there.
- Consider the raw count of developers in #cassandra and #hbase on freenode IRC at any given time.
- CAP – CA vs. AP
- CAP – Consistency, Availability, Partition tolerance.
- False dichotomy between CA and AP.
- Cassandra lets you choose.
- HBase focuses intentionally on CA. A region server is a failure point for the data it serves.
- Specifically, notice the read and write consistency levels mentioned in the Cassandra wiki.
- HBase requires setting up Hadoop and ZooKeeper. That adds management overhead, though these are systems you may be running anyway.
- Cassandra seems to have a performance edge for single record references.
- MapReduce will be available for Cassandra in v. 0.6. Yay!
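To make the "Cassandra lets you choose" point a bit more concrete for myself: the read and write levels boil down to how many replicas must answer before an operation counts as done. The usual rule of thumb is that a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor (R + W > N). Here is a minimal sketch of that arithmetic in plain Python. It does not talk to Cassandra at all; the level names ONE, QUORUM, and ALL mirror the ones on the wiki, while the function names are hypothetical and just mine for illustration.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for the given level.

    Only the three levels named here are modeled (an assumption of this sketch).
    """
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when R + W > N, so every read set overlaps every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas can be down while the level can still be met."""
    return replication_factor - replicas_required(level, replication_factor)


if __name__ == "__main__":
    n = 3  # assumed replication factor for the example
    for read_level, write_level in [("ONE", "ONE"),
                                    ("QUORUM", "QUORUM"),
                                    ("ONE", "ALL")]:
        consistent = reads_see_latest_write(read_level, write_level, n)
        print(read_level, write_level, consistent,
              tolerated_failures(write_level, n))
```

With three replicas, reading and writing at ONE favors availability (either operation survives two replicas being down) but gives no overlap guarantee, while QUORUM on both sides guarantees overlap and still tolerates one failure. That trade is the knob the CA vs. AP discussion is really about.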
So, there you go. Not an earth-shattering post, but an exercise that’s helped me. Hope you enjoyed it!