Looking with Cassandra into the future of Atlas

jemfinch · on July 16, 2012

Database system combining two published technologies battle-tested by two of the largest Internet websites performs better under real world conditions than database written by amateurs, news at 11!

(In all seriousness, what did they expect? Have they looked at the MongoDB code? Do they seriously believe that the 10gen folks are smarter or better at solving problems than the masses of engineers Google and Amazon have thrown at this problem?)

jbellis · on July 16, 2012

I appreciate the sentiment, but it's worth distinguishing between the design and implementation here. Cassandra does take a lot of inspiration from the Bigtable and Dynamo papers, so we benefit from the thinking of a handful of very smart engineers at Google and Amazon, respectively, but the actual code is our [Apache's] own and for the battle testing you need to thank companies like Netflix, Reddit, Spotify, and others. [1]

That said, part of the reason Cassandra was attractive to me from the beginning is that unlike master-slave designs like MongoDB (or Bigtable/HBase, for that matter), a p2p design doesn't have the many corner cases around failover and recovery that complicate troubleshooting so badly. This is a primary reason Cassandra has had a very good reliability story since very early on.

[1] http://www.datastax.com/cassandrausers

jemfinch · on July 16, 2012

Sorry, I didn't mean to imply that Google or Amazon engineers implemented Cassandra (though it's notable, of course, that it was initially implemented by Facebook, adding a third Internet behemoth to its pedigree).

What I really meant to say is that it's clear that the engineers behind Cassandra have done their research and chosen an extremely well tested design, while the engineers behind MongoDB seem to be completely winging it, ignorant of the literature and writing (based on my last examination) extremely amateurish code.

est · on July 17, 2012

argumentum ad verecundiam

jemfinch · on July 17, 2012

You're confusing evidence with authority and shrouding it in Latin in an attempt to appear erudite.

est · on July 17, 2012

Your evidence is

> Amazon, Google, Facebook throw mass engineers at problems Cassandra has overlapped with.

> There for, Cassandra is better than "database written by amateurs"

If you have hard evidence post the fucking evidence.

jemfinch · on July 17, 2012

Stop selectively reading through your ideological lens.

My evidence is "the technology that Cassandra uses"--that is, the SSTable/BigTable storage format and the Amazon Dynamo eventual consistency and distribution model--has been not only peer reviewed and published, but has been serving for years as the foundation of two of the busiest Internet websites in the world. That's real world experience orders of magnitude more extensive and robust than the anything MongoDB can offer.

stevencorona · on July 16, 2012

Where Cassandra REALLY shines and is often overlooked is ease of maintenance. Cassandra's ability to bootstrap new nodes, replicate, reshard and handle down nodes (w/ hinted handoff) is almost magical. I use it in production and it works very reliably.

Sure, it's got some cool big data stuff, but try doing any of those "maintenance" operations on other databases without ripping your hair out. For example, even bringing up a new MySQL slave is a huge pain in the ass, let alone doing something non-trivial like promoting a new master.

zorked · on July 17, 2012

I agree completely. By the time you need something like Cassandra you have typically had so many problems with your previous database that you just assume that anything you move to will be ultra-complex, but Cassandra is so simple to keep running that you can almost forget that it's running at all.

Even the developers who were resistant due to the different data model have completely embraced it.

evanelias · on July 17, 2012

Sure, those things can be a pain if you're relatively new to MySQL administration. But luckily there's a huge wealth of automation options released by some of the world's largest web sites.

One of MySQL's greatest strengths is the maturity of its ecosystem. Tons of existing automation, a relatively large number of skilled developers you can hire, multiple dedicated conferences, good free online docs, dozens of books, several top-notch consulting firms for handling the crazy edge cases.

I'll readily admit that MySQL has its flaws, but personally I wouldn't rank difficulty of maintenance operations high on the list... at least from a practical perspective, and relative to other comparable data stores used at a massive scale.

In my experience, no matter what databases you choose, if you "go big" you can't ever escape the need for robust automation and a decent amount of in-house talent.

sbtourist · on July 17, 2012

Agreed, we had a pleasant operational experience, except for some not-so-well documented details around auto-bootstrapping and host names, as briefly mentioned in the blog post.

But cluster elasticity and resiliency has been working really good so far.

pestaa · on July 16, 2012

Interesting. I thought sysadmin operations like MySQL sharding and master-promotion are usually automated/scripted? Not to say it makes it conceptually easier, but it limits the opportunities to lose hair to 1.

jbellis · on July 16, 2012

If it's complex enough you need to script it, it's complex enough to make troubleshooting a bitch when something goes wrong in that script...

stock_toaster · on July 16, 2012

Really? I always heard it was a bit of a pain to manage (update a schema? reboot all nodes). Not sure if any of that has changed, or if it was ever true (never used it myself).

stevencorona · on July 16, 2012

This was only an issue in very early versions of Cassandra.

stock_toaster · on July 16, 2012

Ah. Good to know.

Thanks.

Vitaly · on July 17, 2012

did you try riak? I haven't myself yet, but from what I read it is also very easy to maintain. I'm on a crossroads right now, wandering which one of cassandra or risk to use to replace mongo in our still-in-development project. stability and devops is a very strong factor in the decision.

shirleman · on July 19, 2012

Happy to chat about this if you want to learn more about Cassandra. shirleman@datastax.com

monstrado · on July 17, 2012

Did you guys investigate any other choices such as DynamoDB, or HBase? I know that Facebook (inventors of Cassandra) have moved off of Cassandra over to HBase for its back-end messaging services due to its inherent consistency problems.

jbellis · on July 17, 2012

"Consistency problems" is a red herring; Cassandra can be easily asked to sacrifice availability for consistency [on a per-request basis], if that's what you want: http://www.datastax.com/docs/1.1/dml/data_consistency

Facebook never used modern Apache Cassandra, so when they tasked a team of experienced Haddop/HDFS engineers to build Messaging, it was natural for them to choose HBase. But the things they've had to do to deal with the problems in that architecture (e.g., sharding into "cells" to deal with namenode SPOF) [1] make me think that they would have done better with Cassandra.

[1] http://www.slideshare.net/brizzzdotcom/facebook-messages-hba...

dhruba · on July 17, 2012

If you have three replicas in casandra and you want your reads to be consistent, you would have to read from at least two replicas . This doubles the number of iops to the disk subsystem. Most disk based storage systems are bottlenecked by random iops, and a solution that requires double-the-iops is a non-starter.

3amOpsGuy · on July 17, 2012

I don't think that's strictly true.

With cassandra you choose where the extra reads (or writes) to ensure consistency occur as best suits your use case:

* At write time (write quorum) like a traditional replicated datastore

* At read time (read quorum)

* In the background (read one, with a non-zero chance of read repair)

This is quite nice to be able to bend Cassandra on a use case by use case basis (NB: these are not cluster wide approaches, for different columns / circumstances i can choose different patterns).

sbtourist · on July 17, 2012

We already have an HBase cluster in-house we use together with Hadoop for social analytics. By the way, we ruled it out as our operational experience with it was a little bit of a pain, in particular around version updates.

Talking about DynamoDB, it is certainly attractive as it scales, is managed by Amazon and all the yadda-yadda, but we didn't want to be lock-in to Amazon, and costs were another concern.

sbtourist · on July 17, 2012

And talking about consistency, as already pointed out by Jonathan, Cassandra actually provides a way to allow for strong consistency: also, with its tunable consistency levels, it provides greater freedom to implement different access patterns depending on the use case, as described on the blog post.

monstrado · on July 17, 2012

As someone has mentioned, Cassandra is able to be strongly consistent at the cost of a pretty significant performance cut. I am curious just how significant this performance cut would be to get consistency at the level of something like DynamoDB or HBase which is advertised as strongly consistent. Afterwards I'd be interested to see what the performance numbers would look like for all three.

I agree with you though, locking into Amazon is kind of a drawback :-(

sbtourist · on July 17, 2012

I disagree here.

First, in order to achieve strong consistency in a distributed system with replication enabled, you will always have to write to multiple replicas: provided the number of replicas is the same, whether you're using a consistent-hashing based replication (Cassandra) or a distributed file system (HBase) it really doesn't matter much, as I/O latencies (both network and disk related) will always play a large role here.

That said, you can have different performances if you:

1) Change the replication factor: this is something you can do regardless of the actual system you're using, so it doesn't count.

2) Change the number of "acknowledged" writes/reads as a fraction of the replication factor: strongly consistent systems will always provide at least as much acknowledges as Cassandra's QUORUM, so no big win here.

3) Change the point where you acknowledge writes, either memory, disk or physical sync: this is about durability, and if you use different settings for different systems (i.e. Mongo VS Cassandra) it is apples and oranges.

4) Change your physical storage structure (provided same level of durability as explained above): different databases use different structures, Cassandra uses Memtables+SSTables, which I don't expect to be slower than HBase HDFS-based ones.

So in the end, Cassandra is at least as fast as other distributed databases, even when working in strongly consistent fashion.

chaostheory · on July 16, 2012

I think in the near future Mongodb will no longer have a global write lock. Yeah but I feel their pain. Given my still limited experience with it, Mongodb may be great for reads, but not so much for writes.

achompas · on July 16, 2012

EDIT: deleted b/c I didn't make my point clearly.

ra · on July 17, 2012

It would be better for the conversation if you edit to clarify your point rather than delete...

taligent · on July 16, 2012

Call me stupid but I can imagine that losing data is a problem for a database company.

That said MongoDB has a simple flag which allows you to choose how to handle writes. The app can 'wait' for writes to be applied to one node, all nodes, some number of nodes etc.

achompas · on July 16, 2012

Heh, that probably wasn't phrased well. Let's try again:

Dynamo (and thus Cassandra) is designed for write speed and durability. What is MongoDB designed for?

chaostheory · on July 17, 2012

(assuming you're talking about safe writes opposed to normal writes) then you kill performance... which was one of the main reasons for switching to mongodb in the first place

taligent · on July 16, 2012

MongoDB doesn't have a global write lock.

It has a per collection lock and is moving towards per row locking.

nodesocket · on July 16, 2012

I don't think that is correct. Version 2.2 (beta) has database level locking, but that was the hard part. Collection, and document locking are coming after according to this talk by the CEO Dwight Merriman. (http://www.10gen.com/presentations/concurrency-internals-mon...)

einhverfr · on July 17, 2012

The problem of course with Cassandra is getting results you will believe out of it. Also I understand Cassandra to be a Trojan. Use at your own risk. When Atlas lets you down and AJAX has gone crazy (like slaughtering all your cows in a fit of drunken pique), and your city is in flames don't say I didn't warn you. But you won't believe me either ;-)

(ok, sorry for the obligatory Illiad joke there.)

hoodoof · on July 16, 2012

http://news.ycombinator.com/item?id=3984642