Richard Bucker

NoSQL for your next project?

Posted at — Jun 23, 2011

I keep going back and forth on the whole idea of NoSQL and it bothers me to no end. On one side the idea of sharding the data at the server level is appealing. Then there are the key/value databases and the document stores. There are graph databases, object databases, and a few in between. But then, as always, there is the reality check.

As wonderful as NoSQL would appear to be, there is no single use-case that would seem to make it the obvious choice. And this is nowhere more obvious than in a new project I’m considering… an open source merchant gateway and an open source issuing system. I would really like to have one and only one system that I could use for everything, but that does not seem possible.

For example: in order to deal with the protocol impedance I need a FIFO queue of some kind. I like redis for this as it also has a TTL, so old data is simply removed from the queue. It also has the notion of fields in the value, so a single record can actually represent a dictionary or an array, making a useful container for the message components. (Converting an ISO-8583 message or the variables from a POST into a dictionary is fun and useful.) Also, since there is a FIFO queue, the transaction results and logging can be pushed into a different channel for aggregation and processing later. (There is a sketch of this after the requirements list below.)

In both the issuing and gateway systems the transactional part is an OLTP system, and OLTP systems benefit greatly from hash tables like those redis provides. However, redis is useless when it comes to reporting, mapreduce, partitioning, and performance once the dataset approaches the available memory. So it would seem useful to have SQL or some other storage mechanism to store all the persistent data, a queue for all of the logs and transactions, and a hash to act as a cache for the data from the primary store. The challenge here is that the first time there is a miss on the account data in the cache, the compute node will have to go to storage to get the account data (also sketched below). This can be costly, and 2x the machines will negatively affect the sigma score. And then system recovery is going to be harder even if the data is replicated around the systems. (expire/TTL does not work as expected on replicated systems.)

The NoSQL Databases website claims that there are 122+ NoSQL databases. I just ran over a bunch of them and they all left me wanting something better… or at least feeling that I needed to go back to an RDBMS. (Read: PostgreSQL.)

Some requirements:

- a queue for message logging that is persistent and replicated
- a queue for transaction logging account info back to the primary storage
- a queue for transaction requests that can and will expire; the expiration, however, will trigger a log entry
- a cache that represents the account data from the primary storage
- a cache that represents the config data from the primary storage, reloaded on-demand
- flush from cache when the transaction entry is recovered from the transaction log
- evaluate the mapreduce hit/miss when the transaction log is processed, or mapreduce once a day or hour
- import the data from the cache into a detached slice until the data is ready for consumption, then attach it (Postgres; see the sketch at the end of the post)
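To make the redis pieces concrete, here is a minimal sketch using the redis-py client. The key names, the message fields, and the 60-second TTL are my own assumptions for illustration, not anything from a real gateway:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # An inbound auth message, already parsed into a dictionary
    # (the field names here are invented for the example).
    msg = {"mti": "0100", "amount": "1000", "stan": "000042"}

    msg_id = r.incr("msg:next_id")        # simple sequence for message keys
    key = "msg:%d" % msg_id

    # One redis hash per message: a single record that is really a dictionary.
    r.hset(key, mapping=msg)
    r.expire(key, 60)                     # TTL: stale requests drop out on their own

    # FIFO: producers LPUSH the key onto one end of a list...
    r.lpush("queue:requests", key)

    # ...and a worker RPOPs from the other end.
    work = r.rpop("queue:requests")
    if work:
        fields = r.hgetall(work)          # empty dict if the TTL already fired
        # results/logging go to a different channel for aggregation later
        r.lpush("queue:txlog", "%s|%s" % (work, "approved"))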
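And the cache-miss path, sketched as a plain cache-aside lookup. The `accounts` table, the column names, and the psycopg2 connection string are hypothetical, and the 300-second TTL is a guess that inherits the replication caveat above:

    import psycopg2
    import redis

    r = redis.Redis(decode_responses=True)
    pg = psycopg2.connect("dbname=gateway")   # hypothetical connection string

    def get_account(account_id):
        key = "acct:%s" % account_id
        acct = r.hgetall(key)
        if acct:
            return acct                       # hit: served from the hash, no storage I/O

        # Miss: the costly round trip back to the primary store.
        cur = pg.cursor()
        cur.execute("SELECT balance, status FROM accounts WHERE id = %s",
                    (account_id,))
        row = cur.fetchone()
        cur.close()
        if row is None:
            return None

        acct = {"balance": str(row[0]), "status": row[1]}
        r.hset(key, mapping=acct)             # prime the cache for the next request
        r.expire(key, 300)                    # TTL is a guess, per the caveat above
        return acct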
One last comment. Replication is a pain, especially when you are exporting transactions this way. Given the amount of time it takes to sync the data, especially with transaction bursts, the overall system can and will experience slowdowns. It gets worse when users are trying to interact with the data. On top of that, importing CSV is so much better. And finally, doing the imports when the users are not using the system means that the data is stale but accurate.

I’m going to build a test harness made from redis+mongodb and redis+postgres… just for fun and testing.
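For what it’s worth, the detached-slice item from the requirements list, combined with the CSV point, comes out to roughly this in Postgres terms. The table and file names are invented, and the attach step uses table inheritance, which is how Postgres partitioning works as of today:

    import psycopg2

    pg = psycopg2.connect("dbname=gateway")   # hypothetical connection string
    cur = pg.cursor()

    # Build the new slice as a standalone table; while detached, users never see it.
    cur.execute("CREATE TABLE tx_20110623 (LIKE transactions INCLUDING DEFAULTS)")

    # Bulk-load the exported CSV with COPY -- far cheaper than row-at-a-time inserts.
    with open("/tmp/tx_20110623.csv") as f:
        cur.copy_expert("COPY tx_20110623 FROM STDIN WITH CSV", f)

    # Only when the slice is complete does it become visible through the parent.
    cur.execute("ALTER TABLE tx_20110623 INHERIT transactions")
    pg.commit()

Doing the CREATE, COPY, and INHERIT in one transaction keeps readers of the parent table from ever seeing a half-loaded slice, which is the whole point of keeping it detached.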