Richard Bucker

Loading CDRs into MongoDB

Posted at — Jun 17, 2011

Sweet. This was as slick as you’d expect.

The task was to load 235529 records from 100+ CDR files into MongoDB using the mongoimport tool. I used a Rackspace server with 512 MB of RAM and 20 GB of disk, though it’s all virtual anyway.

Here are the numbers (not scientific at all):

    1m 10s - with verbose turned on
    34s - with verbose turned off

I’m certain that some portion of the latency with verbose on comes from the console being remote, so there was some lag in the I/O across the internet.

The import:

    $ . ./bin/cdrmongoimport.sh
    connected to: 127.0.0.1
    dropping: data.cadb
     30700 10233/second
     57500 9583/second
     85000 9444/second
     113600 9466/second
     144000 9600/second
     170700 9483/second
     197200 9390/second
     223600 9316/second
    imported 235529 objects

Just to be sure, I checked that all of the data was loaded, since some people have been complaining that data has been lost.

    $ wc -l /tmp/20110515/*
    . . .(snip). . .
      235529 total

And then I checked the count on mongo.

    $ ./mongo/mongo
    MongoDB shell version: 1.9.0
    connecting to: test
    > use data
    switched to db data
    > db.cadb.find().count();
    235529
    bye

So everything is exactly where it needs to be in terms of performance. With any luck the loading is going to be linear, so if I loaded 20M records I could expect it to take about 40 minutes.

What is interesting here is that 40 minutes of loads all at once would normally cause a SQL/RDBMS to burp as the locks were escalated, indexes needed rebalancing, and so on. This is one of the main reasons why DBAs prefer to bulk load the initial data into temp tables before moving it into its final resting place, and why Postgres supports sharded tables that can be temporarily detached while the import takes place.

[update]

I decided to try loading a similar range of files remotely over the WAN. It got off to a slow start, but then it reached about 75% of the performance of the “on the same box” run, and this was through an encrypted tunnel.

    rbucker@klub:~$ . ./bin/cdrmongoimport.sh
    connected to: 127.0.0.1
    dropping: data.cadb
    dashboard@cadb.bigbllc.com's password:
     100 33/second
     23100 3850/second
     48700 5411/second
     80300 6691/second
     106500 7100/second
     131700 7316/second
     158200 7533/second
     183200 7633/second
    imported 187600 objects
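
The linear extrapolation to 20M records mentioned earlier can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation and assumes throughput stays constant as the collection grows, which the post hopes for but does not measure. The function and constant names are made up for illustration, and the 9,400 records/second figure is a rough reading of the rates mongoimport printed in the local run.

```python
# Back-of-the-envelope check of the linear-scaling guess.
# Assumption: the load rate stays constant as the collection grows.


def estimate_minutes(target_records: int, records_per_second: float) -> float:
    """Minutes needed to load target_records at a constant rate."""
    return target_records / records_per_second / 60.0


# Figures taken from the local run in the post.
SAMPLE_RECORDS = 235_529   # records imported
SAMPLE_SECONDS = 34        # wall-clock time with verbose turned off
REPORTED_RATE = 9_400      # approximate steady rate printed by mongoimport

TARGET_RECORDS = 20_000_000  # the hypothetical 20M-record load

# Rate implied by total wall-clock time (includes startup and drop time).
wall_clock_rate = SAMPLE_RECORDS / SAMPLE_SECONDS

if __name__ == "__main__":
    slow = estimate_minutes(TARGET_RECORDS, wall_clock_rate)
    fast = estimate_minutes(TARGET_RECORDS, REPORTED_RATE)
    print(f"wall-clock rate: {wall_clock_rate:.0f} records/second")
    print(f"estimate at wall-clock rate: {slow:.0f} minutes")
    print(f"estimate at reported rate:   {fast:.0f} minutes")
```

Under these assumptions the two estimates come out at roughly 48 and 35 minutes, which brackets the "about 40 minutes" guess. Which end is closer depends on whether the fixed startup cost or the steady rate dominates a longer load.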