Richard Bucker

Reported my First Bug to MongoDB

Posted at — Jun 17, 2011

I have a client that generates several million Asterisk CDRs (call detail records). These CDRs are not perfect. They are formatted as TSV rather than CSV, and each record has a leading TAB character. Since the CDRs are generated in 5-minute intervals and each file contains a few thousand records, it does not make sense to load the DB a record at a time. It makes more sense to bulk load, so that the data is processed at as low a level in the DB engine as possible.

My first attempt to load data into MongoDB failed. The data was all askew. The problem was the leading TAB in the TSV file. The first column of my data was mostly empty, so most records began with a TAB. During normal processing of the input file, the import utility stripped all leading whitespace regardless of the file type. A TAB counts as whitespace, so it was deleted before the record was processed, and every remaining field shifted one column to the left.

So I did what any open source guy would do. I opened a ticket. Fixed the bug. And presented my patch in the ticket.

I hope they will accept it.
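For anyone who wants to see the column shift for themselves, here is a minimal Python sketch of the failure mode. It is not the import utility's actual code and not my patch. The field names and the sample record are made up for illustration. It only shows why stripping leading whitespace is unsafe when the separator is itself whitespace.

```python
# Hypothetical column names and record, for illustration only.
FIELDS = ["accountcode", "start", "src", "duration"]

# The first column is empty, so the record begins with a TAB.
LINE = "\t2011-06-17 10:00:00\t5551234\t42\n"


def parse_buggy(line):
    # Strips ALL leading whitespace first, which also removes the TAB
    # that marks the empty first field.
    return line.lstrip().rstrip("\r\n").split("\t")


def parse_fixed(line):
    # Removes only the line terminator; a leading TAB is a field
    # separator in TSV and has to survive.
    return line.rstrip("\r\n").split("\t")


buggy = dict(zip(FIELDS, parse_buggy(LINE)))
fixed = dict(zip(FIELDS, parse_fixed(LINE)))

print("buggy:", buggy)
print("fixed:", fixed)
```

In the buggy version the timestamp lands in the first column and the last column goes missing. In the fixed version the first column is an empty string and everything else lines up.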