Richard Bucker

N-way file merge in Perl, Python, Go and Lua [Java, Ruby] - Compared

Posted at — Jun 21, 2012

[Update 2012-06-22] Here is the Java version of this assignment. It is/was awful. Once you stray from OO in Java, the code inflates like a sea monkey, and OO is clearly over-the-top overkill.

[Update 2012-06-22] Here is the Ruby version of this assignment. I like its compactness, although that came at a steep price: accessing hash elements meant clunky dereferencing, and string comparisons were just awful. [That was an error on my part; it works as you’d expect.]

Not to beat a dead horse, but I now have 4 example implementations in Perl, Python, Go, and Lua.

I did my complaining about Lua in a previous article; in summary, this example in Lua is verbose and lacks consistency. I’m not expecting to reduce this to a single LOC (line of code), but I would have liked some additional APIs that implemented more efficient algorithms based on knowledge of the internals. Or at least well-documented idioms.

The Go example was fun because version 1.x of the toolset was simple to use. I would regularly execute “go run merge_tick_data_hash.go file1.csv file2.csv” and it would run like a champ. The only challenge is/was that simple errors, which most dynamic languages permit until the code actually executes, would cause the compiler to barf. Initially I had no idea they were compiler errors, but it was easy enough to get used to. The compiled version of the code was lightning fast to start up and execute, even though it was 1.4M bytes in size.

The Perl version took some doing. I was able to reduce the LOC and optimize the code quite a bit. I think there is still some room for improvement based on the Python implementation, which was just a few lines smaller because it had the benefit of being written last.
In this case I sacrificed adding the filename to the %ticks hash. That saved a few LOC but added some dereferencing, which “might” be optimized by a good JIT; I cannot say for certain.

I’d like to compare these implementations to a Ruby version, but I’m just not a fan of RVM this week.

As for the remaining candidates, I have to admit that I really liked the Go version. However, I do have a complaint: while “Go” seemed like a good name for the project when it started (a prefix of “Google”), right now it is hard to search for. “Go” is such a small and common word that there is no way to optimize searches. One strong advantage of Go is the static linking once the project is compiled.
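
For readers who have not seen the assignment, here is a rough sketch of what a hash-based N-way merge can look like in Python. It is not the code from any of the versions discussed above. It assumes each input is a CSV whose first column is a key that sorts correctly as a string (for example a fixed-width timestamp), and the name `merge_ticks` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def merge_ticks(sources):
    """Merge rows from several CSV sources by their first column.

    `sources` maps a label (for example a filename) to an iterable of
    CSV lines. Assumption: the first column is the merge key and sorts
    correctly as a plain string.
    Returns a list of (key, {label: remaining_fields}) sorted by key.
    """
    ticks = defaultdict(dict)  # key -> {label: remaining fields}
    for label, lines in sources.items():
        for row in csv.reader(lines):
            if not row:  # skip blank lines
                continue
            key, fields = row[0], row[1:]
            # A later row with the same key and label overwrites the earlier one.
            ticks[key][label] = fields
    return [(key, ticks[key]) for key in sorted(ticks)]


if __name__ == "__main__":
    # Hypothetical sample data standing in for file1.csv and file2.csv.
    a = io.StringIO("09:30:00,10.0\n09:30:02,10.2\n")
    b = io.StringIO("09:30:01,20.1\n09:30:02,20.2\n")
    for key, by_file in merge_ticks({"file1.csv": a, "file2.csv": b}):
        print(key, by_file)
```

Keeping the source label inside the hash costs an extra level of lookup per row, which is the same trade-off mentioned for the Perl version. This sketch also holds everything in memory, so it only suits inputs that fit there.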