I am working on some of the worst code I have seen in years. About the best thing I can say for it is that it's mercifully short: 800 lines of Java with eight comments. (The second best thing is that it isn't in VB. The last truly horrible code I worked on was a VB monster with a 4000-line loop and case statement at its heart.)
The main loop is 250 lines. I don't think the author understood recursion; there are four separate stacks that maintain state in the loop! I'm not sure yet but I think they will all be replaced by recursive calls.
The loop builds a 5 MB XML structure in a StringBuffer, then writes it out to a file! Um, why not just write it to a file directly? Well, the StringBuffers are on one of the stacks so I have to sort that out first.
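To make the target concrete, here is a minimal sketch of the shape this could take once the stacks are gone: a recursive method that writes each element straight to a Writer, so the call stack carries the nesting state and nothing is buffered in memory. This is not the real program. The Node class, the RecursiveXmlWriter name and the element names are all invented for illustration, and it assumes the data can be viewed as a simple tree.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node, standing in for whatever the real loop iterates over. */
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) { this.name = name; }

    /** Adds a child and returns this node so calls can be chained. */
    Node add(Node child) { children.add(child); return this; }
}

/** Hypothetical writer: recursion replaces the explicit stacks. */
class RecursiveXmlWriter {
    /** Writes the node and all its descendants directly to out. */
    static void write(Node node, Writer out, int depth) throws IOException {
        indent(out, depth);
        if (node.children.isEmpty()) {
            out.write("<" + node.name + "/>\n");
            return;
        }
        out.write("<" + node.name + ">\n");
        for (Node child : node.children) {
            // The depth and the pending close tag live on the call stack.
            write(child, out, depth + 1);
        }
        indent(out, depth);
        out.write("</" + node.name + ">\n");
    }

    private static void indent(Writer out, int depth) throws IOException {
        for (int i = 0; i < depth; i++) {
            out.write("  ");
        }
    }
}
```

In real use the Writer would be a BufferedWriter on the output file, for example one opened with Files.newBufferedWriter inside a try-with-resources block, so the 5 MB never has to sit in a StringBuffer.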
It has wonderfully readable conditional code like this:
if (isNoLocale() == false) ...
And the clincher - the program uses 16,285 localizations. They are keyed by language code and id. So how do you think the program was accessing this data? It put it all in a List and searched sequentially for a match!!! Yikes!
When I first ran the program it took 191 seconds to create the file.
So today's wins were to
- Refactor the CSV reader part of the code into a separate class and clean that up, including changing the line items from Maps to Lists. Time to generate the file: 119 sec.
- Build a two-level Map so the localizations can be looked up directly instead of by exhaustive search. Time to generate the file: 19 seconds!
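The two-level Map idea looks roughly like the sketch below: an outer Map keyed by language code, each holding an inner Map keyed by id. This is an illustration, not the actual code. The LocalizationIndex class and its method names are invented, and the ids are assumed to be Strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-level lookup: language code -> (id -> localized text). */
class LocalizationIndex {
    private final Map<String, Map<String, String>> byLanguage = new HashMap<>();

    /** Registers one localization; a later entry with the same key replaces the earlier one. */
    void put(String languageCode, String id, String text) {
        byLanguage.computeIfAbsent(languageCode, k -> new HashMap<>()).put(id, text);
    }

    /** Returns the text for (languageCode, id), or null if there is none. */
    String get(String languageCode, String id) {
        Map<String, String> byId = byLanguage.get(languageCode);
        return byId == null ? null : byId.get(id);
    }
}
```

Each lookup is now two hash probes instead of a walk through up to 16,285 list entries, which is where a speedup like the one above would come from.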
And what does this have to do with anything, anyway? Well, I have to rant every once in a while or I will have to rename my blog :-)
And what is the way out of this mess? Refactoring and unit testing, of course! I have three test files containing all the live data used by the program. Every time I make a change I regenerate them and test with XmlUnit. Now I can refactor without fear to get the program to a point where I can understand it.
Poor code structure is a performance issue! If you can't understand the code, you can't find the bottlenecks. Even a profiler is little help when all the code is in one method, because all it can tell you is that the time is being spent in that method.
I saw the same thing with the 4000-line VB loop. I started factoring out common code and eventually I could see where the performance problems were and do something about them.