Gorilla Optimization
This is what tends to happen. If the first day over a cold coincides with a day off from work, I tend to get curious about things, explore, and end up blogging. I leave it to you weather that is a good or bad thing. Enough personal stuff…
I’ve been trying to optimize building summary information about about 12 million very-very-very tiny records. This is still technology under development, so please forgive my vagueness. The slowness of the query comes from a few places.
Problems
First, sheer record number. Twelve million of anything amplifies your decisions, and these records are very small. It all fits in memory and it’s nothing but CPU churn.
Second, there is a single join against a label table which, I partly blame Postgres for the slowness of. The labels are assigned by the closest prefix less than the given value. It’s not a clear join and as a result Postgres always scans the table’s index. I assume that Oracle would be able to put some snazzy prefix searching or Trie-type structure in place, but that has yet to be seen.
So, those are the two big problems. I am very proud, though, of what I’m going to term “gorilla optimizations” that our team came up with to shrink the problem down to a size where It’s clear we can continue to fly on Postgres for a while longer.
Solutions!
First, we decided to pull the join lookups out into an external library which was provided by the vendor. This alternate lookup method is faster even when having our application do all the lookups in Ruby code.
Second, we decided to summarize the labeling. Since the reports we are looking at generating are about maximums and upper-bounds it was acceptable to reduce precision on our label lookup by reducing the range of our foreign key to the label. This was done by simply truncating foreign keys and grouping them. The result is that almost all of them still receive the same value, but a few do not. This also resulted in far fewer calls to the label library.
Third, we summarized by a time slice. The database could give us hours very cleanly ( date_trunk('hours', ... ) ) but in situations where we wanted 1/2 days we had the application code examine the raw rows given back by Postgres and aggregate them into the size block desired before storing.
The results of all these optimizations sums up to something we can probably live with for quite a while, and why I share this is that no single optimization could have made this work. The constant picking away at bits of bloat and lag by many minds has resulted in not having to do any sort of major redesign or custom system development, and that’s a nice story worth sharing.
No Sam. Not “Gorilla Optimization.” It should be “Gorilla Optimus Prime.”
Dammit, html crapped out on me.
http://www.nunoxei.com/images/OPTIMUS_primal.jpg