The first phase of developing the HALO (Heuristic Algorithm Optimizer) Portfolio Optimizer was testing mathematical and heuristic concepts. The second phase was teaming up with beta partners in the financial industry to exchange optimization work for feedback on the optimizer features and results.
For the first phase, my primary tool for software development was the Ruby language. Because Ruby is a “high-level,” extensible language, I was able to quickly prototype and test many diverse and complex concepts. This development process is sometimes referred to as software prototyping.
For the second, beta phase of software development I kept most of the software in Ruby, but began re-implementing selected portions of the code in C/C++. The goal was to keep the high-change-rate code in Ruby, while coding the more stable portions in C/C++ for run-time improvement. While a good idea in theory, it turned out that my ability to foresee beta-partner changes was mixed at best. Many changes hit the Ruby code and were easily implemented, but a significant fraction hit deep into the C/C++ code, requiring significant development and debugging effort. In some cases, the C/C++ effort was so high that I switched portions of the code back to Ruby for rapid development and ease of debugging.
Now that the limited-beta period is nearly complete, software development has entered a third phase: run-time-performance optimization. This process involves converting the vast majority of the Ruby code to C. Note that I specifically say C, not C/C++. In phase 2, I was surprised at the vast increase in executable code size with C++ (and STL and Boost). As an experiment, I pruned test sections of code down to pure C and saw the binary (and in-memory) machine code size decrease by 10X or more.
By carefully coding in pure C, I produced smaller binaries, allowing more of the key code to reside in the L1 and L2 caches. Moreover, because C allows very precise control over memory allocation, reallocation, and de-allocation, I was able to more-or-less ensure that key data resided primarily in the L1 and/or L2 caches as well. When both data and instructions live close to the CPU in cache memory, performance skyrockets.
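To give a flavor of what that kind of memory control can look like, here is a minimal sketch. It is not HALO code: the `asset_data` struct and its functions are hypothetical names invented for illustration. The idea is simply to place two arrays that are always read together in one contiguous allocation, which tends to help cache locality in a tight loop, though the actual benefit depends on the data size and the hardware.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: weights and returns share one contiguous block. */
typedef struct {
    size_t  n;        /* number of assets */
    double *weights;  /* points at the start of block */
    double *returns;  /* points into block, right after weights */
    double *block;    /* the single allocation that owns the memory */
} asset_data;

/* Allocate both arrays with one malloc call. Returns 0 on success, -1 on failure. */
int asset_data_init(asset_data *a, size_t n)
{
    assert(a != NULL && n > 0);
    a->block = malloc(2 * n * sizeof *a->block);
    if (a->block == NULL)
        return -1;
    a->n = n;
    a->weights = a->block;
    a->returns = a->block + n;
    return 0;
}

/* Release the single block and clear the pointers so stale use is easier to catch. */
void asset_data_free(asset_data *a)
{
    free(a->block);
    a->block = a->weights = a->returns = NULL;
    a->n = 0;
}

/* Weighted sum over the two arrays: one linear pass through adjacent memory. */
double expected_return(const asset_data *a)
{
    double sum = 0.0;
    for (size_t i = 0; i < a->n; i++)
        sum += a->weights[i] * a->returns[i];
    return sum;
}
```

A caller would run `asset_data_init`, fill in `weights` and `returns`, call `expected_return`, and finish with `asset_data_free`. One allocation also means one place to check for failure and one pointer to free, which helps with the leak risk that comes with manual memory management.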
HALO code is very modular, meaning that it is carefully partitioned into independent functional pieces. It is very difficult, and not worth the effort, to convert part of a module from Ruby to C; it is more of an all-or-nothing process. So when I finished converting another entire module to C today, I was eager to see the result. I was blown away. The speedup was 188X. That’s right, almost 200 times faster.
A pure C implementation has its advantages. C is extremely close to the hardware without being tied directly to any particular hardware implementation. This enables C code (with the help of a good compiler) to benefit from specific hardware advantages on any particular platform. Pure C code, if written carefully, is also very portable, meaning it can be ported to a variety of different OS and hardware platforms with relative ease.
A pure C implementation also has disadvantages, including susceptibility to pointer errors, buffer-overflow errors, and memory leaks. Many of these drawbacks can be mitigated by software regression testing, particularly against a “golden” reference spec coded in a different software language. In the case of the HALO Portfolio-Optimization Software, the golden reference spec is the Ruby implementation. Furthermore, unit testing can be combined with regression testing to provide even better software test coverage and “bug” isolation. The module behind the latest 188X speedup was tested against a Ruby unit-test regression suite, and its results matched the Ruby implementation to five or more significant digits of precision. Since the Ruby and C implementations were coded months apart, in different software languages, it is very unlikely that the same software “bug” was independently implemented in each. Thus the C helps validate the “golden” Ruby spec, and vice versa.
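The comparison step of such a regression check can be sketched as follows. This is an illustration under stated assumptions, not the HALO test suite: the function names are hypothetical, and “agreement to N significant digits” is interpreted here as a relative error of at most 10^-N, which is one reasonable reading among several. The reference values are assumed to have been produced by the golden implementation and loaded into an array beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without <math.h>, so no extra link flag is needed. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 if test agrees with ref to roughly `digits` significant digits,
 * using relative error: |ref - test| <= 10^-digits * max(|ref|, |test|).
 * Assumption: this is only one way to define significant-digit agreement. */
int agrees_to_digits(double ref, double test, int digits)
{
    double tol = 1.0;
    double scale;

    assert(digits > 0);
    for (int i = 0; i < digits; i++)
        tol *= 0.1;

    scale = abs_d(ref) > abs_d(test) ? abs_d(ref) : abs_d(test);
    return abs_d(ref - test) <= tol * scale;
}

/* Compares n outputs against the golden reference values.
 * Returns the index of the first mismatch, or -1 if every value agrees. */
long first_mismatch(const double *ref, const double *test, size_t n, int digits)
{
    for (size_t i = 0; i < n; i++) {
        if (!agrees_to_digits(ref[i], test[i], digits))
            return (long)i;
    }
    return -1;
}
```

Reporting the index of the first mismatch, rather than a bare pass/fail, makes it easier to trace a discrepancy back to the input that triggered it.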
I have written before about how faster software is greener software. At the time, HALO was primarily a Ruby implementation, and I expected about a 10X speedup from converting Ruby to C/C++. Now I am increasingly confident that an overall 100X speedup for an all-C implementation is quite achievable. For the SaaS (software as a service) implementation, I plan to continue to use Ruby (and possibly some PHP and/or Python) for the web-interface code. However, I am hopeful I can create a pure C implementation of the entire number-crunching software stack. The current plan is to use the right tool for the right job: C for pure speed, Ruby for prototyping and as a golden regression reference, and Ruby/PHP/Python/etc. for their web-integration capabilities.