In order to create software that is appealing to the enterprise market today, Sigma1 must create software for five years from now. In this post I will answer the questions of why and how Sigma1 software intends to achieve this goal.
The goal of Sigma1 HAL0 software is to solve financial asset allocation problems quickly and efficiently. HAL0 is portfolio-optimization software that makes use of a variety of proprietary algorithms. HAL0’s algorithms solve difficult portfolio problems quickly on a single-core computer, and much more rapidly on multi-core systems.
Savvy enterprise software buyers want software that runs well on today’s hardware, but that will also run well on future generations of compute hardware. I cannot predict all the trends in future hardware advances, but I can predict one: more cores. Cores per “socket” are increasing on a variety of architectures: Intel x86, AMD x86, ARM, Itanium, and IBM Power7, to name a few. Even if this trend slows, as some predict, the “many cores” concept is here to stay and to progress.
Simply put, Big Iron applications such as portfolio optimization and portfolio-risk management and modelling are archaic and virtually DOA if they cannot benefit from multi-core compute solutions. This is why HAL0 has been designed from day 1 to utilize multi-core (as well as multi-socket) computing hardware. Multiprocessing is not a bolt-on retrofit, but an intrinsic part of HAL0 portfolio-optimization software.
That’s the why, now the how. Google likes to use the phrase “map reduce” while others like the phrase “embarrassingly parallel.” I like both terms because it can be embarrassing when a programmer discovers that the problems his software was slogging through in series were being solved in parallel by another programmer who mapped them to parallel sub-problems.
The “how” for HAL0’s core algorithm is multi-layered. Some of these layers are trade secrets, but I can disclose one. Portfolio optimization involves creating an “efficient frontier” composed of various portfolios along the frontier. Each of these portfolios can be farmed out in parallel to evaluate its risk and reward values. Depending on the parameters of a particular portfolio-optimization problem, this first-order parallelism can provide roughly a 2-10x speedup: parallel, but not massively parallel.
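To make this first-order parallelism concrete, here is a minimal Python sketch. It is not HAL0's code: the two-asset figures (`MU`, `SIGMA`, `RHO`) and the function names are made up for illustration, and the candidate portfolios are simply a grid of weights rather than a true optimized frontier. The point is only that each portfolio's risk and reward can be computed independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

# Hypothetical two-asset inputs (illustrative numbers, not from HAL0).
MU = (0.04, 0.10)      # expected returns of assets A and B
SIGMA = (0.05, 0.20)   # volatilities of assets A and B
RHO = 0.2              # correlation between A and B


def evaluate(weight_a):
    """Return (risk, reward) for a portfolio holding weight_a in asset A."""
    wa, wb = weight_a, 1.0 - weight_a
    reward = wa * MU[0] + wb * MU[1]
    variance = (
        (wa * SIGMA[0]) ** 2
        + (wb * SIGMA[1]) ** 2
        + 2 * wa * wb * RHO * SIGMA[0] * SIGMA[1]
    )
    return sqrt(variance), reward


def evaluate_candidates(weights, max_workers=4):
    """Evaluate every candidate portfolio concurrently.

    Each evaluation is independent, so the work can simply be mapped out.
    pool.map() returns results in input order, whatever order they finish in.
    Threads keep this sketch simple; for CPU-bound work in CPython,
    ProcessPoolExecutor offers the same map() interface across cores.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, weights))


weights = [i / 10 for i in range(11)]
points = evaluate_candidates(weights)
```

The speedup of a scheme like this is capped by the number of candidate portfolios and by any serial work around them, which fits the modest 2-10x range quoted above.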
HAL0 was developed under a paradigm I call CAP (congruent and parallel). Congruent in this context means that given the same starting configuration, HAL0 will always produce the same result. This is generally easy for single-threaded programs to accomplish, but often more difficult for programs running multiple threads on multiple cores. Maintaining congruence is extremely helpful in debugging parallel software, and is thus very important to Sigma1 software. [Coherent or Deterministic could be used in lieu of Congruent.]
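One common way to get this kind of congruence in parallel code is sketched below in Python. This is an assumption about how it could be done, not a description of HAL0's internals: every sub-task gets its own random generator seeded from the task index, and partial results are combined in a fixed order. The names `simulate` and `run` and the seed formula are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(task):
    """Run one hypothetical Monte Carlo sub-task with a private RNG."""
    base_seed, index = task
    # A per-task generator: no RNG state is shared between workers, so the
    # numbers drawn do not depend on how the tasks are scheduled.
    rng = random.Random(base_seed * 1_000_003 + index)
    return sum(rng.gauss(0.0, 1.0) for _ in range(1000))


def run(base_seed, n_tasks, max_workers):
    """Return the same total for a given base_seed, whatever max_workers is."""
    tasks = [(base_seed, i) for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in task order, not completion order.
        partials = list(pool.map(simulate, tasks))
    # Summing in a fixed order keeps floating-point rounding identical.
    return sum(partials)


serial_result = run(base_seed=42, n_tasks=16, max_workers=1)
parallel_result = run(base_seed=42, n_tasks=16, max_workers=8)
```

With this structure, `serial_result` and `parallel_result` are bit-for-bit equal, which is what makes a parallel run reproducible under a debugger.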
As HAL0 development continued, I expanded the CAP acronym to CHIRP (Congruent, Heterogeneous, Intrinsically Recursively Parallel). Not only does CHIRP have a more open, happier connotation than CAP, it also adds two tenets: heterogeneity and recursion.
Heterogeneity, in the context of CHIRP, means being able to run, in parallel, on a variety of machines with different computing capabilities. On one end of the spectrum, rather than requiring all machines in the cloud or compute queue to have the exact same specs (CPU frequency, amount of RAM, etc.), the machines can be different. On the other end of the spectrum, heterogeneity means running in parallel on multiple machines with different architectures (say x86 and ARM, or x86 and GPGPUs). This is not to say that HAL0 has complete heterogeneous support; it does not. HAL0 is, however, architected with modest support for heterogeneous solutions and extensibility for future enhancements.
The recursive part of CHIRP is very important. Recursively parallel means that the same code can be run (forked) to solve sub-problems in parallel, and those sub-problems can be divided into sub-sub-problems, and so on. This means that the same tuned, tight, and tested code can be leveraged in a massively parallel fashion.
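The shape of recursive parallelism can be sketched in a few lines of Python. This is a generic fork-join illustration, not HAL0's algorithm: the "problem" here is just finding the largest score in a non-empty list, and `solve`, `max_depth`, and `leaf_size` are hypothetical names. What matters is that the same function handles the whole problem, each half, and each quarter.

```python
import threading


def solve(values, depth=0, max_depth=3, leaf_size=4):
    """Return the largest value, forking the same code on each half."""
    # Leaf case: the sub-problem is small enough (or the fork tree is deep
    # enough) to solve serially with the tuned single-threaded code.
    if depth >= max_depth or len(values) <= max(leaf_size, 1):
        return max(values)

    mid = len(values) // 2
    result = {}

    def solve_left():
        result["left"] = solve(values[:mid], depth + 1, max_depth, leaf_size)

    # Fork: the left half runs in a new thread while the current thread
    # works on the right half, so no worker sits idle waiting on a child.
    worker = threading.Thread(target=solve_left)
    worker.start()
    right = solve(values[mid:], depth + 1, max_depth, leaf_size)
    worker.join()

    # Join: combine the two sub-results.
    return max(result["left"], right)


best = solve(list(range(100)))
```

Capping the recursion with `max_depth` bounds the number of concurrent workers at roughly 2 to the power of `max_depth`. In a multi-machine setting the fork would presumably dispatch to another process or server rather than a thread.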
By far the most performance-enhancing piece of HAL0 portfolio-optimization CHIRP is RP. The RP optimizations are projected to produce speedups of 50 to 100X over single-threaded performance (in a compute environment with, for example, 20 servers with 10 cores each). Moreover, the RP parts of HAL0 only require moderate bandwidth and are tolerant of relatively high latency (say, 100 ms).
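As a quick sanity check on those figures, the short Python calculation below works out what a 50-100X speedup means on the example cluster of 20 servers with 10 cores each. The efficiency figure is plain arithmetic. The second function assumes Amdahl's law, which the projection above does not claim to follow, so treat that part as a rough illustration only.

```python
def parallel_efficiency(speedup, servers, cores_per_server):
    """Fraction of ideal linear speedup actually achieved."""
    total_cores = servers * cores_per_server
    return speedup / total_cores


def amdahl_parallel_fraction(speedup, total_cores):
    """Parallel fraction p implied by Amdahl's law (an assumed model).

    Amdahl's law: speedup = 1 / ((1 - p) + p / N). Solving for p gives
    p = (1 - 1/speedup) / (1 - 1/N).
    """
    return (1 - 1 / speedup) / (1 - 1 / total_cores)


for speedup in (50, 100):
    eff = parallel_efficiency(speedup, servers=20, cores_per_server=10)
    frac = amdahl_parallel_fraction(speedup, total_cores=200)
    print(f"{speedup}x on 200 cores: efficiency {eff:.0%}, "
          f"implied parallel fraction {frac:.4f}")
```

On 200 cores, a 50X speedup corresponds to 25% parallel efficiency and a 100X speedup to 50%. Under the Amdahl assumption, that would require roughly 98.5% to 99.5% of the single-threaded run time to be parallelizable.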
Bottom line: HAL0 portfolio-optimization software is designed to be scalable and massively parallel.