Without IPC (inter-process communication), scalability is virtually impossible. With IPC come numerous choices and tradeoffs: pipes, named pipes, messages, semaphore-managed file-sharing, memory-sharing, sockets, socket pairs… to name a few. The tradeoffs involve platform support, job granularity, and performance.
After much thought, I have temporarily committed to Linux/UNIX named pipes as the primary IPC mechanism. The pros of this choice include robustness, reasonable speed, and broad support for multi-thread, multi-core, and multi-server parallelism. The primary downside is the lack of Windows compatibility.
The HAL0 portfolio-optimization suite can still run on Windows, but for now Windows parallelism is limited to thread-level scalability on a single machine.
Linux and UNIX programs often communicate very well together with streaming data. [For this post, I’ll use the term Linux to refer generally to Linux and/or UNIX.] Streaming data has its limitations, however. Perhaps the biggest limitation is “large-grain granularity”. By this I mean that each interaction involves loading the program (high latency), processing the stream, and closing the program. Streaming is therefore expensive for fine-grain ad hoc requests because of the open/close overhead.
Named pipes (especially when properly paired) are an elegant way around the open/close overhead issue.
It is often a simple task to modify a Linux program to support named pipes, whether that program is written in C++, Python, Java, Ruby, Perl, shell, etc. If a program can be modified to accept input from a named pipe and write results to another named pipe, it can be integrated with any other program that supports a common named-pipe-based API. HAL0 financial software does just that.
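As a rough illustration of the paired-pipe idea, here is a minimal Python sketch, not HAL0’s actual protocol. It assumes a hypothetical line-oriented API: a long-lived worker reads one request per line from a request FIFO and writes one reply per line to a reply FIFO, so the program is loaded once rather than once per request. The pipe names, `serve`, `query`, `round_trip`, and the toy `sum_metric` handler are all made up for the example. The worker runs in a thread only to keep the sketch self-contained; a real worker would be a separate program in any language.

```python
import os
import tempfile
import threading

def serve(request_fifo, reply_fifo, handler):
    """Long-lived worker: one request line in, one reply line out.

    Opening a FIFO blocks until its other end is opened, so the worker and
    the client must open the two pipes in the same order (requests first).
    """
    with open(request_fifo, "r") as requests, open(reply_fifo, "w") as replies:
        for line in requests:                  # loop ends when the client closes its end
            replies.write(handler(line.rstrip("\n")) + "\n")
            replies.flush()                    # send each reply immediately

def query(request_fifo, reply_fifo, lines):
    """Client side: send each request and wait for its matching reply."""
    answers = []
    with open(request_fifo, "w") as out, open(reply_fifo, "r") as inp:
        for line in lines:
            out.write(line + "\n")
            out.flush()
            answers.append(inp.readline().rstrip("\n"))
    return answers

def sum_metric(line):
    """Hypothetical stand-in for a real portfolio metric."""
    return str(sum(int(x) for x in line.split(",")))

def round_trip(lines):
    """Create a paired set of FIFOs, start one worker, and query it."""
    tmp = tempfile.mkdtemp()
    req = os.path.join(tmp, "requests.fifo")   # hypothetical pipe names
    rep = os.path.join(tmp, "replies.fifo")
    os.mkfifo(req)
    os.mkfifo(rep)
    worker = threading.Thread(target=serve, args=(req, rep, sum_metric), daemon=True)
    worker.start()
    try:
        return query(req, rep, lines)
    finally:
        worker.join(timeout=5)
        os.remove(req)
        os.remove(rep)
        os.rmdir(tmp)
```

For example, `round_trip(["1,2,3", "10,20"])` returns `["6", "30"]`. The one design rule worth remembering is the open order: if one side opens the reply pipe first while the other opens the request pipe first, both block forever.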
HAL0 software forms the core engine or kernel of the Sigma1 portfolio software. In parallel mode, HAL0 spawns worker jobs that compute portfolio metrics. HAL0 gathers, organizes, and evaluates the results. Then, based on the past and current waves of results, HAL0 identifies the most fruitful next wave of results to explore and “farms out” the next wave of computing. This is the primary means of achieving scalability: the worker jobs are distributed to other cores and other servers. These “other” compute resources do the massive heavy lifting in parallel, speeding up computation immensely.
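The wave loop described above can be sketched roughly as follows. This is a hypothetical illustration, not HAL0’s actual search algorithm: `score` stands in for an expensive portfolio metric, `TARGET` is an invented “ideal” allocation, and `next_wave` simply perturbs the best portfolio found so far. A thread pool plays the role of the worker jobs to keep the sketch self-contained; in the setup described here the workers would be separate processes on other cores or servers, reached over named pipes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = (0.5, 0.3, 0.2)  # hypothetical "ideal" weights for the toy metric

def score(weights):
    """Toy stand-in for an expensive portfolio metric (higher is better)."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def normalize(weights):
    """Rescale weights so they sum to 1."""
    total = sum(weights)
    return tuple(w / total for w in weights)

def next_wave(best, rng, size):
    """Boss step: propose the next wave of candidates near the current best."""
    return [normalize([max(1e-9, w + rng.gauss(0.0, 0.05)) for w in best])
            for _ in range(size)]

def optimize(waves=20, wave_size=8, workers=4, seed=1):
    rng = random.Random(seed)          # seeded so runs are repeatable
    best = normalize([1.0, 1.0, 1.0])  # start from equal weights
    best_score = score(best)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(waves):
            wave = next_wave(best, rng, wave_size)  # boss: farm out a wave
            scores = list(pool.map(score, wave))    # workers: evaluate in parallel
            for cand, s in zip(wave, scores):       # boss: gather and evaluate
                if s > best_score:
                    best, best_score = cand, s
    return best, best_score
```

Only the boss touches the random generator, and `pool.map` returns scores in submission order, so the same seed always yields the same result regardless of how the workers are scheduled.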
When there are enough worker jobs, there comes a point when the worker jobs cease to be the primary limiter of compute speed. This is where Amdahl’s law really kicks in: at some point maximum speedup is reached, limited by the “core” process’s ability to send, wait for, receive, and process worker-job data.
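For reference, Amdahl’s law gives the speedup on N workers when a fraction p of the single-CPU run time can be parallelized and the remaining 1 - p (here, the core process’s serial send/wait/receive/process work) cannot: S(N) = 1 / ((1 - p) + p/N). As N grows, S approaches 1 / (1 - p). A minimal Python version follows; the numbers in the comments are illustrative, not HAL0 measurements.

```python
def amdahl_speedup(p, n):
    """Speedup on n workers when fraction p of the serial run time is parallelizable."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)

def amdahl_limit(p):
    """Upper bound on speedup as n -> infinity (infinite if p == 1)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)

# Illustrative only: with 95% parallel work, 10 workers give about 6.9X,
# and no number of workers can exceed about 20X.
```

With p = 0.95, `amdahl_speedup(0.95, 10)` is about 6.9 and `amdahl_limit(0.95)` is about 20, which is why adding workers eventually buys very little unless the serial core itself gets faster.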
If the “core” (or master or “boss”) process itself can be split into parallel processes, an additional level of scalability becomes available. The core HAL0 algorithm is designed to do just that.
Based on preliminary estimates, the core algorithm can scale efficiently to 4 cores, delivering up to a 3.5X core speedup. Additionally, the current HAL0 periphery for each HAL0 instance scales efficiently up to 10 ways, providing about a 6X additional speedup per instance (depending on IPC latency). Ideally, Sigma1 portfolio-optimization software can provide up to a 21X speedup (3.5 times 6) operating in parallel mode versus single-CPU mode.
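The arithmetic behind the 21X figure is sketched below. Multiplying the two factors assumes the core-level and periphery-level speedups are independent, which is the idealization the estimate already relies on. As an added interpretation (the estimates were not necessarily derived this way), inverting Amdahl’s law, p = (1 - 1/S) / (1 - 1/N), shows what parallel fraction each estimate implies: roughly 95% for 3.5X on 4 cores and roughly 93% for 6X on 10 ways.

```python
def implied_parallel_fraction(speedup, n):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

CORE_SPEEDUP = 3.5       # estimated HAL0 core speedup on 4 cores (from the text)
PERIPHERY_SPEEDUP = 6.0  # estimated per-instance periphery speedup at 10 ways (from the text)

# Idealized combined speedup, assuming the two levels multiply independently.
combined = CORE_SPEEDUP * PERIPHERY_SPEEDUP  # 21.0

core_p = implied_parallel_fraction(CORE_SPEEDUP, 4)             # about 0.952
periphery_p = implied_parallel_fraction(PERIPHERY_SPEEDUP, 10)  # about 0.926
```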
There are many caveats in the preceding paragraph. Right now I am focused more on implementing and testing scalability than on optimizing it. For example, I am currently implementing single-kernel-instance scalability in a manner that is deterministic, repeatable, and produces results identical to single-CPU operation. This limits the speedup, but makes regression testing practical. Regression testing, in turn, helps keep the code robust and reliable.
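One common way to get parallel results that match single-CPU results exactly is to make each job’s output depend only on its inputs (including a per-job random seed) and to store results by job index rather than by arrival order. The sketch below is an assumption about how such a check could look, not the HAL0 implementation; `job` is a hypothetical stand-in, and its random sleep exists only to make jobs finish out of order. A regression test can then simply compare the parallel and serial outputs.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def job(index, seed=42):
    """Hypothetical worker job whose result depends only on (seed, index)."""
    rng = random.Random(seed * 1_000_003 + index)  # per-job RNG, independent of scheduling
    time.sleep(rng.random() / 1000)               # jitter so jobs complete out of order
    return index, rng.random()

def run_serial(n):
    """Single-CPU reference run."""
    return [job(i)[1] for i in range(n)]

def run_parallel(n, workers=4):
    """Parallel run whose output is identical to run_serial(n)."""
    results = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job, i) for i in range(n)]
        for fut in as_completed(futures):  # arrival order varies from run to run
            index, value = fut.result()
            results[index] = value         # file each result under its job index
    return results

def total(values):
    """Reduce in index order so floating-point rounding matches the serial run."""
    return sum(values)
```

Floating-point addition is not associative, so reductions should also happen in a fixed (index) order, as `total` does; summing values in completion order could change the last few bits and break an exact regression comparison.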
Portfolio-Optimization Software for the Enterprise
So far, ensuring that Sigma1 portfolio software is capable of massive scalability has roughly tripled the software development effort. This is obviously slowing time-to-market, but I continue to believe it is worth the effort and schedule impact. First, scalability is a key product differentiator for enterprise-level customers. Second, supporting scalability from day one is much easier and more reliable than trying to retrofit it into an entrenched software architecture.