The Asynchronous C++ Parallel Programming Model
With the advent of modern computer architectures characterized by, among other things, many-core nodes, deep and complex memory hierarchies, heterogeneous subsystems, and power-aware components, it is becoming increasingly difficult to achieve the best possible application scalability and satisfactory parallel efficiency. The community is experimenting with new programming models based on finer-grain parallelism and flexible, lightweight synchronization, combined with work-queue-based, message-driven computation. Implementations of such models are often based on a framework that manages lightweight tasks and allows highly hierarchical parallel execution flows to be coordinated flexibly. The recently growing interest in the C++ programming language, in industry and in the wider community, increases the demand for libraries implementing those programming models for the language. Developers of applications targeting high-performance computing resources would like to see libraries that provide higher-level programming interfaces shielding them from the lower-level details and complexities of modern computer architectures. At the same time, those APIs have to expose all necessary customization points so that power users can still fine-tune their applications and control data placement and execution where necessary. In this talk we present a new asynchronous C++ parallel programming model built around lightweight tasks and mechanisms to orchestrate massively parallel (and distributed) execution. This model uses the concept of (std) futures to make data dependencies explicit, employs explicit and implicit asynchrony to hide latencies and improve utilization, and manages finer-grain parallelism with a work-stealing scheduling system that enables automatic load balancing of tasks. As a result of combining those capabilities, the programming model exposes auto-parallelization capabilities as emergent properties.
We have implemented this model as a C++ library exposing a higher-level parallelism API that fully conforms to the existing C++11/14/17 standards and is aligned with the ongoing standardization work. This API and programming model have been shown to enable writing parallel and distributed applications for heterogeneous resources with excellent performance and scaling characteristics.
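As a rough illustration of how futures make data dependencies explicit, the following sketch uses only the standard library (std::async and std::future), not the API of the library presented in the talk. The function name parallel_sum and its chunking scheme are hypothetical, chosen only for this example. Note that std::async with std::launch::async typically starts one thread per task, so this sketch does not show the lightweight tasks or work-stealing scheduling described above; it shows only the dependency structure.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of any library): sums `data` by splitting it
// into `chunks` contiguous ranges and summing each range in its own task.
long long parallel_sum(const std::vector<int>& data, std::size_t chunks) {
    assert(chunks > 0);

    std::vector<std::future<long long>> partials;
    partials.reserve(chunks);

    const std::size_t n = data.size();
    for (std::size_t i = 0; i < chunks; ++i) {
        // Half-open range [begin, end) handled by task i.
        const std::size_t begin = n * i / chunks;
        const std::size_t end = n * (i + 1) / chunks;

        // Each task returns a future; the caller does not block here.
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.data() + begin, data.data() + end, 0LL);
        }));
    }

    // The final result depends on every partial sum. Calling get() on each
    // future is where that dependency is expressed and, if needed, waited on.
    long long total = 0;
    for (auto& partial : partials) {
        total += partial.get();
    }
    return total;
}
```

Calling parallel_sum(values, 4) launches four tasks and combines their results. The only synchronization points are the get() calls, so the tasks are free to run concurrently until their values are actually needed.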