Chip multiprocessors containing hundreds or even thousands of cores will challenge current operating systems (OS) practices. Many of the fundamental assumptions that underlie current OS technology are based on design assumptions that are no longer valid for a chip multiprocessor (CMP) containing thousands of cores. In the context of handheld devices, the OS must manage quality-of-service and resource contention for a complex multi-programmed environment. In the context of high performance computing (HPC) system requirements, as machines grow in scale and complexity, techniques to make the most effective use of network, memory, and processor resources will also become increasingly important. The interaction of the OS with the application is closely intertwined with the execution model for parallel applications where parallel tasks interact.
The FastOS Project
OS's must be refactored (deconstructed) to offer more flexible resource management and runtime support for parallel execution models with the focus on exposing system resource usage policies to the various levels of the programming stack. The overall goal of a deconstructed OS would be to allow the application to compose the best resource usage policies for its particular needs and to adapt to system scale and load. Policy control should be hierarchical, with different levels of abstraction depending on their consumer. For example, a hypothetical communication scheduling mechanism exposes to the libraries/compilers explicit control over message sizes and ordering, while exposing to the application level/programmer only abstract policies like “long routes first.” Adaptation can take form of quality-of-service mechanisms or migration for communication locality.
Previous experience with compiler and runtime optimizations for partitioned global address space (PGAS) languages indicates that lightweight control over OS mechanisms is not sufficient for good performance, and additional control over the policies that guide the management of these mechanisms is required. Looking beyond PGAS languages, even more control of resource management will be necessary to support the kind of novel execution strategies required for exascale applications. Current OS designs favor generic policies, e.g., preemptive thread scheduling or least recently used page replacement, which have been selected as being the least common denominator for commercial workloads. The execution models required for structured parallel algorithms in the scientific computing applications are less diverse, usually require cooperation between computing entities and exhibit a relatively ordered execution imposed by data/task dependence. When developing performance-critical parallel applications, developers are willing to trade time to market for more effort to fine-tune the performance of their application. Compilers for parallel languages are also well suited to take advantage of such functionality. The challenge is, of course, to offer some level of control over resource usage policies while still meeting the productivity goals of the development environment.
Future hardware technology is also envisioned to be highly memory constrained. Therefore, rather than a full OS model, there may be substantial benefit from an Exokernel OS model, where applications only load the minimum necessary OS functions in users-space. Our approach adds additional protection by having the applications run in their own partition, set up by a lightweight spatial partition manager to define the protection, resource sharing, and management of quality-of-service guarantees. The partition manager handles space and time partitioning of hardware resources, including physical processors, physical memory, and memory and interconnect bandwidths, while the policy layer above the application container will have complete control over scheduling and virtualization, if any. For example, in a single processor/multiple data execution layer used in Unified Parallel C (UPC), there is no need for processor virtualization, while for dynamic threading used in the emerging parallel languages, a lightweight user-space thread scheduler that can be directly controlled by the application or runtime would be beneficial.
The Tessellation Project
Tessellation is a new OS partition manager developed in cooperation with the UC Berkeley Par Lab. Unlike FastOS, the goal of the Tessellation project is geared more towards general purpose client computing. Highly parallel manycore systems will soon be the mainstream, not just in large machine room servers but also in small client devices, such as laptops, tablets, and handhelds. This emergence of ubiquitous multiprocessing presents both a challenge and an opportunity. On the one hand, multiprocessing has achieved only limited success for general computing; the challenge is to find innovative and compelling ways in which to use an ever increasing number of CPUs. On the other hand, the presence of vast CPU resources presents an opportunity to fundamentally change the assumptions and structure of systems software.
Unlike servers, which exploit parallelism across independent transactions from multiple users, single-user clients will require parallelized applications to benefit from a manycore platform. Future client devices will run a mix of interactive, real-time, and batch applications simultaneously; a user may run multiple web applications, such as Gmail and Facebook, while listening to MP3 music files and video chatting with friends. In addition, battery life is a critical issue for client devices, requiring energy to be a first-class resource that is actively managed by the operating system.
We argue that space-time partitioning (STP) is crucial for manycore client operating systems. A spatial partition is an isolated unit containing a subset of physical machine resources such as cores, cache, memory, guaranteed fractions of memory or network bandwidth, and energy budget. Space-time partitioning virtualizes spatial partitions by time-multiplexing whole partitions onto available hardware, but at a coarse-enough granularity to allow efficient user-level scheduling within a partition.
Space-time partitioning leads to a restructuring of systems services as a set of interacting distributed components. We propose a new “exploded OS” called Tessellation, structured around space-time partitioning and two-level scheduling between the global and partition runtimes. Tessellation implements scheduling and resource management at the partition granularity. Applications and OS services run within their own partitions and have exclusive control of the scheduling of resources (e.g., cores, cache, memory) within their partitions. Partitions are lightweight, and can be resized or suspended with similar overheads to a process context switch.
- Parallel Computing Laboratory (Par Lab)
- "Tessellation: Space-Time Partitioning in a Manycore Client OS," R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanović and J. Kubiatowicz, HotPar09.