Matthew Li
Biographical Sketch
Matthew is a Computer Systems Engineer in the Integrated Data Systems Group (Scientific Data Division).
He works with the Lab's Science IT team, leading design and development on a web portal and REST API that centralize management of cluster users, projects, and computing quotas on Berkeley Research IT's Savio supercluster and the Lab's Lawrencium supercluster.
He also works with scientists running the MINOS and BDC radiological data services in the Nuclear Science Division's Applied Nuclear Physics program, developing new features, managing deployments, and ingesting data into the systems.
Lastly, he works with the Energy Efficiency Studies group in the Energy Technologies Area, serving as a full-stack engineer on a data collection tool.
He received a B.A. in Computer Science from the University of California, Berkeley in 2018.
Conference Papers
Matthew Li, Nicolas Chan, Viraat Chandra, Krishna Muriki, "Cluster Usage Policy Enforcement Using Slurm Plugins and an HTTP API", Practice and Experience in Advanced Research Computing, New York, NY, USA, Association for Computing Machinery, July 26, 2020, 232–238, doi: 10.1145/3311790.3397341
Managing and limiting cluster resource usage is a critical task for computing clusters with a large number of users. By enforcing usage limits, cluster managers are able to ensure fair availability for all users, bill users accordingly, and prevent the abuse of cluster resources. As this is such a common problem, there are naturally many existing solutions. However, to allow for greater control over usage accounting and submission behavior in Slurm, we present a system composed of: a web API which exposes accounting data; Slurm plugins that communicate with a REST-like HTTP implementation of that API; and client tools that use it to report usage. Key advantages of our system include a customizable resource accounting formula based on job parameters, preemptive blocking of user jobs at submission time, project-level and user-level resource limits, and support for the development of other web and command-line clients that query the extensible web API. We deployed this system on Berkeley Research Computing’s institutional cluster, Savio, allowing us to automatically collect and store accounting data, and thereby easily enforce our cluster usage policy.
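As a rough illustration of two ideas named in the abstract (a resource accounting formula based on job parameters, and blocking a job at submission time when a user-level or project-level limit would be exceeded), here is a minimal Python sketch. It is not the paper's implementation: the `Job` and `Allocation` classes, the `RATES` table, the cost formula (CPUs × requested hours × a per-partition rate), and the function names are all assumptions made for illustration, and the real system performs this check through Slurm plugins that call an HTTP API.

```python
from dataclasses import dataclass

# Hypothetical per-partition charge rates (assumed, not from the paper).
RATES = {"standard": 1.0, "gpu": 2.5}


@dataclass
class Job:
    user: str
    project: str
    cpus: int
    time_limit_hours: float
    partition: str


@dataclass
class Allocation:
    limit: float       # total units granted
    used: float = 0.0  # units already charged

    def remaining(self) -> float:
        return self.limit - self.used


def estimated_cost(job: Job, rates: dict = RATES) -> float:
    """Assumed formula: CPUs x requested hours x partition rate."""
    return job.cpus * job.time_limit_hours * rates[job.partition]


def check_submission(job: Job, user_alloc: Allocation, project_alloc: Allocation):
    """Decide at submission time whether the job fits both limits."""
    cost = estimated_cost(job)
    if cost > user_alloc.remaining():
        return False, "user limit exceeded"
    if cost > project_alloc.remaining():
        return False, "project limit exceeded"
    return True, "ok"
```

In this sketch the cost is estimated from the requested time limit, so a job can be rejected before it consumes anything; a deployed system would presumably reconcile that estimate against actual usage once the job finishes.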