By Philipp Reusch and Foad Vafaei
Organizations have been described as factories that produce judgments and decisions. With the advent of Machine Learning (ML), businesses and organizations have come to understand the need for better analysis and visualization of large data-sets: they must explore, analyze, and visualize massive amounts of data to make decisions faster and at lower cost, and they must manage the entire lifecycle of their ML models. ML has paved the way for the next generation of organizational productivity and has fast become a strategic platform play for large enterprises.
Currently, it is customary to run experiments and lightweight development in the cloud. But AI development requires enormous compute power: models can have a huge number of parameters, and data-sets can be so large that uploading them to the cloud takes a long time. Organizations may also be prevented from uploading their data-sets by privacy laws, or they may not want to be locked in to one particular vendor's technology. Instead, they can train their ML models on in-house HPC and cluster systems equipped with special processing units such as GPU cards or dedicated AI processors.
AI developers who need access to lots of compute power often keep big workstations with multiple Graphics Processing Units under their desks, which drives up hardware costs, and administering these decentralized workstations is not trivial. Furthermore, AI developers need to collaborate on large data-sets and share their trained models and experiment results.
Carme is a scalable, easy-to-use, open-source framework for High Performance Computing (HPC) clusters with support for accelerators such as Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs). Carme enables users to rapidly develop, train, and deploy Artificial Intelligence models in an interactive way. It helps businesses get more out of their investment in HPC systems so they can deliver truly real-time intelligence on large, complex data-sets. That means more productive data scientists and data engineers.
Carme is already in use for AI research at the Fraunhofer ITWM, where our researchers work on their AI algorithms in JupyterLab (or Jupyter Notebook), accessed remotely from their laptops and workstations through the Carme web interface.
AI developers are often very good at AI development, but the AI field is huge, and they cannot be expected to also know everything about high performance computing: how to access the resources, how to schedule an interactive batch job, or how to utilize the HPC hardware optimally.
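To make that concrete, here is a minimal sketch of what manually requesting an interactive GPU session can look like without Carme, assuming a SLURM-managed cluster; the partition name and resource limits are hypothetical, and every site configures them differently. This is exactly the kind of detail Carme abstracts away.

```python
# Minimal sketch: manually requesting an interactive GPU session on a
# SLURM-managed cluster. Partition name and limits are hypothetical.
import subprocess

def interactive_gpu_session(gpus: int = 1, hours: int = 2) -> None:
    """Open an interactive shell on a compute node with GPUs attached."""
    subprocess.run(
        [
            "srun",
            f"--gres=gpu:{gpus}",         # request GPU resources
            f"--time={hours:02d}:00:00",  # wall-clock time limit
            "--partition=gpu",            # hypothetical GPU partition name
            "--pty", "bash",              # attach an interactive terminal
        ],
        check=True,
    )

if __name__ == "__main__":
    interactive_gpu_session()
```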
To solve these problems, Carme helps AI projects in four important ways:
I- Carme helps your team get more out of your investment in HPC systems and GPU clusters, so that data scientists can deliver real-time intelligence on large, complex data-sets without significant investment in new personnel and equipment, and system administrators can better manage the existing infrastructure.
II- Carme optimizes hardware utilization by sharing resources among users, enabling data scientists to submit batch jobs (see the first sketch after this list) and collaborate without stepping on each other's toes or losing work in progress. Carme also flattens the learning curve by providing multiple layers of abstraction on top of well-known tools and libraries, and it enables ML organizations to deliver a secure, dependable, easy-to-manage collaboration platform for their data scientists.
III- Carme offers a multi-user web interface that reduces administrative costs and lets teams of data scientists collaborate on improving their algorithms. Its graphical UI gives better visibility and insight into HPC cluster resource utilization in a single, real-time view.
IV- Carme simplifies interactions with HPC and cluster systems through meaningful abstractions, so that data scientists can focus on developing algorithms in JupyterLab or the Theia IDE. In JupyterLab you can write small pieces of code and immediately see their output, which makes it easy to visualize your input data and your results (see the second sketch after this list). That is why we use JupyterLab as an interface to HPC clusters.
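To illustrate point II: on a conventional cluster, submitting a batch training run means hand-writing a scheduler script and submitting it yourself. The sketch below again assumes a SLURM scheduler; the resource numbers and the train.py entry point are purely illustrative, and Carme wraps this workflow behind its web interface.

```python
# Minimal sketch of point II: submitting a non-interactive training run to a
# shared SLURM cluster. Resource numbers and train.py are illustrative only.
import subprocess
import textwrap

# A batch script asking for two GPUs and twelve hours of wall-clock time.
job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=train-model
    #SBATCH --gres=gpu:2
    #SBATCH --time=12:00:00
    python train.py --epochs 90
""")

with open("train_job.sh", "w") as f:
    f.write(job_script)

# The scheduler queues the job and starts it once GPUs are free, so several
# users can share the same nodes without interfering with one another.
subprocess.run(["sbatch", "train_job.sh"], check=True)
```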
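And to illustrate point IV: a single JupyterLab cell can compute something and render the result inline, right below the cell. The toy example below plots a hypothetical training curve with NumPy and Matplotlib; any Python plotting stack available on the cluster would work the same way.

```python
# One JupyterLab cell: compute and visualize in a single interactive step.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical training curve: loss values logged during an experiment.
epochs = np.arange(1, 51)
loss = np.exp(-epochs / 10) + 0.05 * np.random.rand(50)

plt.plot(epochs, loss)
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.title("Inspecting a training run interactively")
plt.show()  # in JupyterLab the figure renders inline, below the cell
```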
To sum up, Carme enables data scientists to connect, interact, and collaborate regardless of location. Deploying Carme's comprehensive, multi-user, web-based work environment will also increase the business value of your HPC and cluster infrastructure and generate a high return on investment.
Carme is open source and available on GitHub. We had our first release in mid-2019, and our main focus is now the next stable release in mid-2021. It will bring a more productive GUI that integrates all parts of the multi-user management into a single project view, which we hope will simplify the work on AI algorithms even further.
To learn more about Carme, see https://open-carme.org/