Requirements

To be eligible for the DeepSquare Grid, a cluster must meet the following requirements:

Control plane requirements

The control plane can be deployed with ClusterFactory, our open-source cluster manager available on GitHub. ClusterFactory makes deploying a full-fledged HPC cluster and joining the grid fast and easy.

Minimum requirement: An LDAP server
Recommended (Docker): 389ds/dirsrv

Minimum requirement: The LDAP connector
Recommended (Docker): ghcr.io/deepsquare-io/ldap-connector:latest

Minimum requirement: A SLURM login node, a SLURM database with MySQL, and a SLURM controller with the provider SLURM completion plugin
Recommended (Docker):
    ghcr.io/deepsquare-io/slurm:<version>-login-rocky9.2
    ghcr.io/deepsquare-io/slurm:<version>-controller-rocky9.2
    ghcr.io/deepsquare-io/slurm:<version>-db-rocky9.2

Minimum requirement: MySQL server
Recommended (Docker): bitnami/mariadb

Minimum requirement: A network shared filesystem server
Recommended: BeeGFS or any shared file system with RDMA

Minimum requirement: A DNS server
Recommended: Any

Minimum requirement: CVMFS stratum 1
Recommended (Docker): ghcr.io/squarefactory/cvmfs-server:latest

Minimum requirement: The DeepSquare Supervisor
Recommended (Docker): ghcr.io/deepsquare-io/supervisor:latest

Compute plane requirements

We recommend that you boot your operating systems as live images.

Most of the requirements can be met by using the grendel provisioner of ClusterFactory.

Minimum requirement: An x86_64 OS with the Linux kernel
Recommended: SquareOS (qcow2/squashfs), initramfs setup for live image, kernel

Minimum requirement: SLURM with Pyxis and the provider SPANK plugins
Recommended: Already pre-installed on SquareOS. From the DeepSquare YUM repository:
    dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo
    dnf install -y pmix4 slurm-contribs slurm-libpmi slurm-pam_slurm slurm-slurmd nvslurm-plugin-pyxis spank-provider

Minimum requirement: NVIDIA GPUs
Recommended: Drivers already pre-installed on SquareOS.

Minimum requirement: The container runtimes Enroot and Apptainer
Recommended: Already pre-installed on SquareOS. From the DeepSquare YUM repository:
    dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo
    dnf install -y enroot-hardened enroot-hardened+caps apptainer

Minimum requirement: An interconnect for all the compute nodes and storage nodes
Recommended: Drivers already pre-installed on SquareOS. InfiniBand or Ethernet with RDMA (at least 100 Gbps).

Minimum requirement: A network shared filesystem client
Recommended: Client already pre-installed on SquareOS. BeeGFS or any shared file system with RDMA.

Minimum requirement: Necessary postscripts
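
As a rough sanity check for the software requirements above, the following sketch reports which commands are missing from a compute node's PATH. The helper name `missing_commands` is hypothetical, and the binary names `slurmd`, `enroot`, and `apptainer` are assumptions based on the packages listed above; it only checks that the commands exist, not that they are correctly configured.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_commands is a hypothetical helper, not part of any DeepSquare tool.)
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Assumed binary names for the SLURM daemon and the two container runtimes.
missing=$(missing_commands slurmd enroot apptainer)
if [ -n "$missing" ]; then
    echo "missing:" $missing
else
    echo "all required commands found"
fi
```

On SquareOS these are expected to be present already, so the check is mostly useful when building a custom image.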

Network configuration

The control plane and compute plane must share the same subnet.
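
One way to verify the shared-subnet requirement is to compare the network part of two IPv4 addresses. The sketch below is a minimal illustration in POSIX shell arithmetic; the function names `ip_to_int` and `same_subnet`, the example addresses, and the /24 prefix are all assumptions chosen for illustration, so substitute the values from your own network plan.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed when two IPv4 addresses fall in the same subnet.
# Usage: same_subnet <ip1> <ip2> <prefix-length>
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Hypothetical addresses: a control plane node and a compute node.
if same_subnet 10.10.2.10 10.10.2.200 24; then
    echo "same subnet"
else
    echo "different subnets"
fi
```

The check only compares addresses; it does not confirm that the nodes can actually reach each other.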

No ports need to be opened, because the supervisor uses a pull-based strategy to fetch jobs.