Requirements
To be eligible for the DeepSquare Grid, a cluster must meet the following requirements:
Control plane requirements
The control plane can be deployed with ClusterFactory, our open-source cluster manager available on GitHub. ClusterFactory makes deploying a full-fledged HPC cluster and joining the grid fast and easy.
Minimum Requirements | Recommended |
---|---|
An LDAP server | Docker: 389ds/dirsrv |
The LDAP connector | Docker: ghcr.io/deepsquare-io/ldap-connector:latest |
A SLURM login node, SLURM database with MySQL and SLURM controller with the provider SLURM completion plugin | Docker: ghcr.io/deepsquare-io/slurm:<version>-login-rocky9.2 |
MySQL server | Docker: bitnami/mariadb |
A network shared filesystem server | BeeGFS or any shared file system with RDMA |
A DNS server | Any |
CVMFS stratum 1 | Docker: ghcr.io/squarefactory/cvmfs-server:latest |
The DeepSquare Supervisor | Docker: ghcr.io/deepsquare-io/supervisor:latest |
Compute plane requirements
We recommend that you boot your operating systems as live images.
Most of the requirements can be met by using the grendel provisioner of ClusterFactory.
Minimum Requirements | Recommended |
---|---|
An x86_64 OS with the Linux Kernel | SquareOS (qcow2/squashfs), initramfs setup for live image, kernel |
SLURM with Pyxis and the provider SPANK plugins | (already pre-installed on SquareOS) From the DeepSquare YUM repository: dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo |
Nvidia GPUs | (drivers already pre-installed on SquareOS) |
The container runtimes Enroot and Apptainer | (already pre-installed on SquareOS) From the DeepSquare YUM repository: dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo |
An interconnect for all the compute nodes and storage nodes | (drivers already pre-installed on SquareOS) InfiniBand or Ethernet with RDMA (at least 100 Gbps) |
A network shared filesystem client | (client already pre-installed on SquareOS) BeeGFS or any shared file system with RDMA. |
Necessary postscripts |
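
As a rough way to check a compute node against the table above, the following sketch looks for the expected command-line tools on the `PATH`. The binary names (`srun`, `enroot`, `apptainer`, `nvidia-smi`) are assumptions based on the usual names shipped by these projects, and the `missing_tools` helper is hypothetical. It is not part of ClusterFactory or SquareOS.

```python
import shutil

# Assumed binary names for SLURM, Enroot, Apptainer and the Nvidia driver tools.
# Adjust the list to match what your image actually installs.
REQUIRED_TOOLS = ["srun", "enroot", "apptainer", "nvidia-smi"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# On a correctly provisioned node this list should be empty.
print("Missing tools:", missing_tools(REQUIRED_TOOLS))
```

This only confirms that the binaries are present. It does not verify that the SLURM plugins, the GPU drivers or the shared filesystem client are configured correctly.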
Network configuration
The control plane and compute plane must share the same subnet.
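
The shared-subnet requirement can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the subnet and the node addresses below are made up for illustration, and `same_subnet` is a hypothetical helper.

```python
import ipaddress


def same_subnet(subnet_cidr, addresses):
    """Return True if every address in `addresses` belongs to `subnet_cidr`."""
    network = ipaddress.ip_network(subnet_cidr)
    return all(ipaddress.ip_address(addr) in network for addr in addresses)


# Hypothetical addresses, for illustration only.
control_plane_nodes = ["10.10.2.10", "10.10.2.11"]
compute_plane_nodes = ["10.10.2.51", "10.10.2.52"]

print(same_subnet("10.10.2.0/24", control_plane_nodes + compute_plane_nodes))
```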
No ports need to be opened, since the supervisor uses a pull-based strategy to fetch jobs.
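
To illustrate what a pull-based strategy means in general, the sketch below has the cluster side ask for work instead of waiting for incoming connections. It is not the supervisor's actual implementation: `poll_once`, `fetch_job` and the in-memory queue standing in for the remote job source are all hypothetical.

```python
import queue


def poll_once(fetch_job, run_job):
    """Ask the job source for one job and run it. Return False if none was available."""
    job = fetch_job()  # the request is initiated from the cluster side
    if job is None:
        return False
    run_job(job)
    return True


# In-memory stand-in for the remote job source (hypothetical).
pending = queue.Queue()
pending.put("job-1")
pending.put("job-2")


def fetch_job():
    """Return the next pending job, or None when there is nothing to do."""
    try:
        return pending.get_nowait()
    except queue.Empty:
        return None


completed = []
while poll_once(fetch_job, completed.append):
    pass

print(completed)
```

Because every exchange starts with a request from the cluster, the job source never has to connect to the cluster on its own initiative.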