Requirements
To be eligible for the DeepSquare Grid, a cluster must meet the following requirements:
Control plane requirements
The control plane can be easily deployed with ClusterFactory, our open-source cluster manager available on GitHub. ClusterFactory makes deploying a full-fledged HPC cluster and joining the grid fast and easy.
| Minimum Requirements | Recommended |
|---|---|
| An LDAP server | Docker: `389ds/dirsrv` |
| The LDAP connector | Docker: `ghcr.io/deepsquare-io/ldap-connector:latest` |
| A SLURM login node, a SLURM database backed by MySQL, and a SLURM controller with the provider SLURM completion plugin | Docker: `ghcr.io/deepsquare-io/slurm:<version>-login-rocky9.2` |
| A MySQL server | Docker: `bitnami/mariadb` |
| A network shared filesystem server | BeeGFS or any shared file system with RDMA |
| A DNS server | Any |
| A CVMFS Stratum 1 server | Docker: `ghcr.io/squarefactory/cvmfs-server:latest` |
| The DeepSquare Supervisor | Docker: `ghcr.io/deepsquare-io/supervisor:latest` |
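The control plane table can be read as a pre-deployment checklist. The sketch below is a hypothetical helper, not part of ClusterFactory or DeepSquare: the component keys (`ldap-server`, `supervisor`, ...) and the `missing_components` function are names invented for illustration, while the recommended images are copied from the table.

```python
"""Hypothetical pre-deployment checklist for the DeepSquare control plane.

The component keys are illustrative names, not identifiers used by
ClusterFactory or DeepSquare. The values are the recommended images or
software listed in the requirements table.
"""

CONTROL_PLANE_REQUIREMENTS: dict[str, str] = {
    "ldap-server": "389ds/dirsrv",
    "ldap-connector": "ghcr.io/deepsquare-io/ldap-connector:latest",
    "slurm-login-db-controller": "ghcr.io/deepsquare-io/slurm:<version>-login-rocky9.2",
    "mysql": "bitnami/mariadb",
    "shared-filesystem-server": "BeeGFS or any shared file system with RDMA",
    "dns": "any DNS server",
    "cvmfs-stratum-1": "ghcr.io/squarefactory/cvmfs-server:latest",
    "supervisor": "ghcr.io/deepsquare-io/supervisor:latest",
}


def missing_components(deployed: set[str]) -> list[str]:
    """Return the required components that are absent from `deployed`, sorted."""
    return sorted(set(CONTROL_PLANE_REQUIREMENTS) - deployed)


if __name__ == "__main__":
    # Example inventory: everything except the supervisor and the CVMFS server.
    deployed = set(CONTROL_PLANE_REQUIREMENTS) - {"supervisor", "cvmfs-stratum-1"}
    for name in missing_components(deployed):
        print(f"missing: {name} (recommended: {CONTROL_PLANE_REQUIREMENTS[name]})")
```

Keeping the checklist as plain data makes it easy to compare against whatever inventory format your deployment tooling already produces.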
Compute plane requirements
We recommend that you boot your operating systems as live images.
Most of these requirements can be met by using ClusterFactory's Grendel provisioner.
| Minimum Requirements | Recommended |
|---|---|
| An x86_64 OS with the Linux kernel | SquareOS: OS image (qcow2/squashfs), initramfs set up for live boot, and kernel |
| SLURM with Pyxis and the provider SPANK plugins | Pre-installed on SquareOS; otherwise, install from the DeepSquare YUM repository: `dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo` |
| NVIDIA GPUs | Drivers pre-installed on SquareOS |
| The Enroot and Apptainer container runtimes | Pre-installed on SquareOS; otherwise, install from the DeepSquare YUM repository: `dnf config-manager --add-repo https://yum.deepsquare.run/yum.repo` |
| An interconnect linking all compute and storage nodes | InfiniBand or Ethernet with RDMA (at least 100 Gbps); drivers pre-installed on SquareOS |
| A network shared filesystem client | BeeGFS or any shared file system with RDMA; client pre-installed on SquareOS |
| Necessary postscripts | |
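Some of the compute plane requirements can be spot-checked on a booted node. The following is a minimal sketch, not an official DeepSquare tool: it assumes the listed software exposes the executables `srun` (SLURM client), `enroot`, `apptainer`, and `nvidia-smi` (shipped with the NVIDIA driver) on `PATH`. It does not verify the Pyxis or SPANK plugins, RDMA, or the shared file system client.

```python
"""Minimal compute-node preflight check (a sketch, not an official tool).

Only checks what the Python standard library can observe locally: the OS,
the CPU architecture, and whether the expected executables are on PATH.
"""
import platform
import shutil

# Executables assumed to be provided by the software in the requirements table.
EXPECTED_TOOLS = (
    "srun",        # SLURM client tools
    "enroot",      # container runtime
    "apptainer",   # container runtime
    "nvidia-smi",  # installed alongside the NVIDIA driver
)


def preflight() -> dict[str, bool]:
    """Return a mapping of check name -> whether the check passed."""
    results = {
        "linux": platform.system() == "Linux",
        "x86_64": platform.machine() == "x86_64",
    }
    for tool in EXPECTED_TOOLS:
        # shutil.which returns None when the executable is not found on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```

Finding a binary on `PATH` only shows that the package is installed; it says nothing about whether the plugins are configured correctly.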
Network configuration
The control plane and compute plane must share the same subnet.
No inbound ports need to be opened, because the supervisor uses a pull-based strategy to fetch jobs.
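The same-subnet requirement can be checked with Python's standard `ipaddress` module. In this sketch, the `share_subnet` helper, the `192.168.1.0/24` subnet, and the host addresses are hypothetical examples; substitute the values from your own cluster.

```python
"""Check that control-plane and compute-plane hosts share one subnet.

The subnet and addresses used below are hypothetical examples.
"""
import ipaddress


def share_subnet(addresses: list[str], prefix: str) -> bool:
    """Return True if every address lies inside the network `prefix`."""
    network = ipaddress.ip_network(prefix, strict=True)
    return all(ipaddress.ip_address(a) in network for a in addresses)


if __name__ == "__main__":
    subnet = "192.168.1.0/24"  # hypothetical cluster subnet
    control_plane = ["192.168.1.10", "192.168.1.11"]  # e.g. SLURM controller, supervisor
    compute_plane = ["192.168.1.100", "192.168.1.101"]
    print(share_subnet(control_plane + compute_plane, subnet))  # prints True
    print(share_subnet(["192.168.1.10", "192.168.2.100"], subnet))  # prints False
```

`strict=True` rejects a prefix such as `192.168.1.5/24` that has host bits set, which catches a common typo in the subnet definition.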