class: title-slide

# Presentation of the new High-Performance Cluster

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

### Patrick Schratz

<p style="margin-left:15px;">
<br>
GIScience Seminar Series, Jena, 17th April 2019
<br><br>
<sup>1</sup> <a href="https://www.geographie.uni-jena.de/Geoinformatik.html">Department of Geography, GIScience group, University of Jena</a>
<br><br>
<a href="https://pjs-web.de">https://pjs-web.de</a>  
<a href="https://twitter.com/pjs_228">@pjs_228</a>  
<a href="https://github.com/pat-s">@pat-s</a>  
<a href="https://stackoverflow.com/users/4185785/pat-s">@pjs_228</a>   <br>
<a href="patrick.schratz@uni-jena.de">patrick.schratz@uni-jena.de</a> 
<a href="https://www.linkedin.com/in/patrick-schratz/">Patrick Schratz</a>  </p> --- # Why a cluster? .font150[ - Scalable - Low(er) maintenance - Better performance - Job scheduling ] --- # Hardware .font150[ **CPU:** AMD Threadripper 2950X, 16-core, 3.5 GHz - 4.4 GHz **RAM:** 126 GB DDR4 **Disk:** 1 TB M2 SSD **Nodes:** 4 (3 computes) - soon 7 computes. ] --- # Software .font150[ - **OS:** CentOS 7 - **Library Management:** Spack - **Load Monitoring:** Ganglia - **Scheduler:** Slurm - **Cluster Management:** Warewulf - RStudio Server Pro ] --- # Why a scheduler? .font150[ - Takes care of the execution queue for multiple users - Forces you to think about CPU and memory usage **before** sending your job - Distributes jobs across the cluster
---

# Why a scheduler?

.font150[
- Takes care of the execution queue when multiple users send jobs
- Forces you to think about CPU and memory usage **before** sending your job
- Distributes jobs across the cluster (parallelization at the node level)
- You only store your project on one machine - the scheduler distributes the processing for you
]

---

class: center, middle

.font200[
## Live intro to the scheduler
]

---

# Vocabulary

.font150[
**Job:** Code to run, e.g. an R script or a single line of code

**Task:** Jobs can consist of multiple tasks. All tasks share the same settings when submitted via `--array`.

**Compute Node:** One machine in the cluster, used for processing only.

**Frontend Node:** The "master" node to which you log in.
]

---

# Sending jobs

.font150[
- via the R packages `drake` or `clustermq`
- A "scheduler template" is required to tell the scheduler which resources (CPU/memory) you need
]

---

# Sending jobs

.font180[
`slurm_clustermq.tmpl`

```sh
#!/bin/sh
#SBATCH --job-name={{ job_name }}
#SBATCH --partition=normal
#SBATCH --output={{ log_file | /dev/null }}
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --cpus-per-task={{ n_cpus }}
#SBATCH --mem={{ memory }}
#SBATCH --array=1-{{ n_jobs }}
```
]

---

# Sending jobs

.font150[
- Arguments of the template are passed via
  - `drake::make(template = list())` or
  - `clustermq::Q(template = list())`
- Required:
  - Number of CPUs (`n_cpus`)
  - Memory (`memory`)
- See the sketch on the next slide
]
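---

# Sending jobs

.font130[
A minimal `clustermq` sketch of how such a call could look; the squaring function, the input `1:4`, and the resource values are made up for illustration, and `slurm_clustermq.tmpl` is assumed to sit in the working directory:

```r
library(clustermq)

# point clustermq at Slurm and at the template from the previous slide
options(clustermq.scheduler = "slurm",
        clustermq.template  = "slurm_clustermq.tmpl")

# run a toy function as 4 Slurm array tasks; the template list
# fills the {{ n_cpus }} and {{ memory }} placeholders
res <- Q(function(x) x^2, x = 1:4, n_jobs = 4,
         template = list(n_cpus = 1, memory = 1024))
```
]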
---

# drake vs. clustermq

.font150[
- `drake` > `clustermq`
- `drake` uses `clustermq` under the hood
- `drake` knows the execution order of all R objects in your project
- `drake` distributes your whole analysis, in order and in parallel, to the cluster with ONE R command (sketched on the next slide)
]
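---

# drake vs. clustermq

.font130[
A hedged sketch of that ONE command; the two-target plan is invented for illustration and reuses the same `slurm_clustermq.tmpl` template as before:

```r
library(drake)

# toy plan: drake infers that `fit` depends on `dat`
plan <- drake_plan(
  dat = mtcars,
  fit = lm(mpg ~ wt, data = dat)
)

options(clustermq.scheduler = "slurm",
        clustermq.template  = "slurm_clustermq.tmpl")

# build all targets in the right order, distributed via clustermq
make(plan, parallelism = "clustermq", jobs = 2,
     template = list(n_cpus = 1, memory = 1024))
```
]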
---

# Libraries

.font150[
Managed via [Spack](https://spack.io)

- Uses the concept of "environment modules" under the hood
- All required libraries need to be loaded explicitly after login
- Libraries can also be loaded automatically at login via `~/.bashrc`
]

---

class: center, middle

.font180[
## Live demo: Processing
]

---

# FAQ

.font130[
- Can I log in to / use the nodes standalone?

  No.

- Is there a user guide?

  Yes, at https://jupiter.geogr.uni-jena.de/hpc/

- What if I want to report an issue?
  Open an issue at https://venus.geogr.uni-jena.de/giscience/hpc-user-guide

- Do I have to use `drake` or `clustermq`?
  Yes. For processing in R on the compute nodes, there is no other way.

- `drake` is complicated, I do not understand it.
  There is a manual at https://ropenscilabs.github.io/drake-manual/ that will help you understand it.
]