Short course: Introduction to SLURM on a Computing Cluster

Short Course Description

This one-day course gives participants foundational knowledge and practical skills in using SLURM (originally the Simple Linux Utility for Resource Management), a widely adopted open-source workload manager for high-performance computing (HPC) clusters. Participants will learn how to submit, monitor, and manage computational jobs on SLURM-managed systems so that they can use HPC resources effectively for research and development tasks.

Course Content Overview

Session 1:

  • Overview of SLURM architecture and components
  • Understanding job scheduling and resource allocation
  • Introduction to SLURM commands: sbatch, srun, squeue, scancel

Session 2:

  • Writing and configuring SLURM job scripts
  • Hands-on exercises: submitting and managing jobs
  • Best practices for job monitoring and troubleshooting
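
To give a flavour of the Session 2 material, the following is a minimal sketch of a job script and its submission, not the course's official exercise. The file name hello_job.sh, the job name, and all resource values are hypothetical placeholders; real partitions, accounts, and limits depend on the cluster. The sketch writes the script to disk and submits it only if sbatch is found on the machine.

```shell
#!/bin/bash
# Write a minimal, hypothetical SLURM batch script to disk.
# All values below are illustrative; adjust them to your cluster's policies.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the queue listing
#SBATCH --job-name=hello
# One task using one CPU core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory request for the job
#SBATCH --mem=1G
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:05:00
# Output file; %j expands to the job ID
#SBATCH --output=hello_%j.out

# Run the command as a job step on the allocated resources
srun hostname
EOF

# Submit only if SLURM is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh      # submit the script to the scheduler
    squeue -u "$USER"        # list your pending and running jobs
    # scancel <jobid>        # cancel a job by its ID if needed
else
    echo "sbatch not found; wrote hello_job.sh without submitting"
fi
```

Keeping resource requests close to what a job actually needs generally helps it start sooner, which is one of the monitoring and troubleshooting themes listed above.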

Learning Outcomes

By the end of the course, participants will be able to:

  • Understand the role and functionality of SLURM in HPC environments
  • Submit and manage jobs using SLURM commands and scripts
  • Monitor job status and troubleshoot common issues
  • Optimize resource usage for computational tasks
  • Apply best practices for efficient workload management on HPC clusters

This course is ideal for postgraduate students, researchers, and industry professionals seeking to enhance their computational skills and effectively utilize HPC resources in their work.