Section outline

  • Dear hackathon attendee,
    on Day 2 of the hackathon we will open a survey to get your feedback about the event. The survey is a standard ELIXIR short-term feedback survey and its results will be uploaded to the ELIXIR Training Metrics Database. The survey is anonymous.

    • Connecting to a cluster

      Software tools

      Interactive work on a remote computer
We connect to the login nodes with a client that supports the Secure Shell (SSH) protocol. SSH enables a secure remote connection from one computer to another: it offers several options for user authentication and protects data integrity and confidentiality with strong encryption. The SSH protocol is used for interactive work on the login node as well as for data transfer to and from the cluster. On Linux, macOS and Windows 10, we can establish a connection from the command line (terminal, bash, PowerShell, cmd), in which we run the ssh program. In Windows 10 (April 2018 update and newer), the SSH client is available by default, but in older builds it must be enabled first (instructions, howtogeek). For older versions of Windows, we need to install an SSH client separately; one of the most established is PuTTY.

      Data transfer
Secure data transfer from our computer to a remote computer and back also takes place via the SSH protocol; this is called SFTP (Secure File Transfer Protocol) or FTP over SSH.

Data can be transferred with the scp program (Secure CoPy), which we run from the command line. This program is installed in operating systems together with the ssh program. For easier work, we can use programs with a graphical interface. FileZilla is available for all the mentioned operating systems, and CyberDuck is also very popular on macOS and Windows.
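A basic scp invocation might look like this (a sketch assuming the NSC login node used later in this guide; <name> and the file names are placeholders):

```shell
# Copy a local file into the home directory on the cluster.
scp results.txt <name>@nsc-login.ijs.si:~/

# Copy a file from the cluster back to the current local folder.
scp <name>@nsc-login.ijs.si:~/results.txt .
```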

      All-in-one tools
There are a number of combined tools for working on remote systems that support both interactive work and data transfer. On Windows, the MobaXterm tool is well known; on Linux we can use Snowflake, and on macOS (unfortunately paid only) Termius.

      For software developers, we recommend using the Visual Studio Code development environment with the Remote-SSH extension, which is available for all of these operating systems.

      Text editors
We need a text editor to prepare job scripts for the cluster. Data transfer programs and all-in-one tools also let us edit files directly on the cluster. On Linux and macOS, we can use a default program, such as Text Editor, to edit simple text files. Things get a little complicated on Windows, which uses a slightly different text format: while Linux and macOS end each line with the LF (Line Feed) character, Windows ends it with the CR (Carriage Return) and LF characters. On Windows we therefore prefer not to edit cluster files with Notepad, but to install Notepad++. Before saving a file to the cluster in Notepad++, change the line-ending format from Windows (CR LF) to Unix (LF) in the lower right corner.
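If a file with Windows line endings has already ended up on the cluster, the trailing CR characters can be stripped from the command line. A minimal sketch (the sed approach works with GNU sed as found on Linux clusters; on macOS, sed -i requires a suffix argument, and the dos2unix utility does the same job where installed):

```shell
# Create a file with Windows (CR LF) line endings for demonstration.
printf 'line one\r\nline two\r\n' > job_windows.sh

# Strip the trailing CR from every line, leaving Unix (LF) endings.
sed -i 's/\r$//' job_windows.sh

# Verify: od -c now shows \n without a preceding \r.
od -c job_windows.sh
```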


    • Log in to a cluster

Run the command line: the simplest way is to press a special key on the keyboard (the Super key on Linux, Command+Space on macOS, or the Windows key on Windows), type “terminal” and click on the proposed program. In the command line of the terminal (the window that opens), write:

and start the program by pressing the Enter key. Enter the name of your SLING SSO user account instead of <name>. If we are working with another login node, we replace the part after the @ sign accordingly.
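The connection command might look like this (a sketch using the NSC login node referred to throughout this guide; <name> is a placeholder for your username):

```shell
# Connect to the NSC login node; replace <name> with your username.
ssh <name>@nsc-login.ijs.si
```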

At your first sign-in you will receive a note like this:
enter yes to add the login node with the specified fingerprint to the list of known hosts on your computer.

After entering the password for the user account <name>, we find ourselves on the login node, where a command line like this is waiting for us:

Enter hostname into the command line; this runs a program on the login node that tells us the name of the remote computer. In our case this is the same as the name of the login node, nsc-login.ijs.si.

We have run our first program on a computer cluster. Of course, not quite in the right way yet: the program ran on the login node, not on a compute node.

      To log out of the login node, enter the exit command:

      Login without password

      Transfer files to and from a cluster

      FileZilla
Start the FileZilla program and enter the data in the input fields below the menu bar: Host: sftp://nsc-login.ijs.si, Username: <name>, Password: <password>, then press Quickconnect. At the first login, we confirm that we trust the server. After a successful login, the left pane of the program shows the tree structure of the file system on the personal computer, and the right pane the tree structure of the file system on the computer cluster.

      CyberDuck
In the CyberDuck toolbar, press the Open Connection button. In the pop-up window, select the SFTP protocol in the upper drop-down menu and enter the following data: Host: nsc-login.ijs.si, Username: <name>, Password: <password>. Then press the Connect button. At the first login, we confirm that we trust the server. The tree structure of the file system on the computer cluster is then displayed.

Clicking on folders (directories) lets us move easily through the file system. In both programs, the folder we are currently in is shown above the tree structure, for example /ceph/grid/home/<name>. Right-clicking opens a menu with commands for working with folders (add, rename, delete) and files (add, rename, edit, delete). In FileZilla, files are transferred between the left and right panes; in CyberDuck, between the program window and ordinary folders. Files are transferred simply by dragging and dropping them with the mouse.

      Working with files directly on the cluster

You can also enter commands for working with files directly into the command line. Some important ones are:
• cd (change directory): move through the file system
1. cd <folder>: move to the given folder,
2. cd ..: move up to the parent folder,
3. cd: move to the user account's home folder,
• ls (list): print the contents of the folder,
• pwd (print working directory): the name of the folder we are in,
• cp (copy): copy files,
• mv (move): move and rename files,
• cat <file>: display the contents of the file,
• nano <file>: edit the file,
• man <command>: help on using the command.
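A short session using these commands might look like this (the folder and file names are illustrative):

```shell
# Create a folder and move into it.
mkdir demo
cd demo
pwd                      # prints the current folder, ending in /demo

# Create a small file and display its contents.
echo "hello cluster" > notes.txt
cat notes.txt

# Copy, rename, and list the folder contents.
cp notes.txt copy.txt
mv copy.txt renamed.txt
ls

# Move back up to the parent folder.
cd ..
```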
    • Jobs and tasks in the Slurm system

Users of computer clusters mostly work through the workload manager Slurm (Simple Linux Utility for Resource Management). The Slurm system manages the job queue, allocates the required resources to jobs, and monitors their execution. Through Slurm, users are granted access to resources (compute nodes) for a certain period of time, start jobs on them and monitor their execution.

      Jobs

User programs on the compute nodes are started via the Slurm system. For this purpose, we prepare a job in which we state:
• which programs and files we need for the run,
• how the program is invoked,
• which computing resources we need for the execution,
• the time limit for the execution of the job, and the like.
A job that runs on multiple cores at the same time is usually divided into tasks.

Job life cycle

Once the job is ready, we submit it to the queue. The Slurm system assigns it a job identifier (JOBID) and puts it in the pending state. Slurm selects queued jobs for execution based on the available computing resources, the estimated execution time, and the set priority.

When the required resources become available, the job starts running. After execution finishes, the job passes through the completing state, while Slurm waits for the remaining nodes to wrap up, into the completed state.

If necessary, the job can be suspended or cancelled. A job may also end in the failed state due to execution errors, or the Slurm system may terminate it when its time limit expires.
    • Display cluster information

      Slurm provides a series of commands for working with a cluster. In this section we will look at some examples of using the sinfo, squeue, scontrol, and sacct commands, which serve to display useful information about cluster configuration and status. Detailed information on all commands supported by Slurm can be found on the Slurm project home page.

      Sinfo command

The command displays information about the state of the cluster, its partitions (parts of the cluster) and nodes, and the available computing capacity. A number of switches allow us to specify more precisely which information about the cluster to print (documentation).
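In its basic form sinfo summarizes the partitions; with the --Node and --long switches it prints per-node details (a sketch; the exact columns depend on the Slurm version and configuration):

```shell
# Summary of partitions: state, time limit of jobs, node lists.
sinfo

# One line per node: partition, state, CPUS, S:C:T, memory, features.
sinfo --Node --long
```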

      Above we can see which logical partitions are available, their status, the time limit of jobs on each partition, and the lists of computing nodes that belong to them. The printout can be customized with the appropriate switches, depending on what we are interested in.
The above printout tells us the following for each compute node in the cluster: the partition it belongs to (PARTITION), its state (STATE), the number of cores (CPUS), the number of processor sockets (S), the cores per socket (C), the hardware threads per core (T), the amount of system memory (MEMORY), and any features (AVAIL_FEATURES) attributed to the node (e.g., processor type, presence of graphics processing units, etc.). Parts of the cluster can be reserved in advance for various reasons (maintenance work, workshops, projects). Example of listing active reservations on the NSC cluster:
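A sketch of such a query (the --reservation switch restricts the printout to reservation information):

```shell
# List active reservations: name, duration, nodes, allowed users.
sinfo --reservation
```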
The above printout shows any active reservations on the cluster, the duration of each reservation, and the list of nodes that are part of it. An individual reservation is assigned a group of users who can use it and thus avoid waiting behind the jobs of users who do not have a reservation.

      Squeue command

In addition to the cluster configuration, we are of course also interested in the state of the job scheduling queue. With the squeue command, we can query jobs that are currently queued, running, or already completed successfully or unsuccessfully (documentation).

To print the current state of the queue, type:
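In its simplest form, without switches:

```shell
# Print all jobs currently queued or running.
squeue
```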
From the printout we can read the identifier of each job, the partition on which it runs, the name of the job, the user who started it, and the job's current state.

The printout also includes the total run time of each job and the list of nodes on which it is being carried out, or the reason why it has not yet started. Usually, we are most interested in the state of jobs that we started ourselves. The printout can be limited to the jobs of a selected user with the --user switch. Example of listing jobs owned by user gen012:
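A sketch of that query:

```shell
# Show only the jobs of user gen012.
squeue --user=gen012
```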
In addition, we can limit the printout to jobs in a certain state using the --states switch. Example of listing all jobs currently pending execution (PD):
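A sketch of that query:

```shell
# Show only jobs that are pending execution.
squeue --states=PD
```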

      Scontrol command

Sometimes we want even more detailed information about a partition, node, or job. We obtain it with the scontrol command (documentation). Below are some examples of how to use this command.

      Example of printing more detailed information about an individual partition:
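A sketch, using the gridlong partition mentioned later in this guide:

```shell
# Detailed information about the gridlong partition.
scontrol show partition gridlong
```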

      Example of more detailed information about the nsc-fp005 computing node:
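A sketch of that query:

```shell
# Detailed information about the nsc-fp005 compute node.
scontrol show node nsc-fp005
```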

      Sacct command

With the sacct command, we can find out more about completed jobs and jobs in progress. For example, for a selected user, we can check the status of all jobs over a period of time.
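Such a query might look like this (a sketch; <name> and the dates are placeholders to replace with your own):

```shell
# Jobs of user <name> started in the chosen period.
sacct --user=<name> --starttime=2024-05-01 --endtime=2024-05-31
```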
    • Starting jobs on a cluster

In this chapter we will look at the srun, sbatch and salloc commands for starting jobs, and the scancel command for cancelling a job.

      Srun command

The simplest way to start a job is with the srun command. The command is followed by various switches with which we specify the quantity and type of computing resources our job needs, and various other settings. A detailed explanation of all available options can be found in the documentation. We will look at some of the most basic and most commonly used.

      To begin with we will run a simple system program hostname as our job, which displays the name of the node on which it runs. Example of starting the hostname program on one of the compute nodes:
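A sketch of that invocation:

```shell
# Run a single instance of hostname on one of the compute nodes.
srun --ntasks=1 hostname
```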

We used the --ntasks=1 switch on the command line. With it we say that our job consists of a single task; we want a single instance of the hostname program to run. Slurm automatically assigns us one of the processor cores in the cluster and runs the task on it.

      In the next step, we can try to run several tasks within our job:
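For example, four tasks at once:

```shell
# Run four instances of hostname within one job.
srun --ntasks=4 hostname
```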
We immediately notice the difference in the printout. Now four identical tasks have been performed within our job, on four different processor cores located on the same compute node (nsc-msv002.ijs.si).

      Of course, our tasks can also be divided between several nodes.
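For example (a sketch; Slurm decides the actual placement of the tasks):

```shell
# Spread four tasks across two compute nodes.
srun --ntasks=4 --nodes=2 hostname
```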

      Our job can always be terminated by pressing Ctrl + C during execution.

      Sbatch command

The downside of the srun command is that it blocks our command line until the job is completed. In addition, it is awkward for running more complex jobs with a multitude of settings. In such cases, we prefer to use the sbatch command and write the job settings and the individual tasks of our job into a bash script file.
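Such a script might look like the following sketch, assembled from the settings explained below (the task and node counts are placeholder values, and launching the tasks through srun inside the script is an assumption):

```shell
#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH --partition=gridlong
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=100MB
#SBATCH --output=my_job.out
#SBATCH --time=00:01:00

# The job itself: each task prints the name of its compute node.
srun hostname
```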

We have an example of such a script in the box above. At the top of the script is the line #!/bin/bash, which tells the system that this is a bash script. This is followed by the line-by-line settings of our job, each prefixed with #SBATCH. We have already seen how to specify the reservation, the number of tasks and the number of nodes for our job (--reservation, --ntasks and --nodes) with the srun command.

      Let's explain the other settings:
• --job-name=my_job_name: the name of the job, displayed when we make a query with the squeue command,
• --partition=gridlong: the partition on which we want to run our job (there is only one partition on the NSC cluster, so we can also omit this setting),
• --mem-per-cpu=100MB: the amount of system memory our job needs for each task (that is, per processor core),
• --output=my_job.out: the name of the file into which the content our job would otherwise print to standard output (the screen) is written,
• --time=00:01:00: the time limit of our job in hours:minutes:seconds format.
      This is followed by the launch of our job, which is the same as in the previous cases (hostname).

      Save the content in the box above to a file, such as job.sh, and run the job:
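A sketch of the submission:

```shell
# Submit the script to the queue; sbatch prints the assigned JOBID.
sbatch job.sh
```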
We can see that the command printed our job identifier and immediately returned control of the command line to us. When the job completes (we can check with the squeue command), the file my_job.out appears in the current folder, containing the result of the run.

      Scancel command

      Jobs started with the sbatch command can be terminated with the scancel command during execution. We only need to specify the appropriate job identifier (JOBID).
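For example (replace <JOBID> with the identifier reported by sbatch or squeue):

```shell
# Cancel the job with the given identifier.
scancel <JOBID>
```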

      Salloc command

The third way to start a job is with the salloc command. With it, we reserve in advance the computing capacity our tasks will need, and then run jobs directly from the command line with the srun command. The advantage of the salloc command is that we do not have to wait for free capacity every time we start a job with srun. The salloc command uses the same configuration switches as the srun and sbatch commands, so once we have reserved resources with salloc, we do not need to repeat all the requirements for every srun command. Example of running two instances of hostname on one node:

When used after salloc, srun works similarly to its use in sbatch scripts: it runs our tasks on the already acquired computing capacity. The acquired capacity is released by running exit at the command line after the work is done.

The salloc command offers another interesting way of using compute nodes. With it, we can obtain capacity on a compute node, connect to the node via the SSH protocol, and then execute commands directly on it. On the nsc-fp005 node, we run the hostname program, which displays the node name. After finishing the work on the compute node, we return to the login node with the exit command. There we run exit once more to release the capacity we occupied with the salloc command.
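The salloc workflow described above might be sketched as follows (the switch values are placeholders):

```shell
# Reserve capacity for two tasks on one node.
salloc --ntasks=2 --nodes=1

# Run two instances of hostname on the reserved capacity.
srun hostname

# Release the reserved capacity.
exit
```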
    • Modules and containers 

Ordinary users (i.e., non-administrators) cannot install programs system-wide. Installation must be arranged with the cluster administrator. You can always compile all the necessary software yourself and install it in your home directory, but this is a rather time-consuming and annoying task. In this section, we look at two approaches that make it easier to load the variety of software packages often used in supercomputing.

      Environment modules

The first approach is environment modules, which package selected user software. Modules are usually prepared and installed by an administrator, who also includes them in the module catalog. The user can then turn modules on or off with the module load and module unload commands. Different modules can also contain different versions of the same program, for example with and without support for graphics accelerators. A list of all modules is obtained with the module avail and module spider commands.

We will need the FFmpeg module in the workshop. It is already installed on the NSC cluster; we just need to load it:
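A sketch of loading the module (the exact module name and version on your cluster may differ; module avail lists what is installed):

```shell
# List available FFmpeg modules.
module avail FFmpeg

# Load the module.
module load FFmpeg
```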
      Use the module list command to see which modules have been loaded.

      Containers

      The disadvantage of modules is that they must be prepared and installed by an administrator. If this is not possible, we can choose another approach and package the program we need in a Singularity container. Such a container contains our program and all other programs and program libraries that it needs to function. You can create it on any computer and then copy it to a cluster.

When we have the appropriate container ready, we use it by writing singularity exec <container> before the desired command. The FFmpeg container (file ffmpeg_alpine.sif) is available here. Transfer it to the cluster and run:
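A sketch of that invocation:

```shell
# Run the ffmpeg program from inside the container.
singularity exec ffmpeg_alpine.sif ffmpeg -version
```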
A printout with information about the ffmpeg software version is displayed. The singularity program runs the ffmpeg_alpine.sif container, and the ffmpeg program is then executed inside the container.

You can also build the ffmpeg_alpine.sif container yourself. Searching the web with the keywords ffmpeg, container and docker probably brings us to the website https://hub.docker.com/r/jrottenberg/ffmpeg/ with a multitude of different ffmpeg containers. We choose the current version of the smallest one, built for Alpine Linux, which we build right on the login node.
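The build might look like this (a sketch; the tag 4.4-alpine is an assumed example, so check the Docker Hub page for the currently available tags):

```shell
# Build the container on the login node from a Docker Hub image.
singularity build ffmpeg_alpine.sif docker://jrottenberg/ffmpeg:4.4-alpine
```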

      More detailed instructions for preparing containers can be found at https://sylabs.io/guides/3.0/user-guide/.

Quite a few frequently used containers are available on the clusters to all users:

• on the NSC cluster, they are found in the /ceph/grid/singularity-images folder,
• on the Maister and Trdina clusters, in the /ceph/sys/singularity folder.