SLURM
SLURM stands for Simple Linux Utility for Resource Management. It essentially allows users to submit jobs to a queue, and when sufficient resources are available (and it is your turn), the job starts. Additionally, SLURM can send you e-mails with progress, save the output of your program to a file, write to a log file, and so on. To submit a job, you make a bash file, say job.sh
, and pass it to sbatch by typing sbatch job.sh
, where job.sh
contains the information to be passed to SLURM: the options for SLURM and the code to be run.
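Reconstructing from the surrounding text, a minimal job.sh might look something like this sketch; the job name and time limit are my own invention, and I assume the payload is an ls redirected to my_folder.list:

```shell
#!/bin/bash
#SBATCH --job-name=list_folder   # a made-up job name
#SBATCH --time=00:05:00          # ask for 5 minutes of walltime

# The actual work: save the contents of the current folder to a file.
ls > my_folder.list
```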
After running sbatch
, we would have a file called my_folder.list
with the contents of the folder we were in. Great stuff. Now let’s run a Matlab script (or R, python, Julia, whatever). If we were simply in a terminal, we could do
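The two commands were presumably along these lines (a sketch, assuming a script file called myscript.m):

```shell
# Option 1: run the script by name; Matlab stays open afterwards.
matlab -nodesktop -r "myscript"

# Option 2: feed the script on stdin; Matlab quits when the input ends.
matlab -nodesktop < myscript.m
```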
They are not quite identical, but let's leave the details be. Note that the first does not quit Matlab after the script has run, while the second does. If we only want to run one script, this may be fine, but if we want to run many similar scripts, it is not the best approach: we would need as many script files as jobs, and that is a file mess. We could generate them with a script, but we will avoid that.
Here documents
First, we will have to look at so-called here documents. To understand them, let’s try with a simple example.
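The example was presumably something like this sketch (the message text is my own):

```shell
wall << EndOfMessage
Hello everyone, the server will reboot in 10 minutes.
EndOfMessage
```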
wall
is a Linux program that writes a message to all logged-in users. The above example works as follows: << EndOfMessage
tells your shell that we are going to pass some input to the program wall
, namely everything that follows until the expression EndOfMessage
appears on a line of its own with no whitespace around it. I chose the name myself; it could just as well have been EOM
for short. You just don't want to pick a keyword that might be used in the particular context, for example end
. Similarly, we could pass a script to Matlab. Simply write the following.
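Judging from the next sentence, the script defined a vector C and compared its mean to 42; a sketch (the values of C are my assumption):

```shell
matlab -nodesktop << EndOfScript
C = [1 2 3];
% Prints 0 (false): the mean of C is 2, sadly not 42.
mean(C) == 42
EndOfScript
```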
And we will learn that the mean
of C
is sadly not 42
. However, we have also learned how to use the super useful here-document feature. Let us try a different example: we will create a 3x3 matrix and use an index to choose which column to calculate the mean of.
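A sketch of what this could look like; the matrix and the comparison value are my assumptions, chosen so that column 3 is the one whose mean matches, consistent with the array example later on:

```shell
matrix_index=3
matlab -nodesktop << EndOfScript
A = [1 2 3; 4 5 6; 7 8 9];
% Column 3 is [3; 6; 9], so its mean is 6 and output is true (1).
output = mean(A(:, $matrix_index)) == 6
EndOfScript
```

Since the delimiter EndOfScript is unquoted, the shell substitutes $matrix_index inside the here document before Matlab ever sees the script.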
And we see that it is indeed true! How convenient.
SLURM arrays
So, I promised something about running similar jobs. This is where the --array
option to sbatch
comes into play. Let us make a SLURM job file as above. This time, it runs our little mean
-script instead of running ls
. The following should be saved in a file, for example myjobs.sh
, and is run by typing sbatch myjobs.sh
.
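A sketch of myjobs.sh along the lines the text describes; the output filename pattern and the matrix are my assumptions, reusing the earlier mean example:

```shell
#!/bin/bash
#SBATCH --output=mean_job_%j.out   # one .out file per job, named by job ID
#SBATCH --array=1-3                # run the script three times

# The task ID (1, 2 or 3) picks the column to take the mean of.
matlab << EndOfScript
A = [1 2 3; 4 5 6; 7 8 9];
output = mean(A(:, $SLURM_ARRAY_TASK_ID)) == 6
EndOfScript
```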
This time I’ve added some new things: --output
and --array
as options to sbatch
, %j
, and $SLURM_ARRAY_TASK_ID
. I’ve also omitted the -nodesktop
option, as the GUI is probably not even installed on the server where you run sbatch
. The --output
option adds the possibility for sbatch
to save the output (of Matlab in this instance) to a file. The %j%
refers to the job ID given by SLURM to your job. The --array
option will run our script three times. Each time it runs, it will substitute a value for $SLURM_ARRAY_TASK_ID
. It will start with 1, then 2, and lastly 3. However, you cannot be sure that SLURM will actually run them in that order. Checking the .out
files, you should see that the job with matrix_index == 3
is the only job which returns output
as true
(or 1
). This can obviously be used to loop over parameterizations of a model, different estimation methods, and so on.
As a last little note, there is a feature to limit the maximum number of concurrent runs. This is done by typing
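In the job file this looks as follows (the 500 and 10 come from the surrounding text):

```shell
#SBATCH --array=1-500%10
```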
The number after %
means that you want to queue the 500 jobs, but SLURM should not start more than 10 of them at once. This is useful if you want to run a lot of jobs but don't want to clog up the queue for everyone else.