SLURM stands for Simple Linux Utility for Resource Management. It essentially allows users to submit jobs to a queue; when sufficient resources are available (and it is your turn), the job will start. Additionally, SLURM can send you e-mails with progress, save the output of your program to a file, write to a log file, and so on. To submit a job, you write a bash file and pass it to `sbatch`.
`job.sh` contains the information to be passed to SLURM: the options for SLURM and the code to be run.
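The job file itself is not reproduced here; a minimal sketch, assuming the job does nothing but list the current folder (the `#SBATCH` option values are placeholders to adapt to your cluster), might look like this:

```shell
#!/bin/bash
#SBATCH --job-name=list-folder   # assumed job name
#SBATCH --time=00:05:00          # assumed time limit
#SBATCH --mail-type=END          # optional: e-mail when the job finishes

# The actual work: save the folder listing to a file
ls > my_folder.list
```

This would be submitted with `sbatch job.sh`.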
After the job has run through `sbatch`, we would have a file called `my_folder.list` with the contents of the folder we were in. Great stuff. Now let's run a Matlab script (or R, Python, Julia, whatever). If we were simply in a terminal, we could do it in one of two ways.
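The exact commands are assumptions on my part (including the script name `myscript.m`), but based on the description below they would be along these lines:

```shell
# Option 1: run the script, then stay inside the Matlab prompt afterwards
matlab -nodesktop -r "myscript"

# Option 2: feed the script to Matlab on standard input; Matlab exits
# when it reaches the end of the input
matlab -nodesktop < myscript.m
```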
They are not quite identical, but let's leave the details be. Note that the first does not quit Matlab after the script has run, while the second does. If we only want to run one script, this may be fine, but if we want to run many similar scripts, it is not the best approach: we would need as many script files as jobs, and that is a file mess. We could generate them with a script, but we will avoid that.
First, we will have to look at so-called here documents. To understand them, let's try a simple example.
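For instance (the message text here is my own; any lines would do):

```shell
wall << EndOfMessage
Hello everyone, this is a test broadcast.
It will be shown on every logged-in user's terminal.
EndOfMessage
```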
`wall` is a Linux program that writes a message to all logged-in users. What the above example does is the following: `<< EndOfMessage` tells your computer that we are going to pass some input to the program `wall`, namely everything that follows until we meet the expression `EndOfMessage` at the start of a new line with no whitespace around it. The name was chosen by me and could just as well have been `EOM` for short; you just don't want to choose a keyword that might be used in the particular context, for example `end`. Similarly, we could pass a script to Matlab. Simply write the following.
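A sketch along these lines (the particular computation, a nod to a famous six-times-nine joke, is my assumption):

```shell
matlab -nodesktop << EndOfScript
% What do you get if you multiply six by nine?
C = 6 * 9
EndOfScript
```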
And we will learn that `C` is sadly not `42`. However, we also learnt how to use the super useful here document feature. Let us try a different example: we will create a 3x3 matrix and use an index to choose which column to calculate the mean of.
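A sketch, assuming the matrix `[1 2 3; 4 5 6; 7 8 9]` (whose column means are 4, 5 and 6) and a check that only holds for the third column:

```shell
matrix_index=3   # hypothetical; later we will let SLURM supply this value
matlab -nodesktop << EndOfScript
A = [1 2 3; 4 5 6; 7 8 9];
% The shell substitutes the index into the here document before
% Matlab ever sees it; true prints as 1, false as 0
mean(A(:, $matrix_index)) == 6
EndOfScript
```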
And we see that it is indeed true! How convenient.
So, I promised something about running similar jobs. This is where the `--array` option to `sbatch` comes into play. Let us make a SLURM job file as above, except that this time it runs our little mean script instead of `ls`. The following should be saved in a file, for example `myjobs.sh`, and is run by typing `sbatch myjobs.sh`.
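A sketch of such a job file, assuming the same matrix and check as before (the output filename pattern is my choice):

```shell
#!/bin/bash
#SBATCH --output=myjob_%j.out   # %j is replaced by the SLURM job ID
#SBATCH --array=1-3             # run the job once per task ID: 1, 2 and 3

# No -nodesktop needed: there is no GUI on the server anyway
matlab << EndOfScript
matrix_index = $SLURM_ARRAY_TASK_ID;
A = [1 2 3; 4 5 6; 7 8 9];
mean(A(:, matrix_index)) == 6
EndOfScript
```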
This time I’ve added some new things:
--array as options to
$SLURM_ARRAY_TASK_ID. I’ve also omitted the
-nodesktop option, as the GUI is probably not even installed on the server where you run
--output option adds the possibility for
sbatch to save the output (of Matlab in this instance) to a file. The
%j% refers to the job ID given by SLURM to your job. The
--array option will run our script three times. Each time it runs, it will substitute a value for
$SLURM_ARRAY_TASK_ID. It will start with 1, then 2, and lastly 3. However, you can not be sure that SLURM will actually run them in that order. Checking the
.out files, you should be able to see, that the job with
matrix_index == 3 is the only job which returns
1). This can obviously be used to loop over parameterizations of a model, different estimation methods, and so on.
As a last little note, there is a feature to limit the maximum number of concurrent runs. This is done by typing
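For instance, to queue 500 jobs while allowing at most 10 of them to run at once:

```shell
#SBATCH --array=1-500%10
```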
The number after `%` means that you want to queue the 500 jobs, but SLURM should not start more than 10 at once. This is useful when you want to run a lot of jobs but don't want to clog up the queue for everyone else.