After jobs are submitted, researchers can monitor their status using Slurm commands. Researchers can also review the CPU and memory usage of completed jobs, which helps when planning resource requests for future jobs. Both practices should be a regular part of working on Cheaha.
Jobs that were submitted by accident, or that run incorrectly written code, can also be cancelled.
Currently running jobs can be monitored using the `squeue` command. The basic command to list all jobs for a specific researcher is:
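A minimal sketch of that basic invocation, wrapped in a small helper so it can be reused. The function name `my_jobs` is a hypothetical choice, and `$USER` is assumed to hold your BlazerID when you are logged in to the cluster:

```shell
# List every queued or running job owned by the current user.
# -u (--user) restricts squeue's listing to the given username.
my_jobs() {
    squeue -u "$USER"
}

# Usage:
#   my_jobs
```

Calling `my_jobs` is equivalent to typing `squeue -u $USER` at the prompt.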
The output of `squeue` is a table with one row per job. By default the fields displayed are the job ID as `jobid`, the partition as `partition`, the job name as `name`, BlazerID as `user`, job state as `st`, total run time as `time`, number of nodes as `nodes`, and the list of nodes as `nodelist`, for each job a researcher has submitted.
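The same columns can be requested explicitly with the `-o` (`--format`) option, which is a convenient starting point for adding or removing fields. The sketch below assumes the format string matches the default layout listed in the squeue manual; the function name is hypothetical:

```shell
# Show the current user's jobs with an explicit column layout.
# Format specifiers used here:
#   %i job ID      %P partition   %j job name   %u user
#   %t state code  %M time used   %D node count
#   %R node list (or the reason a pending job is waiting)
# A number such as .18 sets the column width and right-justifies the field.
my_jobs_formatted() {
    squeue -u "$USER" -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"
}

# Usage:
#   my_jobs_formatted
```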
For array jobs, the job ID will be formatted as `jobid_arrayid`.
Further information about filtering by job name or partition, displaying memory or CPU counts, and interpreting the messages attached to a job's status can be found in the `squeue` manual page, `man squeue`.
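As an illustration of those filters, the sketch below narrows the listing to one partition and one job name, and adds CPU and memory columns. The function name and the example arguments are hypothetical; `-p`, `-n`, `%C` and `%m` are standard squeue options and format specifiers:

```shell
# Show the current user's jobs on one partition with a given job name.
#   $1  partition name (-p / --partition)
#   $2  job name       (-n / --name)
# %C adds the CPU count and %m the minimum requested memory.
my_jobs_filtered() {
    partition="$1"
    job_name="$2"
    squeue -u "$USER" -p "$partition" -n "$job_name" \
        -o "%.18i %.9P %.8j %.2t %.10M %.4C %.8m %R"
}

# Usage (partition and job name are placeholders):
#   my_jobs_filtered <partition> <job_name>
```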
Cancelling queued and currently running jobs can be done using the `scancel` command. Importantly, this will only cancel jobs that were initiated by the researcher running the command.
scancel is very flexible in how it behaves:
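A few common forms, sketched as small helpers. The function names are hypothetical; the `scancel` options used (`-u`, `-p`, `-t`) and the `jobid_arrayid` notation are standard Slurm:

```shell
# Cancel one job, or an entire job array, by its job ID.
cancel_job() {
    scancel "$1"
}

# Cancel a single task of an array job, for example task 3 of job 123456.
cancel_array_task() {
    scancel "${1}_${2}"
}

# Cancel all of your jobs on one partition.
cancel_partition_jobs() {
    scancel -u "$USER" -p "$1"
}

# Cancel only your jobs that are still pending (not yet running).
cancel_pending_jobs() {
    scancel -u "$USER" -t PENDING
}

# Cancel every job you own.
cancel_all_jobs() {
    scancel -u "$USER"
}

# Usage:
#   cancel_job <jobid>
#   cancel_array_task <jobid> <arrayid>
```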
Cancelling all jobs will also cancel the interactive jobs created on the Open OnDemand portal.
More information on options to cancel jobs can be found in the `scancel` manual page, `man scancel`.
Reviewing Past Jobs
If you are planning a new set of jobs and are estimating resource requests, it is useful to review similar jobs that have already completed. To list past jobs for a researcher, use the `sacct` command.
To minimize queue wait times and make best use of resources, please review job efficiency using `seff`. See our Job Efficiency page for more information.
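As a minimal sketch, `seff` takes a job ID and prints a short efficiency report for that job. The helper below, whose name is hypothetical, simply runs it for several job IDs in turn:

```shell
# Print the seff efficiency report for each job ID given as an argument.
job_efficiency() {
    for job_id in "$@"; do
        seff "$job_id"
    done
}

# Usage:
#   job_efficiency <jobid> [<jobid> ...]
```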
Review With Job ID
The basic form is to use `-j` along with a job ID to list information about that job.
This command will output basic information such as the ID, Name, Partition, Allocated CPUs, and State for the given job ID.
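A sketch of the basic form follows. Since `-j` also accepts a comma-separated list of job IDs, the helper joins its arguments with commas; the function name is hypothetical:

```shell
# Show accounting information for one or more job IDs.
# The arguments are joined with commas, the list form that -j accepts.
job_summary() {
    ids="$(IFS=,; printf '%s' "$*")"
    sacct -j "$ids"
}

# Usage:
#   job_summary <jobid>
#   job_summary <jobid1> <jobid2>
```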
Jobs can have matching extern and/or batch job entries as well. These are not especially helpful for most researchers. You can remove these entries using:
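Two possible ways to do this are sketched below; neither is necessarily the only approach. The first relies on sacct's `-X` (`--allocations`) option, which reports only the job allocation itself and leaves out job steps such as `.batch` and `.extern`. The second keeps other steps but filters those two out of pipe-delimited output produced with `-P` (`--parsable2`), which avoids problems with truncated columns. Function names are hypothetical:

```shell
# Option 1: show only the allocation-level row for a job (no steps at all).
job_summary_alloc_only() {
    sacct -X -j "$1"
}

# Option 2: read pipe-delimited sacct output on stdin and drop the rows
# whose first field (the JobID) ends in .batch or .extern.
drop_batch_extern() {
    grep -vE '^[^|]*\.(batch|extern)\|'
}

# Usage:
#   job_summary_alloc_only <jobid>
#   sacct -P -j <jobid> | drop_batch_extern
```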
Review Jobs Submitted Between Specific Timepoints
If you do not remember the job ID, you can use the `-S` and `-E` flags to retrieve jobs submitted between the given start datetime and end datetime. Valid start/end time formats are:
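- `HH:MM[:SS][AM|PM]`
- `MMDD[YY][-HH:MM[:SS]]`
- `MM.DD[.YY][-HH:MM[:SS]]`
- `MM/DD[/YY][-HH:MM[:SS]]`
- `YYYY-MM-DD[THH:MM[:SS]]`

- Anything in `[]` is optional
- Times can be specified in either 12-hour with AM/PM or 24-hour
- For the last specification, the `T` itself is inserted; it is not replaced with any value. For example, to request jobs starting after 12:30 PM on October 5, 2021, the form would be `2021-10-05T12:30:00`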
For example, to retrieve jobs submitted during the month of July 2021, the command could be:
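One way to write this is sketched below, with a hypothetical helper name. It assumes that a date given without a time is interpreted as the start of that day, so the end date is set to August 1 in order to include all of July 31:

```shell
# List your jobs in the window between two datetimes.
#   $1  start datetime (-S / --starttime)
#   $2  end datetime   (-E / --endtime)
jobs_between() {
    sacct -S "$1" -E "$2"
}

# Usage for July 2021:
#   jobs_between 2021-07-01 2021-08-01
```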
Customizing the Output
You can add `-o` with a list of output fields to customize the information you see.
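A sketch of such a command, using standard sacct field names for the job ID, start time, end time, state, allocated CPUs, and requested memory. The helper name is hypothetical:

```shell
# Show selected accounting fields for one job.
# -o (--format) takes a comma-separated list of field names.
job_details() {
    sacct -j "$1" -o JobID,Start,End,State,AllocCPUS,ReqMem
}

# Usage:
#   job_details <jobid>
```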
This command will output the job ID, the start time, end time, the state, the number of allocated CPUs, and the requested memory for the specified job. All potential output fields can be seen using `sacct --helpformat`. Their descriptions can be found in the sacct documentation under Job Accounting Fields.