A submitted job may fail to start running for a number of reasons. When that happens, run squeue and check the STATE and NODELIST(REASON) columns:
$> squeue -j 64243399 
    JOBID       NAME  USER   QOS    STATE       TIME TIME_LIMIT NODES      FEATURES NODELIST(REASON)
 64243399     my_job  user    nf  PENDING       0:00   03:00:00     1        (null) (Priority)

If the job is in the PENDING state, it has not yet been dispatched to a node to run. The NODELIST(REASON) column shows why.
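To look at only the state and reason of a single job, the squeue output can be narrowed with a custom format string. The following is a minimal sketch, assuming a standard Slurm installation; the function name job_reason is hypothetical, while -h (no header), -j (job ID) and -o (output format) are regular squeue options, and %i, %T and %r are the format fields for job ID, state and reason.

```shell
# Hypothetical helper: print job ID, state and reason for one job.
# -h drops the header line; %i = job ID, %T = state, %r = reason.
job_reason() {
    squeue -h -j "$1" -o "%i %T %r"
}

# Example call, using the job ID from the listing above:
# job_reason 64243399
```

For a pending job, the last field printed is the same reason that appears in brackets in the NODELIST(REASON) column.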

Here is a list of the most common reasons, each followed by its description:

Priority

Your job is ready to be dispatched, but other jobs with higher priority will be dispatched before yours.

Resources

Your job is ready to be dispatched and is at the top of the queue, but there are not enough free resources to satisfy its requirements.

AssocMaxJobsLimit

You have reached the limit on the number of jobs you can submit to the system under a given project account. Your job will not be considered until your other jobs in the same project complete.

QOSMaxJobsPerUserLimit

You have reached the limit on the number of jobs you can submit to a given QoS. Your job will not be considered until your other jobs in the same QoS complete.

JobArrayTaskLimit

Your job is part of a job array, and the array's limit on the number of simultaneously running tasks has been reached. Your job will not be considered until other tasks in the same job array complete.

Dependency

Your job depends on other jobs. It will not be considered until the jobs it depends on complete.

DependencyNeverSatisfied

Your job has a dependency on another job that will never be satisfied. You should assess why that is and cancel the job as required.

ReqNodeNotAvail

There are no nodes available to dispatch your job. A System Session or outage may be in progress. Check our service status at https://www.ecmwf.int/en/service-status

Licenses

Your job requires some resources that are temporarily not available. A System Session or outage may be in progress. Check our service status at https://www.ecmwf.int/en/service-status
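When several jobs are waiting, it can help to see how the pending jobs are spread across these reasons. The following is a minimal sketch, assuming a standard Slurm installation; the function name pending_by_reason is hypothetical, and the user name defaults to $USER when no argument is given.

```shell
# Hypothetical helper: count a user's pending jobs per reason.
# -t PENDING keeps only pending jobs; %r prints the reason alone.
pending_by_reason() {
    squeue -h -u "${1:-$USER}" -t PENDING -o "%r" | sort | uniq -c | sort -rn
}

# Example call for the current user:
# pending_by_reason
```

For a job held by Dependency or DependencyNeverSatisfied, scontrol show job followed by the job ID lists its Dependency field, and scancel followed by the job ID cancels it.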

The full list of reasons can be found in the squeue man page:

man squeue

