Custom jobs are submitted but nothing happens

  • Posted on: 7 July 2020
  • By: cudroiu

When a custom job is submitted, it remains in the Submitted or Running state, but no processor usage is shown in the "System overview" tab. The job steps listed in the "Monitoring" tab also show no progress.

Also, executing the following commands results in a timeout from the Slurm srun command:

sudo su -l sen2agri-service
srun ls -al

which returns something like:

srun: Required node not available (down, drained or reserved)
srun: job 3257 queued and waiting for resources
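
To confirm that the node itself is the problem, its state can be read from Slurm directly. The snippet below is only a sketch: parse_node_state is a hypothetical helper name, and it assumes the node is called localhost (as in the fix further down) and that "scontrol show node" prints a State= field.

```shell
# Extract the value of the State= field from "scontrol show node" output read on stdin.
parse_node_state() {
    sed -n 's/.*State=\([^ ]*\).*/\1/p' | head -n 1
}

# Query Slurm only if scontrol is installed on this machine.
if command -v scontrol >/dev/null 2>&1; then
    state=$(scontrol show node localhost | parse_node_state)
    echo "Node state: ${state:-unknown}"
fi
```

A state containing DOWN or DRAIN is consistent with the srun message above.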

 

This usually happens when the root partition (/) fills up during high-level product processing (L3X, L4X), leaving no free disk space on it.
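
Whether the root partition is (still) full can be checked with df. This is a minimal sketch; root_usage_percent is a hypothetical helper name and the 95% threshold is an arbitrary assumption, not a value taken from Sen2Agri.

```shell
# Print the usage of the root partition as an integer percentage.
root_usage_percent() {
    # -P forces the POSIX output format, where the fifth column is the capacity in percent.
    df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

usage=$(root_usage_percent)
# 95 is an assumed warning threshold; adjust as needed.
if [ "$usage" -ge 95 ]; then
    echo "Root partition is ${usage}% full; free some space before restarting Slurm."
else
    echo "Root partition is ${usage}% full."
fi
```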

In this case, Slurm stops executing commands. After freeing some space on the root partition, resume the node and restart the Slurm services:

sudo -u sen2agri-service scontrol update NodeName=localhost State=RESUME
sudo systemctl restart slurmd slurmdbd slurmctld mariadb

The success of the operation can then be checked with:

sudo su -l sen2agri-service
srun ls -al

which should now display the list of files in the current directory.
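
If the node is still unavailable, srun waits in the queue instead of returning. As a sketch, the check can be wrapped in the coreutils timeout command so that it gives up on its own; check_with_timeout is a hypothetical helper name and the 30 second limit is an assumption.

```shell
# Run a command, giving up after the given number of seconds.
# Prints OK or FAILED; returns non-zero if the command failed or timed out.
check_with_timeout() {
    secs=$1
    shift
    if timeout "$secs" "$@" >/dev/null 2>&1; then
        echo "OK"
    else
        echo "FAILED"
        return 1
    fi
}

# Run this as the sen2agri-service user; 30 seconds is an assumed limit.
if command -v srun >/dev/null 2>&1; then
    check_with_timeout 30 srun ls -al || echo "Slurm still does not run jobs."
fi
```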