LEGION

http://www.ucl.ac.uk/research-computing/information/services/cluster/#using

LEGION support: rc-support@ucl.ac.uk


MYSCRIPT

#!/bin/bash -l

#PBS -N NAME
#PBS -j oe
#PBS -l walltime=00:04:59
#PBS -l pmem=2048M
#PBS -l pvmem=8GB
#PBS -A ucl/BioinfCompBio/mol_evol

export OMP_NUM_THREADS=1

cd ~/Prank_analysis_23Oct
prank -d=ENSG00000000003 -fixedbranches=0.1 -t=Tree_ENSG00000000003 -o=Out_ENSG00000000003 -nopost -noxml -notree

cp Out_ENSG00000000003 MYRESULTS/Out_ENSG00000000003

* About walltime
http://www.ucl.ac.uk/research-computing/information/resource_allocation
* Add the job to the Legion queue

qsub MYSCRIPT

* Shell script

#!/bin/sh
cd myscripts
qsub script1; sleep 0.1;
qsub script2; sleep 0.1;
....
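
The same submission can be written as a loop, so the scripts do not have to be listed one by one. This is only a sketch: the function name submit_all and the default pattern script* are assumptions for illustration, and it expects qsub to be on the PATH as it is on Legion.

```shell
#!/bin/sh
# Submit every file matching a pattern inside a directory, pausing
# briefly between submissions. submit_all is a hypothetical helper name.
submit_all() {
    dir=$1
    pattern=${2:-script*}
    (
        cd "$dir" || exit 1
        for job in $pattern; do          # unquoted on purpose: expand the glob
            [ -f "$job" ] || continue    # skip if the pattern matched nothing
            qsub "$job"
            sleep 0.1
        done
    )
}
```

Calling submit_all myscripts then does the same as the hand-written list above. The subshell keeps the cd from changing the directory of the calling shell.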




Login to node for running

slogin -X usertest11


QSTAT*

qstat | tr -s " " | awk -v user=ucbtiju '
    BEGIN { total = 0; ahead = 0; mine = 0; waiting = 0 }
    NR > 2 { total++ }                       # skip the two header lines
    $3 == user { found = 1; mine++ }
    NR > 2 && !found { ahead++ }             # jobs listed before your first one
    $3 == user && $4 == "0" { waiting++ }    # yours, not yet started (column 4 is "0")
    END {
        print ""
        print " No. of jobs in queue in total: " total
        print " No. above your 1st job in queue: " ahead
        print " No. of your jobs in queue: " mine
        print " No. of your jobs running: " (mine - waiting)
        print ""
    }'
date


Useful commands


Check the number of files

ls -laR | grep -c '^-'

qstat
To list *every* job in the queue

qstat

To list all of ucbtiju's jobs

qstat | grep ucbtiju

To do either of these with one page at a time which is more useful:

qstat | less
qstat | grep ucbtiju | less

You can check individual jobs using e.g.

checkjob 12467864.qm01

or for a more detailed description:

checkjob -v 12467864.qm01

This will tell you how long you have been queueing, why the job hasn't
started yet, etc...


Check the log file (why does Legion not run all of my 20,000 jobs?)
Each job you submit produces a log file. I forget the exact format but
you can find them using:

find mydirectory -type f -name "*.o*"

They will be something like: NAME.o12313677

where 12313677 is the job number from 12313677.qm01 (that you see in
qstat), and the NAME is whatever you called the job in your script.
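
To get from a log file name back to the job number, everything up to the last ".o" can be stripped with shell parameter expansion. A small sketch; the function name job_number is made up for illustration.

```shell
# Print the job number encoded in a log file name such as NAME.o12313677.
job_number() {
    file=${1##*/}                    # drop any leading directory
    printf '%s\n' "${file##*.o}"     # drop everything up to the last ".o"
}
```

For example, job_number mydirectory/NAME.o12313677 prints 12313677, which with the .qm01 suffix is the ID that qstat shows.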

Anyway, if there are only 4000 of these output files then LEGION did miss
the rest of your jobs, but if there are 20,000 of these files then that
probably means that it was you who made a mistake, such as a typing error
somewhere in your job, e.g. a directory missed out of the path, or a space
where it shouldn't be etc...

Personally I gave up wasting my time looking at what went wrong unless it
happened a lot of times on the same jobs. I think it is better to write a
little script to see which jobs did or did not run, then to re-submit the
jobs that were missed.
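
A minimal sketch of such a script, assuming a list file with one ID per line and one result file per job named Out_<ID> in the results directory, as in the prank example at the top. The function name missing_jobs and the file name ids.txt are hypothetical.

```shell
# Print the IDs whose result file is absent or empty.
# $1: file with one ID per line, $2: results directory.
missing_jobs() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue                      # ignore blank lines
        [ -s "$2/Out_$id" ] || printf '%s\n' "$id"    # -s: exists and is non-empty
    done < "$1"
}
```

Running missing_jobs ids.txt MYRESULTS > missed.txt gives the list of jobs to re-submit. How each ID maps back to a job script depends on how the scripts were named, so that step is left out here.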

A quick check would be something like:

find mydirectory/myresults -type f -name "*.phy" | wc -l

This will tell you how many files of a certain type are in a folder. It
can be used to see if you are missing some output files when the jobs seem
to be finished.


The only problem with putting higher walltimes is that your jobs stay in
the queue much longer while they wait for other people. If possible it is
best to keep the walltime low and just rerun the jobs that were missed.
If you know that a job takes 60 mins on tarsier then 90 mins for LEGION
is very safe and 24 hours is overkill!

I find it quicker to run with low walltime and re-run the missed jobs, but
one alternative is to request a large walltime like 24 hours and just run
25-50 prank jobs in each job script. This method also works quite well,
although I still prefer to use thousands of small jobs!
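
The batching alternative could be set up by cutting the list of IDs into chunks and letting each job script loop over one chunk. A sketch only: make_batches, the batch_ prefix and ids.txt are hypothetical names, and the default of 25 per chunk is taken from the 25-50 range mentioned above. It relies on the standard split utility.

```shell
# Split a list of IDs (one per line) into files of at most N lines each.
# $1: ID list, $2: output directory, $3: IDs per batch (default 25).
make_batches() {
    mkdir -p "$2" || return 1
    split -l "${3:-25}" "$1" "$2/batch_"    # creates batch_aa, batch_ab, ...
}
```

After make_batches ids.txt batches 25, each job script would read one of the batch_ files line by line and run the prank command from the top of these notes once per ID, with the walltime raised to cover the whole batch.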


Kill job
If you want to kill a job in the queue:

qdel 1052353432.qm01

...you get the number 1052353432.qm01 from looking at qstat.

There are other variations, e.g. adding '| wc -l' counts the number of
lines, so these two tell you how many jobs are in the queue, and how many
you have in the queue:

qstat | wc -l
qstat | grep ucbpwaf | wc -l


Condition of use

http://www.ucl.ac.uk/research-computing/information/services/cluster/user_test