= Scheduling GPU resources in the Grid =

== Tweaks and applied configuration ==

=== CREAM CE ===

 1. Added to the BLAHP script `/usr/libexec/sge_local_submit_attributes.sh`:
{{{
(..)
if [ -n "$gpu" ]; then
    echo "#$ -l gpu=${gpu}"
fi
(..)
}}}

=== Scheduler ===

 1. Complex value `gpu`:
{{{
#name   shortcut   type   relop   requestable   consumable   default   urgency
#------------------------------------------------------------------------------
(..)
gpu     gpu        INT    <=      YES           YES          0         0
(..)
}}}
 1. Host complexes:
{{{
hostname              tesla.ifca.es
load_scaling          NONE
complex_values        gpu=4,mem_free=24G,virtual_free=24G
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
}}}
 1. Load sensor:
{{{
#!/bin/sh
hostname=`uname -n`

while [ 1 ]; do
    # Block until the scheduler sends a line; exit on EOF, stop cleanly on "quit".
    read input
    result=$?
    if [ $result != 0 ]; then
        exit 1
    fi
    if [ "$input" == "quit" ]; then
        exit 0
    fi

    # If nvidia-smi is not available, report zero free GPUs.
    smitool=`which nvidia-smi`
    result=$?
    if [ $result != 0 ]; then
        gpusavail=0
    else
        # Available GPUs = installed GPUs - GPUs with a running compute process.
        gpustotal=`nvidia-smi -L | wc -l`
        gpusused=`nvidia-smi | grep "Process name" -A 6 | grep -v '+-' | grep -v '|=' | grep -v Usage | grep -v "No running" | wc -l`
        gpusavail=`echo "$gpustotal - $gpusused" | bc`
    fi

    echo begin
    echo "$hostname:gpu:$gpusavail"
    echo end
done

exit 0
}}}
 1. Per-host load sensor:
{{{
# qconf -sconf tesla
#tesla.ifca.es:
load_sensor   /nfs4/opt/gridengine/util/resources/loadsensors/gpu.sh
}}}

== Testing (from the UI) ==

{{{
# cat test_cream.jdl
[
  JobType = "Normal";
  Executable = "foo.sh";
  StdOutput = "out.out";
  StdError = "err.err";
  InputSandbox = {"foo.sh"};
  OutputSandbox = {"out.out", "err.err"};
  OutputSandboxBaseDestUri = "gsiftp://localhost";
  CERequirements = "gpu==2";
]
}}}
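The "available = total - used" counting done by the load sensor can be exercised in isolation. The sketch below feeds a hypothetical, canned excerpt of `nvidia-smi` process-table output (the sample text, the process names, and the hard-coded GPU total are all invented for illustration) through the same grep pipeline, using shell arithmetic in place of `bc` so it runs on a node without `nvidia-smi`:

```shell
#!/bin/sh
# Hypothetical nvidia-smi process-table excerpt with two busy GPUs.
sample='|  GPU   PID  Process name                        Usage      |
|=============================================================|
|    0  1234  ./gpu_job_a                         250MiB     |
|    1  5678  ./gpu_job_b                         250MiB     |
+-------------------------------------------------------------+'

# On a real node this would come from: nvidia-smi -L | wc -l
gpustotal=4

# Same filter chain as the load sensor: keep only process rows,
# dropping the header, the separator lines, and the "No running" notice.
gpusused=$(echo "$sample" | grep "Process name" -A 6 | grep -v '+-' | grep -v '|=' | grep -v Usage | grep -v "No running" | wc -l)

gpusavail=$((gpustotal - gpusused))
echo "gpu:$gpusavail"
# prints "gpu:2"
```

Piping the same sample through the pipeline before and after starting a GPU job is a quick way to sanity-check the filters whenever the `nvidia-smi` output format changes between driver versions.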