ACCRE C7 Cluster Quick and Dirty Status
Report generated at Thu Apr 3 03:26:01 AM CDT 2025
Problem Nodes
HOSTNAMES       STATE     AVAIL_FEATURES          TIMESTAMP            USER      REASON
cn390           down*     sandybridge             2025-01-26T08:52:18  slurm     Not responding
cn392           down*     sandybridge             2025-01-27T22:16:57  slurm     Not responding
cn421           down*     sandybridge             2025-03-21T05:50:19  slurm     Not responding
cn430           drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn486           drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn912           down*     sandybridge             2025-02-28T14:51:14  slurm     Not responding
cn1083          drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn1091          drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn1092          down*     sandybridge             2025-03-06T20:58:51  slurm     Not responding
cn1096          down*     sandybridge             2025-03-06T14:23:48  slurm     Not responding
cn1325          drained   haswell                 2025-03-08T08:45:02  appelte1  Nobody - RT90957 - memory issues, instability
cn1329          drained   haswell                 2025-03-08T08:47:01  appelte1  Nobody - RT90958 - memory issues, instability
cn1332          down*     haswell                 2025-03-17T14:57:37  slurm     Not responding
cn1350          drained   haswell                 2025-03-08T08:52:43  appelte1  Nobody - RT90959 - memory issues, instability
cn1356          down*     haswell                 2025-03-14T21:23:43  slurm     Not responding
cn1368          down*     haswell                 2025-03-17T14:40:56  slurm     Not responding
cn1376          drained   haswell                 2025-03-08T08:54:25  appelte1  Nobody - RT90960 - memory issues, instability
cn1385          drained   haswell                 2025-03-08T08:56:26  appelte1  Nobody - RT90961 - memory issues, instability
gpu0014         draining  broadwell,pascal,p3584  2025-03-31T16:38:53  slurm     Prolog error
gpu0015         drained   broadwell,pascal,p3584  2025-04-01T11:21:22  slurm     Prolog error
gpu0018         drained   broadwell,pascal,p3584  2025-03-31T16:41:03  slurm     Prolog error
gpu0019         draining  broadwell,pascal,p3584  2025-04-01T11:24:22  slurm     Prolog error
gpu0021         draining  broadwell,pascal,p3584  2025-04-01T11:27:17  slurm     Prolog error
gpu0026         draining  broadwell,pascal,p3840  2025-04-01T11:29:23  slurm     Prolog error
gpu0027         draining  broadwell,pascal,p3840  2025-04-01T11:31:30  slurm     Prolog error
gpu0030         drained   broadwell,pascal,p3840  2025-03-06T15:43:40  root      Samuel - RT90936 - Bad gpu
gpu0038         drained   skylake,turing,csbtmp   2025-02-17T12:18:42  slurm     Nobody - RT90396 - GPU0 in error state : Not respond
gpu0045         drained   skylake,turing,csbtmp   2025-02-17T13:00:08  slurm     gres/gpu count reported lower than configured (3 < 4
gpu0048         drained   skylake,turing,csbtmp   2025-04-01T06:08:57  root      Kill task failed
gpu0080         drained*  icelake,a6000x4,csbtmp  2024-04-25T13:03:59  root      Melo is using this machine for testing
p-matheny-lab-  down*     zen                     2025-03-01T15:59:51  slurm     Not responding
Queue Summary (Production)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre 0 0 1 2
appelte1 0 0 1 2
-----------------------------------------------------------------------------------------
accre_guests 0 0 2 36
senthia 0 0 2 36
-----------------------------------------------------------------------------------------
anderson_mri 47 188 153 153
xul13 47 188 153 153
-----------------------------------------------------------------------------------------
behringer_lab 0 0 1 8
haleof 0 0 1 8
-----------------------------------------------------------------------------------------
booth_lab 1 1 0 0
comptoab 1 1 0 0
-----------------------------------------------------------------------------------------
brg_cores 13 76 0 0
desilvt 12 60 0 0
kandelr 1 16 0 0
-----------------------------------------------------------------------------------------
caldwell_lab 0 0 1 16
humphrjm 0 0 1 16
-----------------------------------------------------------------------------------------
calipari_lab 0 0 1 18
barthb1 0 0 1 18
-----------------------------------------------------------------------------------------
candelaria_group 0 0 2 40
hatche 0 0 2 40
-----------------------------------------------------------------------------------------
capra_lab_csb 2 2 0 0
mothcw 2 2 0 0
-----------------------------------------------------------------------------------------
cms 0 0 5 5
meloam 0 0 5 5
-----------------------------------------------------------------------------------------
cmsadmin 0 0 1 1
autocms 0 0 1 1
-----------------------------------------------------------------------------------------
cms_lowprio 27 51 182 455
cmslocal 11 29 8 32
cmspilot 16 22 174 423
-----------------------------------------------------------------------------------------
coxlab 22 67 0 0
evansp1 22 67 0 0
-----------------------------------------------------------------------------------------
davis_lab 1 1 0 0
tsail2 1 1 0 0
-----------------------------------------------------------------------------------------
edwards_lab 1 1 0 0
parkerac 1 1 0 0
-----------------------------------------------------------------------------------------
g_gamazon_lab 1 4 1 12
kimn13 0 0 1 12
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
g_giri_group 5 15 0 0
basnettb 5 15 0 0
-----------------------------------------------------------------------------------------
h_fabbrilab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
h_vangard_1 1 16 0 0
yef1 1 16 0 0
-----------------------------------------------------------------------------------------
h_vmac 0 0 990 990
suny36 0 0 990 990
-----------------------------------------------------------------------------------------
isde-rer 2 6 0 0
vielmej 2 6 0 0
-----------------------------------------------------------------------------------------
jswhep 1 8 0 0
atehort 1 8 0 0
-----------------------------------------------------------------------------------------
l3_manzanas_group 0 0 1 13
manzand 0 0 1 13
-----------------------------------------------------------------------------------------
l3_precision_nutriti 2 13 0 0
baghem1 2 13 0 0
-----------------------------------------------------------------------------------------
l3_wilkey_lab 1 16 7 112
starlii 1 16 7 112
-----------------------------------------------------------------------------------------
maiziezhou_lab 1 34 5 194
chowx 1 34 2 14
xiem6 0 0 3 180
-----------------------------------------------------------------------------------------
nbody 1 32 0 0
smitm77 1 32 0 0
-----------------------------------------------------------------------------------------
palmeri_lab 200 210 0 0
bahgg 200 210 0 0
-----------------------------------------------------------------------------------------
p_collins_lab 0 0 1 8
chencl1 0 0 1 8
-----------------------------------------------------------------------------------------
p_dsi 1 1 0 0
yangi1 1 1 0 0
-----------------------------------------------------------------------------------------
p_matheny_lab 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
p_neuert_lab 1 8 0 0
hughesjj 1 8 0 0
-----------------------------------------------------------------------------------------
rer 2 32 0 0
hum6 2 32 0 0
-----------------------------------------------------------------------------------------
r_isde 1 16 0 0
trippej1 1 16 0 0
-----------------------------------------------------------------------------------------
rokaslab 11 51 0 0
borrag 1 1 0 0
lint8 10 50 0 0
-----------------------------------------------------------------------------------------
ruderferlab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
sbcs 3 31 41 41
jiag 0 0 41 41
nguyensm 1 10 0 0
pingj2 1 16 0 0
xus13 1 5 0 0
-----------------------------------------------------------------------------------------
taylor_group 2 12 0 0
milesmt 1 8 0 0
schultls 1 4 0 0
-----------------------------------------------------------------------------------------
tk_lab 0 0 1 80
yoonh14 0 0 1 80
-----------------------------------------------------------------------------------------
tong_lab 1 16 0 0
lutherzr 1 16 0 0
-----------------------------------------------------------------------------------------
vgi 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
walker_lab 10 22 0 0
deanrt 9 18 0 0
guox11 1 4 0 0
-----------------------------------------------------------------------------------------
wankowicz_lab 1 1 0 0
wankows 1 1 0 0
-----------------------------------------------------------------------------------------
yang_lab_csb 213 534 0 0
shaoq1 110 110 0 0
shinw3 102 408 0 0
zhangsw 1 16 0 0
-----------------------------------------------------------------------------------------
Totals: 579 1497 1396 2184
Queue Summary (Pascal)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (Turing)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (A6000x4)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (A6000x2)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Partition Summary
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
production* up 14-00:00:0 3 down* cn[1332,1356,1368]
production* up 14-00:00:0 5 drain cn[1325,1329,1350,1376,1385]
production* up 14-00:00:0 29 mix cn[1300,1315,1320,1322,1326,1328,1333-1335,1337-1338,1340,1351,1355,1358-1360,1363,1365,1374,1379-1382,1389,1392,1394,1701,1704]
production* up 14-00:00:0 49 alloc cn[1301-1302,1306-1307,1311,1317-1318,1321,1323-1324,1330-1331,1336,1339,1341-1349,1352-1354,1357,1361-1362,1366-1367,1369-1373,1375,1378,1383-1384,1387-1388,1390-1391,1393,1395-1398]
nogpfs up 14-00:00:0 6 down* cn[390,392,421,912,1092,1096]
nogpfs up 14-00:00:0 4 drain cn[430,486,1083,1091]
nogpfs up 14-00:00:0 139 alloc cn[303-308,311,313,315,317-318,320,322-324,326-333,335-338,340,347-348,351,353,355-360,362-365,367,369-370,372-380,384-385,388-389,394,398-401,403-405,407,411,413-420,423,425,427,429,431-432,435-437,439-443,445-446,448-452,455,460,463-466,468-472,474-477,479,481-485,491,495-496,499,1081-1082,1085-1087,1089-1090,1094-1095,1122-1123,1125-1126,1128-1129,1132,1134]
debug up infinite 2 mix gpu[0022,0059]
debug up infinite 4 idle cn[371,1101],gpu[0006,0046]
sam up 2-02:00:00 2 idle vm-cms-sam-pri,vm-cms-sam-sec
maxwell up 5-00:00:00 1 mix gpu0004
maxwell up 5-00:00:00 2 idle gpu[0002,0012]
pascal up 5-00:00:00 5 drng gpu[0014,0019,0021,0026-0027]
pascal up 5-00:00:00 3 drain gpu[0015,0018,0030]
pascal up 5-00:00:00 9 mix gpu[0013,0017,0020,0022-0023,0025,0031,0033-0034]
turing up 5-00:00:00 3 drain gpu[0038,0045,0048]
turing up 5-00:00:00 2 alloc gpu[0035,0049]
a6000x2 up 5-00:00:00 1 mix gpu0003
a6000x2 up 5-00:00:00 5 idle gpu[0005-0008,0010]
a4000x4 up 14-00:00:0 0 n/a
a6000x4 up 14-00:00:0 1 drain* gpu0080
a6000x4 up 14-00:00:0 6 mix gpu[0059,0077-0079,0081-0082]
a100 up 14-00:00:0 0 n/a
a100x8 up 14-00:00:0 0 n/a
a4000x8 up 14-00:00:0 0 n/a
cgw-vm-qa-flatearth1 up infinite 1 idle vm-qa-flatearth1
cgw-djroomba up infinite 1 idle djroomba
cgw-cqs1 up infinite 1 idle cqs1
cgw-cqs3 up infinite 1 idle cqs3
cgw-rocksteady up infinite 1 idle rocksteady
cgw-tbi01 up infinite 1 idle tbi01
cgw-capra1 up infinite 1 idle capra1
cgw-horus up infinite 1 idle horus
cgw-dsi-gw up infinite 1 idle dsi-gw
cgw-maizie up infinite 1 idle maizie
cgw-maizie2 up infinite 1 idle maizie2
cgw-maizie3 up infinite 1 idle maizie3
cgw-hgen01 up infinite 1 idle hgen01
cgw-p-matheny-lab-server1 up infinite 1 down* p-matheny-lab-server1
cgw-sideshowbob up infinite 1 idle sideshowbob
cgw-platypus up infinite 1 mix platypus
cgw-hanuman up infinite 1 idle hanuman
cgw-lego up infinite 1 idle lego
cgw-badger up infinite 1 idle badger
cgw-candelaria01 up infinite 1 mix candelaria01
cgw-holowatyj01 up infinite 1 idle holowatyj01
cgw-cartailler01 up infinite 1 idle cartailler01
cgw-gamazon01 up infinite 1 idle gamazon01
Queue Summary (All Partitions)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre 0 0 1 2
appelte1 0 0 1 2
-----------------------------------------------------------------------------------------
accre_guests 0 0 2 36
senthia 0 0 2 36
-----------------------------------------------------------------------------------------
anderson_mri 47 188 153 153
xul13 47 188 153 153
-----------------------------------------------------------------------------------------
behringer_lab 0 0 1 8
haleof 0 0 1 8
-----------------------------------------------------------------------------------------
bme3890 0 0 3 17
909065 0 0 1 15
dhattk 0 0 2 2
-----------------------------------------------------------------------------------------
booth_lab 1 1 0 0
comptoab 1 1 0 0
-----------------------------------------------------------------------------------------
brg_cores 13 76 0 0
desilvt 12 60 0 0
kandelr 1 16 0 0
-----------------------------------------------------------------------------------------
caldwell_lab 0 0 1 16
humphrjm 0 0 1 16
-----------------------------------------------------------------------------------------
calipari_lab 0 0 1 18
barthb1 0 0 1 18
-----------------------------------------------------------------------------------------
candelaria_group 0 0 2 40
hatche 0 0 2 40
-----------------------------------------------------------------------------------------
capra_lab_csb 2 2 0 0
mothcw 2 2 0 0
-----------------------------------------------------------------------------------------
cgw_candelaria01 3 28 0 0
hatche 1 20 0 0
mcgilldg 1 4 0 0
quintedc 1 4 0 0
-----------------------------------------------------------------------------------------
cgw_djroomba 12 12 0 0
mcphauna 12 12 0 0
-----------------------------------------------------------------------------------------
cgw_maizie 3 72 0 0
wangh67 3 72 0 0
-----------------------------------------------------------------------------------------
cgw_platypus 7 148 0 0
mohamb2 1 32 0 0
rubinom 1 16 0 0
sardarn 5 100 0 0
-----------------------------------------------------------------------------------------
cms 139 1668 40 40
cmslocal 35 420 16 16
cmspilot 104 1248 19 19
meloam 0 0 5 5
-----------------------------------------------------------------------------------------
cmsadmin 0 0 1 1
autocms 0 0 1 1
-----------------------------------------------------------------------------------------
cms_lowprio 27 51 182 455
cmslocal 11 29 8 32
cmspilot 16 22 174 423
-----------------------------------------------------------------------------------------
coxlab 22 67 0 0
evansp1 22 67 0 0
-----------------------------------------------------------------------------------------
cs3892-oguz_acc 0 0 1 1
914505 0 0 1 1
-----------------------------------------------------------------------------------------
csb_gpu_acc 63 142 679 3141
bisigp1 3 12 0 0
cunnik8 23 23 84 84
ger1 0 0 20 20
guox11 8 48 491 2946
howardvr 0 0 1 8
marinot 1 16 0 0
ranx 1 9 0 0
shaoq1 27 34 83 83
-----------------------------------------------------------------------------------------
davis_lab 1 1 0 0
tsail2 1 1 0 0
-----------------------------------------------------------------------------------------
edwards_lab 1 1 0 0
parkerac 1 1 0 0
-----------------------------------------------------------------------------------------
g_gamazon_lab 1 4 1 12
kimn13 0 0 1 12
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
g_giri_group 5 15 0 0
basnettb 5 15 0 0
-----------------------------------------------------------------------------------------
h_fabbrilab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
h_vangard_1 1 16 0 0
yef1 1 16 0 0
-----------------------------------------------------------------------------------------
h_vmac 0 0 990 990
suny36 0 0 990 990
-----------------------------------------------------------------------------------------
isde-rer 2 6 0 0
vielmej 2 6 0 0
-----------------------------------------------------------------------------------------
jswhep 1 8 0 0
atehort 1 8 0 0
-----------------------------------------------------------------------------------------
l3_manzanas_group 0 0 1 13
manzand 0 0 1 13
-----------------------------------------------------------------------------------------
l3_precision_nutriti 2 13 0 0
baghem1 2 13 0 0
-----------------------------------------------------------------------------------------
l3_wilkey_lab 1 16 7 112
starlii 1 16 7 112
-----------------------------------------------------------------------------------------
maiziezhou_lab 1 34 5 194
chowx 1 34 2 14
xiem6 0 0 3 180
-----------------------------------------------------------------------------------------
nbody 1 32 0 0
smitm77 1 32 0 0
-----------------------------------------------------------------------------------------
nbody_acc 0 0 1 16
khanfm 0 0 1 16
-----------------------------------------------------------------------------------------
neurogroup_acc 1 6 0 0
songrw 1 6 0 0
-----------------------------------------------------------------------------------------
palmeri_lab 200 210 0 0
bahgg 200 210 0 0
-----------------------------------------------------------------------------------------
p_collins_lab 0 0 1 8
chencl1 0 0 1 8
-----------------------------------------------------------------------------------------
p_dsi 1 1 1 4
malikm2 0 0 1 4
yangi1 1 1 0 0
-----------------------------------------------------------------------------------------
p_dsi_acc 0 0 2 12
srikas2 0 0 2 12
-----------------------------------------------------------------------------------------
p_matheny_lab 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
p_neuert_lab 1 8 0 0
hughesjj 1 8 0 0
-----------------------------------------------------------------------------------------
rer 2 32 0 0
hum6 2 32 0 0
-----------------------------------------------------------------------------------------
r_isde 1 16 0 0
trippej1 1 16 0 0
-----------------------------------------------------------------------------------------
rokaslab 11 51 0 0
borrag 1 1 0 0
lint8 10 50 0 0
-----------------------------------------------------------------------------------------
rubinov_lab_acc 0 0 1 1
mohamb2 0 0 1 1
-----------------------------------------------------------------------------------------
ruderferlab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
sbcs 3 31 41 41
jiag 0 0 41 41
nguyensm 1 10 0 0
pingj2 1 16 0 0
xus13 1 5 0 0
-----------------------------------------------------------------------------------------
taylor_group 2 12 0 0
milesmt 1 8 0 0
schultls 1 4 0 0
-----------------------------------------------------------------------------------------
tk_lab 0 0 1 80
yoonh14 0 0 1 80
-----------------------------------------------------------------------------------------
tong_lab 1 16 0 0
lutherzr 1 16 0 0
-----------------------------------------------------------------------------------------
vgi 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
walker_lab 10 22 0 0
deanrt 9 18 0 0
guox11 1 4 0 0
-----------------------------------------------------------------------------------------
wankowicz_lab 1 1 0 0
wankows 1 1 0 0
-----------------------------------------------------------------------------------------
yang_lab_csb 213 534 0 0
shaoq1 110 110 0 0
shinw3 102 408 0 0
zhangsw 1 16 0 0
-----------------------------------------------------------------------------------------
Totals: 807 3573 2119 5411