ACCRE R9 Cluster Quick and Dirty Status
Report generated at Wed Mar 18 10:23:01 PM CDT 2026
Problem Nodes
HOSTNAMES STATE  TIMESTAMP           REASON               COMMENT
cn1269    drng   2026-03-18T21:47:48 Kill task failed (Jo (null)
cn1278    down*  2026-03-17T15:06:51 Not responding       Alex - rt97645 - replace dimm a1
cn1281    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1317    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1393    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1438    down   2026-03-17T14:32:22 Node unexpectedly re Alex - RT97650 - CMOS Battery replace
cn1439    drain  2026-03-17T14:52:30 Prolog error         Alex - na - replace CMOS battery
cn1440    down*  2026-03-17T14:31:37 Not responding       Alex - na - replace CMOS battery
cn1441    drain  2026-03-17T14:52:37 Prolog error         Alex - na - replace CMOS battery
cn1449    drng   2026-03-18T10:50:31 Kill task failed (Jo (null)
cn1534    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1548    drain  2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1551    drain  2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1553    drain  2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1561    drain  2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1576    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1594    drng   2026-03-18T16:06:53 Kill task failed (Jo (null)
cn1606    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1607    drng   2026-03-18T18:12:26 Kill task failed (Jo (null)
cn1609    drng   2026-03-18T21:51:12 Kill task failed (Jo (null)
cn1627    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1629    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1702    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1705    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
cn1706    drng   2026-03-18T08:46:33 Thomas - RT98027 - W Thomas - RT98027 - When drained, reboot, resume.
dgx02     drain* 2026-02-25T10:32:14 Scott - RT96990 - re Scott - RT96990 - read only filesystem, hot ssd
gpu0027   drain* 2026-03-17T11:12:19 Troy - RT97521 - dec Troy - RT97521 - decom
gpu0033   drain* 2026-03-17T14:49:34 Troy - RT97716 - dec Troy - RT97716 - decom
gpu0045   inval  2026-03-11T09:14:33 gres/gpu count repor Samuel - RT97477 - first round of triage done
gpu0050   inval  2026-03-11T09:14:33 gres/gpu count repor Samuel - RT- - RT97518 - first round of triage done
gpu0200   inval  2026-03-17T14:48:03 gres/gpu count repor (null)
gpu0201   inval  2026-03-17T20:07:11 gres/gpu count repor (null)
gpu0202   inval  2026-03-13T16:21:03 gres/gpu count repor Nobody - RT97910 - RMA JBOX caddy
Queue Summary (Batch)
GROUP               USER            ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre                               1           1            1            1
                    binklemj        0           0            1            1
                    frederse        1           1            0            0
-----------------------------------------------------------------------------------------
accre_guests                        0           0            1            100
                    haojz           0           0            1            100
-----------------------------------------------------------------------------------------
beam_lab                            3           48           0            0
                    zhuj29          3           48           0            0
-----------------------------------------------------------------------------------------
behringer_lab                       1           62           0            0
                    haleof          1           62           0            0
-----------------------------------------------------------------------------------------
bias_group                          1           4            0            0
                    biasds          1           4            0            0
-----------------------------------------------------------------------------------------
booth_lab                           1           4            0            0
                    mathura         1           4            0            0
-----------------------------------------------------------------------------------------
brg_cores                           1           16           0            0
                    kandelr         1           16           0            0
-----------------------------------------------------------------------------------------
cgg                                 0           0            1            64
                    liy110          0           0            1            64
-----------------------------------------------------------------------------------------
cms                                 227         3817         367          927
                    cmslocal        83          1968         158          429
                    cmspilot        139         1775         207          486
                    uscmslocal      5           74           2            12
-----------------------------------------------------------------------------------------
coxlab                              2           11           0            0
                    hirbojb1        1           3            0            0
                    scalica         1           8            0            0
-----------------------------------------------------------------------------------------
cqs_si                              0           0            4            8
                    chenarsw        0           0            4            8
-----------------------------------------------------------------------------------------
cs8395-oguz                         3           12           0            0
                    palermpd        3           12           0            0
-----------------------------------------------------------------------------------------
das_lab                             1           1            0            0
                    shiltmh1        1           1            0            0
-----------------------------------------------------------------------------------------
davis_lab                           0           0            1            16
                    bluejor         0           0            1            16
-----------------------------------------------------------------------------------------
econ_faculty                        1           2            0            0
                    moroa           1           2            0            0
-----------------------------------------------------------------------------------------
econgrads                           1           1            0            0
                    chenl40         1           1            0            0
-----------------------------------------------------------------------------------------
g_benntor_lab                       1           16           0            0
                    mccorcl1        1           16           0            0
-----------------------------------------------------------------------------------------
g_gamazon_lab                       0           0            1            1
                    evanspd1        0           0            1            1
-----------------------------------------------------------------------------------------
g_giri_group                        1           1            0            0
                    triozjl1        1           1            0            0
-----------------------------------------------------------------------------------------
goldring_group                      0           0            1            2
                    mcgrawke        0           0            1            2
-----------------------------------------------------------------------------------------
haslag_group                        4           64           100          1600
                    haslagph        4           64           100          1600
-----------------------------------------------------------------------------------------
h_biostat_kang                      500         500          301          301
                    yanb1           500         500          301          301
-----------------------------------------------------------------------------------------
h_biostat_student                   287         341          0            0
                    chenh36         1           30           0            0
                    koy2            1           15           0            0
                    namy1           251         251          0            0
                    shil10          1           1            0            0
                    yangc16         32          32           0            0
                    yih4            1           12           0            0
-----------------------------------------------------------------------------------------
h_cqs                               1           1            10           31
                    shengq1         1           1            10           31
-----------------------------------------------------------------------------------------
h_darby_lab                         1           4            0            0
                    leej133         1           4            0            0
-----------------------------------------------------------------------------------------
hodges_lab                          0           0            1            3
                    aganve          0           0            1            3
-----------------------------------------------------------------------------------------
h_vangard_1                         1           3            2            9
                    ramirm8         1           3            2            9
-----------------------------------------------------------------------------------------
h_vmac                              4           128          185          5920
                    zhanm32         4           128          185          5920
-----------------------------------------------------------------------------------------
h_vuiis                             3           10           0            0
                    viswam1         1           8            0            0
                    vuiis_archive   2           2            0            0
-----------------------------------------------------------------------------------------
isde-rer                            1           8            0            0
                    champaca        1           8            0            0
-----------------------------------------------------------------------------------------
kaczkurkin_lab                      1           10           0            0
                    abbasia         1           10           0            0
-----------------------------------------------------------------------------------------
l2_jan_lab                          1           7            0            0
                    olivij1         1           7            0            0
-----------------------------------------------------------------------------------------
l3_norms_lab                        1           8            0            0
                    sowellse        1           8            0            0
-----------------------------------------------------------------------------------------
l3_runnoe_group                     1           16           0            0
                    kaldorme        1           16           0            0
-----------------------------------------------------------------------------------------
l3_talbot_group                     1           1            0            0
                    parkst2         1           1            0            0
-----------------------------------------------------------------------------------------
l3_vuiis_cci                        76          167          0            0
                    vuiis_daily_s   76          167          0            0
-----------------------------------------------------------------------------------------
l3_watson_lab                       10          64           0            0
                    licerav         10          64           0            0
-----------------------------------------------------------------------------------------
lpo_group                           1           1            0            0
                    flanac1         1           1            0            0
-----------------------------------------------------------------------------------------
maha                                0           0            1            1
                    wardbm1         0           0            1            1
-----------------------------------------------------------------------------------------
mahmoud_group                       1           128          0            0
                    amaraii         1           128          0            0
-----------------------------------------------------------------------------------------
maiziezhou_lab                      1           12           0            0
                    maow            1           12           0            0
-----------------------------------------------------------------------------------------
mchaourab                           0           0            261          261
                    kaot1           0           0            261          261
-----------------------------------------------------------------------------------------
mcml                                0           0            3            256
                    odenyogg        0           0            2            192
                    subravvr        0           0            1            64
-----------------------------------------------------------------------------------------
moro_lab                            5           40           0            0
                    lehn            5           40           0            0
-----------------------------------------------------------------------------------------
nasa_imqcam                         23          736          184          5888
                    fangc7          23          736          184          5888
-----------------------------------------------------------------------------------------
nbody                               388         658          67           88
                    ligo            388         658          67           88
-----------------------------------------------------------------------------------------
ng_lab                              0           0            1            8
                    kimj119         0           0            1            8
-----------------------------------------------------------------------------------------
patel_lab                           1           8            0            0
                    grublk          1           8            0            0
-----------------------------------------------------------------------------------------
p_csb_meiler                        326         1317         8581         110282
                    agarwm5         207         207          271          271
                    resv146         6           7            0            0
                    tydingcw        110         1100         7833         109534
                    yange8          3           3            477          477
-----------------------------------------------------------------------------------------
p_dsi                               0           0            2            4
                    yangi1          0           0            2            4
-----------------------------------------------------------------------------------------
p_meiler                            0           0            1            3
                    yange8          0           0            1            3
-----------------------------------------------------------------------------------------
rer                                 3           32           0            0
                    cantrekb        1           4            0            0
                    paciarja        1           16           0            0
                    wonge7          1           12           0            0
-----------------------------------------------------------------------------------------
r_isde                              2           8            0            0
                    trippej1        2           8            0            0
-----------------------------------------------------------------------------------------
rke_group                           5           20           0            0
                    sleethmr        5           20           0            0
-----------------------------------------------------------------------------------------
rokaslab                            1           100          0            0
                    danist          1           100          0            0
-----------------------------------------------------------------------------------------
rubinov_lab                         1           10           1            4
                    abbasia         1           10           0            0
                    rubinom         0           0            1            4
-----------------------------------------------------------------------------------------
sbcs                                0           0            2            2
                    liq17           0           0            1            1
                    lyul1           0           0            1            1
-----------------------------------------------------------------------------------------
stein_lab                           1           1            0            0
                    kilics1         1           1            0            0
-----------------------------------------------------------------------------------------
taylor_group                        1           3            0            0
                    petrop3         1           3            0            0
-----------------------------------------------------------------------------------------
tk_lab                              8           320          0            0
                    yoonh15         8           320          0            0
-----------------------------------------------------------------------------------------
vgi                                 0           0            1            4
                    salerl1         0           0            1            4
-----------------------------------------------------------------------------------------
walker_lab                          2           4            0            0
                    boutwedr        1           2            0            0
                    walkeas2        1           2            0            0
-----------------------------------------------------------------------------------------
wankowicz_lab                       841         841          37460        37460
                    wankows         841         841          37460        37460
-----------------------------------------------------------------------------------------
wan_lab                             1           400          0            0
                    hardenn         1           400          0            0
-----------------------------------------------------------------------------------------
womelsdorf_lab                      41          450          130          1304
                    gerritcg        41          450          130          1304
-----------------------------------------------------------------------------------------
yang_lab_csb                        1           12           0            0
                    zhengm9         1           12           0            0
-----------------------------------------------------------------------------------------
zhu_group                           1           32           2            64
                    zhuw12          1           32           2            64
-----------------------------------------------------------------------------------------
Totals:                             2793        10461        47673        164612
Queue Summary (Batch GPU)
GROUP               USER            ACTIVE_JOBS ACTIVE_GPUS  PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
accre_guests_acc                    3           3            1            1
                    liy110          2           2            1            1
                    whitejt2        1           1            0            0
-----------------------------------------------------------------------------------------
beam_lab_acc                        14          14           35           35
                    marshazm        14          14           35           35
-----------------------------------------------------------------------------------------
csb_gpu_acc                         13          15           6            6
                    dongj11         3           5            0            0
                    jurichc         0           0            6            6
                    kaermel         1           1            0            0
                    liuy145         8           8            0            0
                    tydingcw        1           1            0            0
-----------------------------------------------------------------------------------------
h_vmac_acc                          1           1            0            0
                    janveva         1           1            0            0
-----------------------------------------------------------------------------------------
maiziezhou_lab_acc                  8           8            0            0
                    chenp12         8           8            0            0
-----------------------------------------------------------------------------------------
mchaourab_acc                       4           4            259          259
                    kaot1           3           3            258          258
                    shiras1         1           1            0            0
                    tangq3          0           0            1            1
-----------------------------------------------------------------------------------------
nbody_acc                           3           17           0            0
                    bustam1         1           1            0            0
                    khanfm          2           16           0            0
-----------------------------------------------------------------------------------------
p_dsi_acc                           1           1            8            8
                    chenx32         1           1            0            0
                    yangi1          0           0            8            8
-----------------------------------------------------------------------------------------
p_meiler_acc                        0           0            1            1
                    scotj14         0           0            1            1
-----------------------------------------------------------------------------------------
stassun_acc                         1           2            0            0
                    medani          1           2            0            0
-----------------------------------------------------------------------------------------
taylor_group_acc                    1           1            0            0
                    criswea         1           1            0            0
-----------------------------------------------------------------------------------------
Totals:                             49          66           310          310
Queue Summary (interactive)
GROUP                           USER            ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
candelaria_group_int                            1           4            0            0
                                shimozkk        1           4            0            0
-----------------------------------------------------------------------------------------
edwards_lab_int                                 1           4            0            0
                                seaglehm        1           4            0            0
-----------------------------------------------------------------------------------------
fe_accre_lab_int                                1           4            0            0
                                goffta1         1           4            0            0
-----------------------------------------------------------------------------------------
h_vmac_int                                      1           8            0            0
                                jackb13         1           8            0            0
-----------------------------------------------------------------------------------------
l3_precision_nutrition_lab_int                  1           128          0            0
                                baghem1         1           128          0            0
-----------------------------------------------------------------------------------------
maiziezhou_lab_int                              2           12           0            0
                                maow            1           2            0            0
                                tangk10         1           10           0            0
-----------------------------------------------------------------------------------------
rubinov_lab_int                                 1           16           0            0
                                mohamb2         1           16           0            0
-----------------------------------------------------------------------------------------
vgi_int                                         1           16           0            0
                                shellejp        1           16           0            0
-----------------------------------------------------------------------------------------
yang_lab_int                                    1           8            0            0
                                shaoq1          1           8            0            0
-----------------------------------------------------------------------------------------
Totals:                                         10          200          0            0
Queue Summary (interactive_gpu)
GROUP               USER            ACTIVE_JOBS ACTIVE_GPUS  PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
dsi_dgx_iacc                        8           15           1            4
                    chattec         1           1            0            0
                    criswea         1           1            0            0
                    kogs            1           8            0            0
                    mohamb2         0           0            1            4
                    petulaa         1           1            0            0
                    schultls        1           1            0            0
                    useltom         1           1            0            0
                    wut18           2           2            0            0
-----------------------------------------------------------------------------------------
Totals:                             8           15           1            4
Partition Summary
PARTITION       AVAIL TIMELIMIT  NODES STATE  NODELIST
interactive     up    14-00:00:0 5     mix    cn[1287,1301,1328,1707,1814]
interactive     up    14-00:00:0 2     alloc  cn[1302,1329]
interactive     up    14-00:00:0 23    idle   cn[1322-1326,1330,1800-1813,1815-1817]
batch*          up    14-00:00:0 2     comp   cn[1234,1579]
batch*          up    14-00:00:0 2     mix-   cn[1315,1339]
batch*          up    14-00:00:0 2     down*  cn[1278,1440]
batch*          up    14-00:00:0 16    drng   cn[1269,1281,1317,1393,1449,1534,1576,1594,1606-1607,1609,1627,1629,1702,1705-1706]
batch*          up    14-00:00:0 6     drain  cn[1439,1441,1548,1551,1553,1561]
batch*          up    14-00:00:0 39    mix    cn[1271,1291,1304,1333,1352-1353,1374,1394,1406,1467,1480-1481,1540,1546,1552,1558-1559,1564,1578,1580,1588,1596,1602,1604-1605,1608,1610,1614,1617-1618,1621,1623-1626,1630-1631,1700,1708]
batch*          up    14-00:00:0 317   alloc  cn[1202-1213,1215-1233,1235-1242,1257-1262,1264-1268,1270,1272-1275,1277,1279-1280,1282-1286,1288-1290,1292-1299,1303,1305-1314,1316,1318,1320-1321,1327,1331-1332,1334-1338,1340-1351,1354,1357-1369,1371-1373,1375-1385,1387-1392,1395-1405,1407-1411,1414-1427,1430-1432,1434-1437,1442-1443,1445-1448,1450,1452-1458,1460-1464,1466,1468-1479,1482-1520,1522-1533,1535-1538,1543-1545,1547,1549-1550,1554-1557,1562-1563,1565-1571,1573-1575,1577,1581-1587,1589,1592-1593,1595,1597,1603,1612-1613,1615-1616,1619-1620,1622,1628,1632-1633,1701,1703-1704,2000]
batch*          up    14-00:00:0 1     down   cn1438
batch_gpu       up    14-00:00:0 5     inval  gpu[0045,0050,0200-0202]
batch_gpu       up    14-00:00:0 2     mix-   gpu[0084,0208]
batch_gpu       up    14-00:00:0 2     drain* gpu[0027,0033]
batch_gpu       up    14-00:00:0 14    mix    gpu[0039,0059,0062-0069,0082,0300-0301],gracehopper01
batch_gpu       up    14-00:00:0 5     alloc  gpu[0013,0015,0017,0034,0203]
batch_gpu       up    14-00:00:0 32    idle   gpu[0018-0022,0026,0046,0053,0070-0081,0085,0302-0310],gracehopper02,hgx02
interactive_gpu up    14-00:00:0 1     mix-   dgx01
interactive_gpu up    14-00:00:0 1     drain* dgx02
interactive_gpu up    14-00:00:0 2     mix    dgx[03-04]
interactive_gpu up    14-00:00:0 4     idle   gpu[0058,0060-0061,0207]
sam             up    2-02:00:00 2     idle   cms-sam-[01-02]
reserved        inact infinite   0     n/a