ACCRE C7 Cluster Quick and Dirty Status

Report generated at Thu Apr 3 03:26:01 AM CDT 2025

Problem Nodes

HOSTNAMES      STATE      AVAIL_FEATURES                 TIMESTAMP            USER       REASON                                              
cn390          down*      sandybridge                    2025-01-26T08:52:18  slurm      Not responding                                      
cn392          down*      sandybridge                    2025-01-27T22:16:57  slurm      Not responding                                      
cn421          down*      sandybridge                    2025-03-21T05:50:19  slurm      Not responding                                      
cn430          drained    sandybridge                    2025-03-31T14:00:11  slurm      Prolog error                                        
cn486          drained    sandybridge                    2025-03-31T14:00:11  slurm      Prolog error                                        
cn912          down*      sandybridge                    2025-02-28T14:51:14  slurm      Not responding                                      
cn1083         drained    sandybridge                    2025-03-31T14:00:11  slurm      Prolog error                                        
cn1091         drained    sandybridge                    2025-03-31T14:00:11  slurm      Prolog error                                        
cn1092         down*      sandybridge                    2025-03-06T20:58:51  slurm      Not responding                                      
cn1096         down*      sandybridge                    2025-03-06T14:23:48  slurm      Not responding                                      
cn1325         drained    haswell                        2025-03-08T08:45:02  appelte1   Nobody - RT90957 - memory issues, instability       
cn1329         drained    haswell                        2025-03-08T08:47:01  appelte1   Nobody - RT90958 - memory issues, instability       
cn1332         down*      haswell                        2025-03-17T14:57:37  slurm      Not responding                                      
cn1350         drained    haswell                        2025-03-08T08:52:43  appelte1   Nobody - RT90959 - memory issues, instability       
cn1356         down*      haswell                        2025-03-14T21:23:43  slurm      Not responding                                      
cn1368         down*      haswell                        2025-03-17T14:40:56  slurm      Not responding                                      
cn1376         drained    haswell                        2025-03-08T08:54:25  appelte1   Nobody - RT90960 - memory issues, instability       
cn1385         drained    haswell                        2025-03-08T08:56:26  appelte1   Nobody - RT90961 - memory issues, instability       
gpu0014        draining   broadwell,pascal,p3584         2025-03-31T16:38:53  slurm      Prolog error                                        
gpu0015        drained    broadwell,pascal,p3584         2025-04-01T11:21:22  slurm      Prolog error                                        
gpu0018        drained    broadwell,pascal,p3584         2025-03-31T16:41:03  slurm      Prolog error                                        
gpu0019        draining   broadwell,pascal,p3584         2025-04-01T11:24:22  slurm      Prolog error                                        
gpu0021        draining   broadwell,pascal,p3584         2025-04-01T11:27:17  slurm      Prolog error                                        
gpu0026        draining   broadwell,pascal,p3840         2025-04-01T11:29:23  slurm      Prolog error                                        
gpu0027        draining   broadwell,pascal,p3840         2025-04-01T11:31:30  slurm      Prolog error                                        
gpu0030        drained    broadwell,pascal,p3840         2025-03-06T15:43:40  root       Samuel - RT90936 - Bad gpu                          
gpu0038        drained    skylake,turing,csbtmp          2025-02-17T12:18:42  slurm      Nobody - RT90396 - GPU0 in error state : Not respond
gpu0045        drained    skylake,turing,csbtmp          2025-02-17T13:00:08  slurm      gres/gpu count reported lower than configured (3 < 4
gpu0048        drained    skylake,turing,csbtmp          2025-04-01T06:08:57  root       Kill task failed                                    
gpu0080        drained*   icelake,a6000x4,csbtmp         2024-04-25T13:03:59  root       Melo is using this machine for testing              
p-matheny-lab- down*      zen                            2025-03-01T15:59:51  slurm      Not responding                                      

Queue Summary (Production)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_CORES  PENDING_JOBS  PENDING_CORES
-----------------------------------------------------------------------------------------
accre                                  0            0             1             2
            appelte1                   0            0             1             2
-----------------------------------------------------------------------------------------
accre_guests                           0            0             2            36
            senthia                    0            0             2            36
-----------------------------------------------------------------------------------------
anderson_mri                          47          188           153           153
            xul13                     47          188           153           153
-----------------------------------------------------------------------------------------
behringer_lab                          0            0             1             8
            haleof                     0            0             1             8
-----------------------------------------------------------------------------------------
booth_lab                              1            1             0             0
            comptoab                   1            1             0             0
-----------------------------------------------------------------------------------------
brg_cores                             13           76             0             0
            desilvt                   12           60             0             0
            kandelr                    1           16             0             0
-----------------------------------------------------------------------------------------
caldwell_lab                           0            0             1            16
            humphrjm                   0            0             1            16
-----------------------------------------------------------------------------------------
calipari_lab                           0            0             1            18
            barthb1                    0            0             1            18
-----------------------------------------------------------------------------------------
candelaria_group                       0            0             2            40
            hatche                     0            0             2            40
-----------------------------------------------------------------------------------------
capra_lab_csb                          2            2             0             0
            mothcw                     2            2             0             0
-----------------------------------------------------------------------------------------
cms                                    0            0             5             5
            meloam                     0            0             5             5
-----------------------------------------------------------------------------------------
cmsadmin                               0            0             1             1
            autocms                    0            0             1             1
-----------------------------------------------------------------------------------------
cms_lowprio                           27           51           182           455
            cmslocal                  11           29             8            32
            cmspilot                  16           22           174           423
-----------------------------------------------------------------------------------------
coxlab                                22           67             0             0
            evansp1                   22           67             0             0
-----------------------------------------------------------------------------------------
davis_lab                              1            1             0             0
            tsail2                     1            1             0             0
-----------------------------------------------------------------------------------------
edwards_lab                            1            1             0             0
            parkerac                   1            1             0             0
-----------------------------------------------------------------------------------------
g_gamazon_lab                          1            4             1            12
            kimn13                     0            0             1            12
            salerl1                    1            4             0             0
-----------------------------------------------------------------------------------------
g_giri_group                           5           15             0             0
            basnettb                   5           15             0             0
-----------------------------------------------------------------------------------------
h_fabbrilab                            1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
h_vangard_1                            1           16             0             0
            yef1                       1           16             0             0
-----------------------------------------------------------------------------------------
h_vmac                                 0            0           990           990
            suny36                     0            0           990           990
-----------------------------------------------------------------------------------------
isde-rer                               2            6             0             0
            vielmej                    2            6             0             0
-----------------------------------------------------------------------------------------
jswhep                                 1            8             0             0
            atehort                    1            8             0             0
-----------------------------------------------------------------------------------------
l3_manzanas_group                      0            0             1            13
            manzand                    0            0             1            13
-----------------------------------------------------------------------------------------
l3_precision_nutriti                   2           13             0             0
            baghem1                    2           13             0             0
-----------------------------------------------------------------------------------------
l3_wilkey_lab                          1           16             7           112
            starlii                    1           16             7           112
-----------------------------------------------------------------------------------------
maiziezhou_lab                         1           34             5           194
            chowx                      1           34             2            14
            xiem6                      0            0             3           180
-----------------------------------------------------------------------------------------
nbody                                  1           32             0             0
            smitm77                    1           32             0             0
-----------------------------------------------------------------------------------------
palmeri_lab                          200          210             0             0
            bahgg                    200          210             0             0
-----------------------------------------------------------------------------------------
p_collins_lab                          0            0             1             8
            chencl1                    0            0             1             8
-----------------------------------------------------------------------------------------
p_dsi                                  1            1             0             0
            yangi1                     1            1             0             0
-----------------------------------------------------------------------------------------
p_matheny_lab                          1            2             0             0
            koolajd1                   1            2             0             0
-----------------------------------------------------------------------------------------
p_neuert_lab                           1            8             0             0
            hughesjj                   1            8             0             0
-----------------------------------------------------------------------------------------
rer                                    2           32             0             0
            hum6                       2           32             0             0
-----------------------------------------------------------------------------------------
r_isde                                 1           16             0             0
            trippej1                   1           16             0             0
-----------------------------------------------------------------------------------------
rokaslab                              11           51             0             0
            borrag                     1            1             0             0
            lint8                     10           50             0             0
-----------------------------------------------------------------------------------------
ruderferlab                            1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
sbcs                                   3           31            41            41
            jiag                       0            0            41            41
            nguyensm                   1           10             0             0
            pingj2                     1           16             0             0
            xus13                      1            5             0             0
-----------------------------------------------------------------------------------------
taylor_group                           2           12             0             0
            milesmt                    1            8             0             0
            schultls                   1            4             0             0
-----------------------------------------------------------------------------------------
tk_lab                                 0            0             1            80
            yoonh14                    0            0             1            80
-----------------------------------------------------------------------------------------
tong_lab                               1           16             0             0
            lutherzr                   1           16             0             0
-----------------------------------------------------------------------------------------
vgi                                    1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
walker_lab                            10           22             0             0
            deanrt                     9           18             0             0
            guox11                     1            4             0             0
-----------------------------------------------------------------------------------------
wankowicz_lab                          1            1             0             0
            wankows                    1            1             0             0
-----------------------------------------------------------------------------------------
yang_lab_csb                         213          534             0             0
            shaoq1                   110          110             0             0
            shinw3                   102          408             0             0
            zhangsw                    1           16             0             0
-----------------------------------------------------------------------------------------
Totals:                              579         1497          1396          2184

Queue Summary (Pascal)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_GPUS   PENDING_JOBS   PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals:                                0            0             0             0

Queue Summary (Turing)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_GPUS   PENDING_JOBS   PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals:                                0            0             0             0

Queue Summary (A6000x4)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_GPUS   PENDING_JOBS   PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals:                                0            0             0             0

Queue Summary (A6000x2)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_GPUS   PENDING_JOBS   PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals:                                0            0             0             0

Partition Summary

PARTITION                 AVAIL  TIMELIMIT  NODES  STATE NODELIST
production*                  up 14-00:00:0      3  down* cn[1332,1356,1368]
production*                  up 14-00:00:0      5  drain cn[1325,1329,1350,1376,1385]
production*                  up 14-00:00:0     29    mix cn[1300,1315,1320,1322,1326,1328,1333-1335,1337-1338,1340,1351,1355,1358-1360,1363,1365,1374,1379-1382,1389,1392,1394,1701,1704]
production*                  up 14-00:00:0     49  alloc cn[1301-1302,1306-1307,1311,1317-1318,1321,1323-1324,1330-1331,1336,1339,1341-1349,1352-1354,1357,1361-1362,1366-1367,1369-1373,1375,1378,1383-1384,1387-1388,1390-1391,1393,1395-1398]
nogpfs                       up 14-00:00:0      6  down* cn[390,392,421,912,1092,1096]
nogpfs                       up 14-00:00:0      4  drain cn[430,486,1083,1091]
nogpfs                       up 14-00:00:0    139  alloc cn[303-308,311,313,315,317-318,320,322-324,326-333,335-338,340,347-348,351,353,355-360,362-365,367,369-370,372-380,384-385,388-389,394,398-401,403-405,407,411,413-420,423,425,427,429,431-432,435-437,439-443,445-446,448-452,455,460,463-466,468-472,474-477,479,481-485,491,495-496,499,1081-1082,1085-1087,1089-1090,1094-1095,1122-1123,1125-1126,1128-1129,1132,1134]
debug                        up   infinite      2    mix gpu[0022,0059]
debug                        up   infinite      4   idle cn[371,1101],gpu[0006,0046]
sam                          up 2-02:00:00      2   idle vm-cms-sam-pri,vm-cms-sam-sec
maxwell                      up 5-00:00:00      1    mix gpu0004
maxwell                      up 5-00:00:00      2   idle gpu[0002,0012]
pascal                       up 5-00:00:00      5   drng gpu[0014,0019,0021,0026-0027]
pascal                       up 5-00:00:00      3  drain gpu[0015,0018,0030]
pascal                       up 5-00:00:00      9    mix gpu[0013,0017,0020,0022-0023,0025,0031,0033-0034]
turing                       up 5-00:00:00      3  drain gpu[0038,0045,0048]
turing                       up 5-00:00:00      2  alloc gpu[0035,0049]
a6000x2                      up 5-00:00:00      1    mix gpu0003
a6000x2                      up 5-00:00:00      5   idle gpu[0005-0008,0010]
a4000x4                      up 14-00:00:0      0    n/a 
a6000x4                      up 14-00:00:0      1 drain* gpu0080
a6000x4                      up 14-00:00:0      6    mix gpu[0059,0077-0079,0081-0082]
a100                         up 14-00:00:0      0    n/a 
a100x8                       up 14-00:00:0      0    n/a 
a4000x8                      up 14-00:00:0      0    n/a 
cgw-vm-qa-flatearth1         up   infinite      1   idle vm-qa-flatearth1
cgw-djroomba                 up   infinite      1   idle djroomba
cgw-cqs1                     up   infinite      1   idle cqs1
cgw-cqs3                     up   infinite      1   idle cqs3
cgw-rocksteady               up   infinite      1   idle rocksteady
cgw-tbi01                    up   infinite      1   idle tbi01
cgw-capra1                   up   infinite      1   idle capra1
cgw-horus                    up   infinite      1   idle horus
cgw-dsi-gw                   up   infinite      1   idle dsi-gw
cgw-maizie                   up   infinite      1   idle maizie
cgw-maizie2                  up   infinite      1   idle maizie2
cgw-maizie3                  up   infinite      1   idle maizie3
cgw-hgen01                   up   infinite      1   idle hgen01
cgw-p-matheny-lab-server1    up   infinite      1  down* p-matheny-lab-server1
cgw-sideshowbob              up   infinite      1   idle sideshowbob
cgw-platypus                 up   infinite      1    mix platypus
cgw-hanuman                  up   infinite      1   idle hanuman
cgw-lego                     up   infinite      1   idle lego
cgw-badger                   up   infinite      1   idle badger
cgw-candelaria01             up   infinite      1    mix candelaria01
cgw-holowatyj01              up   infinite      1   idle holowatyj01
cgw-cartailler01             up   infinite      1   idle cartailler01
cgw-gamazon01                up   infinite      1   idle gamazon01

Queue Summary (All Partitions)

GROUP        USER                  ACTIVE_JOBS  ACTIVE_CORES  PENDING_JOBS  PENDING_CORES
-----------------------------------------------------------------------------------------
accre                                  0            0             1             2
            appelte1                   0            0             1             2
-----------------------------------------------------------------------------------------
accre_guests                           0            0             2            36
            senthia                    0            0             2            36
-----------------------------------------------------------------------------------------
anderson_mri                          47          188           153           153
            xul13                     47          188           153           153
-----------------------------------------------------------------------------------------
behringer_lab                          0            0             1             8
            haleof                     0            0             1             8
-----------------------------------------------------------------------------------------
bme3890                                0            0             3            17
            909065                     0            0             1            15
            dhattk                     0            0             2             2
-----------------------------------------------------------------------------------------
booth_lab                              1            1             0             0
            comptoab                   1            1             0             0
-----------------------------------------------------------------------------------------
brg_cores                             13           76             0             0
            desilvt                   12           60             0             0
            kandelr                    1           16             0             0
-----------------------------------------------------------------------------------------
caldwell_lab                           0            0             1            16
            humphrjm                   0            0             1            16
-----------------------------------------------------------------------------------------
calipari_lab                           0            0             1            18
            barthb1                    0            0             1            18
-----------------------------------------------------------------------------------------
candelaria_group                       0            0             2            40
            hatche                     0            0             2            40
-----------------------------------------------------------------------------------------
capra_lab_csb                          2            2             0             0
            mothcw                     2            2             0             0
-----------------------------------------------------------------------------------------
cgw_candelaria01                       3           28             0             0
            hatche                     1           20             0             0
            mcgilldg                   1            4             0             0
            quintedc                   1            4             0             0
-----------------------------------------------------------------------------------------
cgw_djroomba                          12           12             0             0
            mcphauna                  12           12             0             0
-----------------------------------------------------------------------------------------
cgw_maizie                             3           72             0             0
            wangh67                    3           72             0             0
-----------------------------------------------------------------------------------------
cgw_platypus                           7          148             0             0
            mohamb2                    1           32             0             0
            rubinom                    1           16             0             0
            sardarn                    5          100             0             0
-----------------------------------------------------------------------------------------
cms                                  139         1668            40            40
            cmslocal                  35          420            16            16
            cmspilot                 104         1248            19            19
            meloam                     0            0             5             5
-----------------------------------------------------------------------------------------
cmsadmin                               0            0             1             1
            autocms                    0            0             1             1
-----------------------------------------------------------------------------------------
cms_lowprio                           27           51           182           455
            cmslocal                  11           29             8            32
            cmspilot                  16           22           174           423
-----------------------------------------------------------------------------------------
coxlab                                22           67             0             0
            evansp1                   22           67             0             0
-----------------------------------------------------------------------------------------
cs3892-oguz_acc                        0            0             1             1
            914505                     0            0             1             1
-----------------------------------------------------------------------------------------
csb_gpu_acc                           63          142           679          3141
            bisigp1                    3           12             0             0
            cunnik8                   23           23            84            84
            ger1                       0            0            20            20
            guox11                     8           48           491          2946
            howardvr                   0            0             1             8
            marinot                    1           16             0             0
            ranx                       1            9             0             0
            shaoq1                    27           34            83            83
-----------------------------------------------------------------------------------------
davis_lab                              1            1             0             0
            tsail2                     1            1             0             0
-----------------------------------------------------------------------------------------
edwards_lab                            1            1             0             0
            parkerac                   1            1             0             0
-----------------------------------------------------------------------------------------
g_gamazon_lab                          1            4             1            12
            kimn13                     0            0             1            12
            salerl1                    1            4             0             0
-----------------------------------------------------------------------------------------
g_giri_group                           5           15             0             0
            basnettb                   5           15             0             0
-----------------------------------------------------------------------------------------
h_fabbrilab                            1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
h_vangard_1                            1           16             0             0
            yef1                       1           16             0             0
-----------------------------------------------------------------------------------------
h_vmac                                 0            0           990           990
            suny36                     0            0           990           990
-----------------------------------------------------------------------------------------
isde-rer                               2            6             0             0
            vielmej                    2            6             0             0
-----------------------------------------------------------------------------------------
jswhep                                 1            8             0             0
            atehort                    1            8             0             0
-----------------------------------------------------------------------------------------
l3_manzanas_group                      0            0             1            13
            manzand                    0            0             1            13
-----------------------------------------------------------------------------------------
l3_precision_nutriti                   2           13             0             0
            baghem1                    2           13             0             0
-----------------------------------------------------------------------------------------
l3_wilkey_lab                          1           16             7           112
            starlii                    1           16             7           112
-----------------------------------------------------------------------------------------
maiziezhou_lab                         1           34             5           194
            chowx                      1           34             2            14
            xiem6                      0            0             3           180
-----------------------------------------------------------------------------------------
nbody                                  1           32             0             0
            smitm77                    1           32             0             0
-----------------------------------------------------------------------------------------
nbody_acc                              0            0             1            16
            khanfm                     0            0             1            16
-----------------------------------------------------------------------------------------
neurogroup_acc                         1            6             0             0
            songrw                     1            6             0             0
-----------------------------------------------------------------------------------------
palmeri_lab                          200          210             0             0
            bahgg                    200          210             0             0
-----------------------------------------------------------------------------------------
p_collins_lab                          0            0             1             8
            chencl1                    0            0             1             8
-----------------------------------------------------------------------------------------
p_dsi                                  1            1             1             4
            malikm2                    0            0             1             4
            yangi1                     1            1             0             0
-----------------------------------------------------------------------------------------
p_dsi_acc                              0            0             2            12
            srikas2                    0            0             2            12
-----------------------------------------------------------------------------------------
p_matheny_lab                          1            2             0             0
            koolajd1                   1            2             0             0
-----------------------------------------------------------------------------------------
p_neuert_lab                           1            8             0             0
            hughesjj                   1            8             0             0
-----------------------------------------------------------------------------------------
rer                                    2           32             0             0
            hum6                       2           32             0             0
-----------------------------------------------------------------------------------------
r_isde                                 1           16             0             0
            trippej1                   1           16             0             0
-----------------------------------------------------------------------------------------
rokaslab                              11           51             0             0
            borrag                     1            1             0             0
            lint8                     10           50             0             0
-----------------------------------------------------------------------------------------
rubinov_lab_acc                        0            0             1             1
            mohamb2                    0            0             1             1
-----------------------------------------------------------------------------------------
ruderferlab                            1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
sbcs                                   3           31            41            41
            jiag                       0            0            41            41
            nguyensm                   1           10             0             0
            pingj2                     1           16             0             0
            xus13                      1            5             0             0
-----------------------------------------------------------------------------------------
taylor_group                           2           12             0             0
            milesmt                    1            8             0             0
            schultls                   1            4             0             0
-----------------------------------------------------------------------------------------
tk_lab                                 0            0             1            80
            yoonh14                    0            0             1            80
-----------------------------------------------------------------------------------------
tong_lab                               1           16             0             0
            lutherzr                   1           16             0             0
-----------------------------------------------------------------------------------------
vgi                                    1           10             0             0
            yec2                       1           10             0             0
-----------------------------------------------------------------------------------------
walker_lab                            10           22             0             0
            deanrt                     9           18             0             0
            guox11                     1            4             0             0
-----------------------------------------------------------------------------------------
wankowicz_lab                          1            1             0             0
            wankows                    1            1             0             0
-----------------------------------------------------------------------------------------
yang_lab_csb                         213          534             0             0
            shaoq1                   110          110             0             0
            shinw3                   102          408             0             0
            zhangsw                    1           16             0             0
-----------------------------------------------------------------------------------------
Totals:                              807         3573          2119          5411