Monitoring - Quobyte storage is available again. We are continuing to monitor.
Jul 25, 2025 - 10:31 PDT
Identified - Quobyte storage is unavailable. We have identified the problem and are working to return service.
Jul 25, 2025 - 10:18 PDT
Investigating - Farm's slurmdbd is having intermittent issues. If you see an error like the one below, the problem has recurred, and we will restart slurmdbd to bring it back into service.

"""sacctmgr: error: _open_persist_conn: failed to open persistent connection to host:monitoring-ib:6819: Connection timed out
sacctmgr: error: Sending PersistInit msg: Connection timed out"""

We have a support case open with SchedMD and will update this issue as we learn more.
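
For users who wrap sacctmgr in their own scripts, here is a minimal sketch of one way to recognize this error in captured stderr text. The function name and pattern are illustrative assumptions based only on the two error lines quoted above. They are not part of Slurm or of any Farm tooling, and other failure modes may print different messages.

```python
import re

# Hypothetical pattern, derived only from the two error lines quoted in this notice.
SLURMDBD_TIMEOUT = re.compile(
    r"sacctmgr: error: .*"
    r"(failed to open persistent connection|Sending PersistInit msg)"
    r".*Connection timed out"
)


def looks_like_slurmdbd_timeout(stderr_text: str) -> bool:
    """Return True if any line of the text matches the quoted timeout signature."""
    return any(SLURMDBD_TIMEOUT.search(line) for line in stderr_text.splitlines())
```

A script could pass the stderr it captured from a sacctmgr call to this function and, on a match, wait and retry later rather than treating the failure as a problem with the user's own request.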

Apr 23, 2025 - 17:22 PDT
Component status (uptime over the past 90 days):
Login: Operational, 99.98% uptime
Storage: Operational, 100.0% uptime
File transfer node: Operational, 100.0% uptime
high2,med2,low2: Operational, 100.0% uptime
high,med,low: Operational, 100.0% uptime
bmh,bmm: Operational, 100.0% uptime
bigmemh,bigmemm: Operational, 100.0% uptime
bgpu: Operational, 100.0% uptime
gpuh,gpum: Operational, 100.0% uptime
Email: Operational, 100.0% uptime
Virtualization: Operational, 100.0% uptime
Proxmox Virtualization Nodes: Operational, 100.0% uptime
Ganetti cluster: Operational, 100.0% uptime
Slurm: Operational, 83.54% uptime
Software: Operational, 100.0% uptime
Jul 26, 2025

No incidents reported today.

Jul 25, 2025

Unresolved incident: Quobyte Unavailable.

Jul 24, 2025

No incidents reported.

Jul 23, 2025
Resolved - This issue has been resolved; system administrators have fixed replication issues with the account provisioning process.
Jul 23, 14:25 PDT
Monitoring - System administrators have fixed an issue with the account synchronization process and will continue to monitor the database for inconsistencies. Users should now be able to log in without issue.
Jul 23, 09:10 PDT
Investigating - Some users report not being able to log into Farm's head node, but they can use Open OnDemand. System administrators are currently looking into the issue.
Jul 23, 08:08 PDT
Jul 22, 2025

No incidents reported.

Jul 21, 2025

No incidents reported.

Jul 20, 2025

No incidents reported.

Jul 19, 2025

No incidents reported.

Jul 18, 2025

No incidents reported.

Jul 17, 2025

No incidents reported.

Jul 16, 2025

No incidents reported.

Jul 15, 2025

No incidents reported.

Jul 14, 2025

No incidents reported.

Jul 13, 2025

No incidents reported.

Jul 12, 2025

No incidents reported.