The nas-12-2 recovery was successful. We recovered enough data from the failing drives for ZFS to rebuild onto the new drives.
Posted Apr 23, 2025 - 17:19 PDT
Monitoring
nas-12-2's resilver finished ahead of schedule, so we have re-enabled it for use in Farm. One disk has triggered another resilver, but it is the only disk in its vdev with any issues, so we are comfortable letting that resilver run in the background.
If you run into issues with nas-12-2, please open a Farm Support Ticket.
Posted Apr 11, 2025 - 16:10 PDT
Update
Disk replacements were successfully performed yesterday, and data reconstruction onto the new disks is in progress. ZFS currently estimates it will finish in a little over three days, so the best-case estimate is that nas-12-2 will be available for use late Saturday evening. We will provide more updates as the reconstruction progresses.
Posted Apr 09, 2025 - 10:28 PDT
Update
The ZFS pool scrub (data verification) is in progress. As you can imagine, 409 TB of data takes a while to verify. The current ETA is that it will finish sometime late tonight. The scrub has caused three additional hard drives to drop out, and we have decided to replace those drives before allowing users to access the pool. We estimate it will take an additional three days to reconstruct all the data onto the replacement drives, so our best-guess ETA for return to service is late this week. We will post additional updates as the disk replacement proceeds.
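For a rough sense of why verifying 409 TB takes days, here is a back-of-the-envelope sketch. The function name and the throughput figures are hypothetical and chosen only for illustration; they are not measurements from nas-12-2, and the actual ETA comes from ZFS itself.

```python
def scrub_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read data_tb terabytes at throughput_mb_s MB/s (decimal units)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return total_mb / throughput_mb_s / 3600  # seconds -> hours


POOL_TB = 409  # pool size quoted in the update above

# Hypothetical aggregate read rates, for illustration only.
for rate_mb_s in (500, 1000, 2000):
    hours = scrub_hours(POOL_TB, rate_mb_s)
    print(f"{rate_mb_s} MB/s -> about {hours:.0f} hours ({hours / 24:.1f} days)")
```

Real scrub times also depend on pool layout, fragmentation, and how the failing drives behave, so this only gives an order of magnitude.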
Posted Apr 07, 2025 - 14:49 PDT
Identified
The ZFS scrub has been started and is being watched carefully.
Posted Apr 04, 2025 - 17:47 PDT
Monitoring
As tends to happen with failing hard drives, data recovery went slower than hoped. Two drives had 100% of their data recovered, and a third had 99.99% recovered. The last drive failed too hard to recover data from, but that is okay: ZFS should be able to reconstruct everything it needs from the first three. A ZFS scrub (data verification) is in progress. When it finishes, likely early next week, we will know for sure the state of all the data on nas-12-2.
Posted Apr 04, 2025 - 14:31 PDT
Update
Summary: nas-12-2 could be online Friday at the earliest, but more likely early next week.
In consultation with Adam Getchell, the decision has been made to do a low-level disk copy from the old, failing drives to new drives. This will minimize the potential for data loss.
This process is expected to finish Thursday at the earliest. Subsequently, the new disks will be added back to the ZFS pool, and we will trigger a full ZFS data scrub. When that finishes, we will know exactly how much, if any, data loss there is and which files are impacted. That data scrub will take a minimum of 24 hours, so the earliest nas-12-2 could be back in service is late Friday. It is more likely the scrub will run through the weekend, so a more realistic return-to-service is early next week.
Posted Apr 02, 2025 - 13:59 PDT
Identified
nas-12-2 has suffered multiple disk failures. Admins are investigating the best path forward.
The following group directories are currently unavailable: