Last week my server had some bad news in store for me:
+ahcich0: Timeout on slot 23 port 0
+ahcich0: is 00000000 cs 01800000 ss 00000000 rs 01800000 tfd c0 serr 00000000 cmd 0000f717
+ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
+ahcich0: Timeout on slot 24 port 0
+ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 00000000 cmd 0000f817
+ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
+ahcich0: Timeout on slot 24 port 0
+ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 00000000 cmd 0000f817
+(ada0:ahcich0:0:0:0): lost device
+ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
+ahcich0: Timeout on slot 24 port 0
+ahcich0: is 00000000 cs 03000000 ss 00000000 rs 03000000 tfd 80 serr 00000000 cmd 0000f817
+(ada0:ahcich0:0:0:0): removing device entry
+ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
+ahcich0: Poll timeout on slot 25 port 0
+ahcich0: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd 80 serr 00000000 cmd 0000f917
A disk failure! The affected disk was part of a ZFS RAID made up of four disks. The ZFS status report looked the part:
Checking status of zfs pools:
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH    ALTROOT
pool  7.25T  5.22T  2.03T    72%  1.00x  DEGRADED  -

  pool: pool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        pool                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            13091973070404888760  REMOVED      0     0     0  was /dev/ada0p1
            ada1p1                ONLINE       0     0     0
            ada2p1                ONLINE       0     0     0
            ada3p1                ONLINE       0     0     0

errors: No known data errors
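To pick the failed member out of a report like this without reading it by eye, the config table can be scanned for entries whose state is not ONLINE. The following is only a minimal sketch: the function name `unhealthy_devices` is made up for illustration, and it assumes the config rows have the form "name, state, three error counters" as in the output above.

```python
import re

# Device states that zpool status reports for pools and vdevs.
STATES = "ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED"

# One config row: name, state, then READ / WRITE / CKSUM counters.
# Counters may be abbreviated (e.g. "1.2K"), so a unit suffix is allowed.
COUNTER = r"\d[\d.]*[KMGTP]?"
ROW = re.compile(
    rf"(\S+)\s+({STATES})\s+{COUNTER}\s+{COUNTER}\s+{COUNTER}"
)


def unhealthy_devices(status_text):
    """Return (name, state) pairs for config rows that are not ONLINE.

    Hypothetical helper: it only pattern-matches the text of a
    'zpool status' report and does not talk to ZFS itself.
    """
    return [
        (name, state)
        for name, state in ROW.findall(status_text)
        if state != "ONLINE"
    ]


SAMPLE = """
        NAME                      STATE     READ WRITE CKSUM
        pool                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            13091973070404888760  REMOVED      0     0     0  was /dev/ada0p1
            ada1p1                ONLINE       0     0     0
            ada2p1                ONLINE       0     0     0
            ada3p1                ONLINE       0     0     0
"""

for name, state in unhealthy_devices(SAMPLE):
    print(name, state)
```

For the report above this lists the pool and the raidz1-0 vdev as DEGRADED and the removed member under its numeric GUID, which is the identifier ZFS shows in place of the former /dev/ada0p1.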