Saved by ZFS – a disk is about to die – Weblog

May 19, 2010

I have set up an NFS share under Solaris 10. It uses ZFS, which in turn uses an IBM DS400 for backend storage. On top of that I have Nagios running to monitor it. I got an alarm about the ZFS pool being in a degraded state. I logged into the system and found this in dmesg:

May 17 03:20:20 files DESC: The number of checksum errors associated with a ZFS device
May 17 03:20:20 files exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.

To get more detail I ran:

-bash-3.00# zpool status
 pool: rz2pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
 see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 5h21m with 0 errors on Wed May 19 08:41:49 2010
config:

 NAME                        STATE     READ WRITE CKSUM
 rz2pool                     DEGRADED     0     0     0
   raidz2                    ONLINE       0     0     0
     c3t21000000D12643DEd0   ONLINE       0     0     0
     c3t21000000D12643DEd1   ONLINE       0     0     0
     c3t21000000D12643DEd2   ONLINE       0     0     0
     c3t21000000D12643DEd3   ONLINE       0     0     0
     c3t21000000D12643DEd4   ONLINE       0     0     0
     c3t21000000D12643DEd5   ONLINE       0     0     0
     c3t21000000D12643DEd6   ONLINE       0     0     0
   raidz2                    DEGRADED     0     0     0
     c3t21000000D12643DEd7   ONLINE       0     0     0
     c3t21000000D12643DEd8   ONLINE       0     0     0
     c3t21000000D12643DEd9   ONLINE       0     0     0
     c3t21000000D12643DEd10  ONLINE       0     0     0
     c3t21000000D12643DEd11  ONLINE       0     0     0
     c3t21000000D12643DEd12  DEGRADED     0     0   234  too many errors
     c3t21000000D12643DEd13  ONLINE       0     0     0

errors: No known data errors
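
A monitoring check can pick out the interesting rows of this output automatically. The following is only a minimal sketch and not the Nagios check running on this system: check_zpool_text is a made-up helper name, and it assumes a POSIX awk (on Solaris 10 that is probably nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk). It reads `zpool status` text on stdin, prints every row that is not ONLINE or has a non-zero error counter, and exits with 2, the Nagios CRITICAL code, if it found any.

```shell
# Hypothetical helper: scan `zpool status` text from stdin.
# Prints each pool/vdev/device row that is not ONLINE or that has
# non-zero READ/WRITE/CKSUM counters. Exit status: 0 = clean, 2 = problems.
check_zpool_text() {
    awk '
        # Device rows start after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The trailing "errors:" line ends the device table.
        /^errors:/                    { in_config = 0 }
        # A device row has at least: name, state, read, write, cksum.
        in_config && NF >= 5 {
            if ($2 != "ONLINE" || $3 + $4 + $5 > 0) {
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
                bad = 1
            }
        }
        END { if (bad) exit 2 }
    '
}
```

It would be fed from the real command, for example `zpool status rz2pool | check_zpool_text`. On the output above it should report the pool, the second raidz2 group and c3t21000000D12643DEd12.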

And this is where ZFS is awesome. It may not be the fastest volume manager on the planet, or the smartest, but I trust its integrity (having read the whitepapers on it).

What is really cool here:

It has detected that the underlying LUN is misbehaving.
It has marked the LUN as degraded.
It has saved my data from silent corruption.

Not many volume managers out there do that. I have not lost data, data integrity is still intact, and I know which disk is about to fail. Kudos and thanks to the ZFS dev team!
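
The 'action:' text in the status output names the two ways forward: `zpool clear` or `zpool replace`. As a hedged sketch of what that follow-up could look like (not a record of what was actually done here), the snippet below wraps both options in small functions. The run, clear_and_scrub and replace_lun names and the DRY_RUN switch are illustrative assumptions, and the replacement LUN name is a placeholder you have to supply yourself.

```shell
# Pool and failing LUN as shown in the zpool status output.
POOL=rz2pool
BAD_LUN=c3t21000000D12643DEd12

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Option A: treat the errors as transient, reset the counters,
# then scrub so ZFS re-verifies every block in the pool.
clear_and_scrub() {
    run zpool clear "$POOL" "$BAD_LUN"
    run zpool scrub "$POOL"
}

# Option B: swap the failing LUN for a new one (passed as $1)
# and check the status afterwards to follow the resilver.
replace_lun() {
    run zpool replace "$POOL" "$BAD_LUN" "$1"
    run zpool status "$POOL"
}
```

With `DRY_RUN=1` set, calling `clear_and_scrub` or `replace_lun <new-lun>` only prints the commands, which is a cheap way to review them before touching a production pool.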

Weblog – Thomas S. Iversen
