After messing up some of my data in the past (my own doing, playing with BTRFS on old kernels), I've been extra cautious and now run a ZFS mirror across multiple RBD images. This has led me to believe that I have a faulty SSD in one of my hosts:

- sdb without a journal: fine (but slow)
- sdc without a journal: fine (but slow)
- sdd without a journal: fine (but slow)
- sdb with sda4 as journal: checksum errors appear in ZFS
- sdc with sda5 as journal: checksum errors appear in ZFS
- sdd with sda6 as journal: checksum errors appear in ZFS

So I believe the SSD at sda is in some way defective, but my question is about the detection and correction of this 'corruption'. I currently have the "nodeep-scrub flag(s) set" due to the performance impact. But when deep scrubbing is enabled, it does seem to find problems, which I can then repair. However ... is this a safe repair, i.e. does it use a good copy of each object? Will it be safe with NewStore?

I still seem to get errors regularly bubbling their way up into ZFS, but I can't reliably ascertain whether they're the result of a corruption which has happened *before* the next Ceph deep scrub (and is therefore still exposed in that window anyway), or *after* a repair.

I'm obviously hoping for an eventual scenario where this is all transparent to the ZFS layer and it stops detecting checksum errors :)

Thanks,
Chris
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
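For context, the detect-and-repair cycle described above can be driven with the standard Ceph and ZFS command-line tools; a minimal sketch follows. The PG id `2.1f` and pool name `tank` are placeholders, and the comment about repair semantics reflects FileStore-era behavior (no per-object data checksums), which is presumably what the NewStore question is about:

```shell
# Re-enable deep scrubbing (disabled cluster-wide here for performance)
ceph osd unset nodeep-scrub

# Ask a specific placement group to deep-scrub now, then check for
# inconsistencies in the cluster health output
ceph pg deep-scrub 2.1f     # 2.1f is a placeholder PG id
ceph health detail          # inconsistent PGs are reported here

# Repair an inconsistent PG. With FileStore (pre-NewStore) there are no
# object data checksums, so repair treats the primary OSD's copy as
# authoritative -- which is not guaranteed to be the uncorrupted replica.
ceph pg repair 2.1f

# Independently verify end-to-end integrity from the ZFS side
zpool scrub tank            # 'tank' is a placeholder pool name
zpool status -v tank        # shows per-device CKSUM error counts
```

Comparing the timestamps of ZFS CKSUM increments against the last deep scrub and repair of the affected PGs is one way to tell whether an error predates or postdates a repair.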