>> Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreading corrupt data all over the cluster if I leave bad disks in place and they start doing terrible things to the data? I have a lot more disks than I have trust in disks.

If a drive lost a head then I want it gone.

I love the idea of using SMART data but can foresee some implementation issues. We have seen some RAID configurations where polling SMART will halt all RAID operations momentarily. Also, some controllers require you to use their CLI tool to poll for SMART data instead of smartmontools.

It would be similarly awesome to embed something like an Apdex score against each OSD, especially if it factored in hierarchy to identify poorly performing OSDs, nodes, racks, etc.

--
Kyle
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
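To make the Apdex idea a bit more concrete, here is a rough sketch of how a per-OSD score could be rolled up through the hierarchy. It uses the standard Apdex formula (a sample is "satisfied" at or below a target latency T, "tolerating" up to 4T, "frustrated" beyond that). The function names, the 10 ms target, the (rack, node, osd) path layout and the sample latencies are all made up for illustration; nothing here is an existing Ceph feature or API.

```python
from collections import defaultdict


def apdex(latencies_ms, target_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples."""
    if not latencies_ms:
        return None  # no samples, so no meaningful score
    satisfied = sum(1 for lat in latencies_ms if lat <= target_ms)
    tolerating = sum(
        1 for lat in latencies_ms if target_ms < lat <= 4 * target_ms
    )
    return (satisfied + tolerating / 2) / len(latencies_ms)


def apdex_by_level(samples, target_ms):
    """Score every level of the hierarchy.

    `samples` maps a hypothetical (rack, node, osd) path to a list of
    latencies in ms. Latencies are pooled into every prefix of the path,
    so racks and nodes get a score alongside the individual OSDs.
    """
    pooled = defaultdict(list)
    for path, latencies in samples.items():
        for depth in range(1, len(path) + 1):
            pooled[path[:depth]].extend(latencies)
    return {path: apdex(lats, target_ms) for path, lats in pooled.items()}


# Invented sample data: osd.1 has a couple of slow requests.
samples = {
    ("rack1", "node1", "osd.0"): [5, 8, 12, 9],
    ("rack1", "node1", "osd.1"): [7, 30, 95, 11],
    ("rack1", "node2", "osd.2"): [6, 6, 7, 10],
}

scores = apdex_by_level(samples, target_ms=10)
for path in sorted(scores):
    print("/".join(path), round(scores[path], 3))
```

Pooling raw samples weights each level by request count, so a busy bad OSD drags its node and rack down more than an idle one would. Averaging the per-OSD scores instead would be an equally defensible choice, depending on what you want the node and rack numbers to mean.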