On Thu, 6 Nov 2014, GuangYang wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster,
> there were two major patterns leading to inconsistency as I observed: 1)
> EIO to read the file, 2) the digest is inconsistent (for EC) even when
> there is no read error.
>
> While ceph has built-in tool sets to repair the inconsistencies, I also
> would like to check with the community in terms of what are the best ways
> to handle such issues (e.g. should we run fsck / xfs_repair when such
> an issue happens).
>
> In more detail, I have the following questions:
> 1. When there is inconsistency detected, what is the chance there are
> some hardware issues which need to be repaired physically, or should I
> run some disk/filesystem tools to further check?

I'm not really an operator so I'm not as familiar with these tools as I
should be :(, but I suspect the prudent route is to check the SMART info
on the disk, and/or trigger a scrub of everything else on the OSD (ceph
osd scrub N). For DreamObjects, I think they usually just fail the OSD
once it starts throwing bad sectors (most of the hardware is already
reasonably aged).

> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should
> we solely rely on Ceph's repair tool sets?

That might not be a bad idea, but I would urge caution if xfs_repair finds
any issues or makes any changes, as subtle changes to the fs contents can
confuse ceph-osd. At an absolute minimum, do a full scrub after, but even
better would be to fail the OSD.

(FWIW I think we should document a recommended "safe" process for
failing/replacing an OSD that takes the suspect data offline but waits for
the cluster to heal before destroying any data. Simply marking the OSD
out will work, but then when a fresh drive is added there will be a second
repair/rebalance event, which isn't ideal.)

sage

>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang
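
For what it's worth, the two checks suggested in the reply (look at the SMART info, then scrub the OSD) could be scripted roughly as below. This is only a sketch under assumptions: the OSD id 12 and the device /dev/sdb are placeholders, the helper names are made up for illustration, and "smartctl -H" (the smartmontools health check) is my assumption for "check the SMART info" since the thread does not name a tool. Only "ceph osd scrub N" comes from the thread itself. It defaults to a dry run that prints the commands without executing anything.

```python
import shlex
import subprocess


def osd_check_commands(osd_id: int, device: str) -> list[list[str]]:
    """Build (but do not run) the checks for one suspect OSD.

    osd_id and device are placeholders supplied by the caller.
    """
    return [
        # Assumed tool: smartmontools overall health check for the disk.
        ["smartctl", "-H", device],
        # From the thread: ask the OSD to scrub everything it holds.
        ["ceph", "osd", "scrub", str(osd_id)],
    ]


def run_checks(osd_id: int, device: str, dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in osd_check_commands(osd_id, device):
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=False: a failing disk check should not abort the scrub.
            subprocess.run(argv, check=False)
    return rendered


if __name__ == "__main__":
    # Hypothetical values; substitute the real OSD id and backing device.
    run_checks(osd_id=12, device="/dev/sdb")
```

Running it with dry_run=False needs smartctl and the ceph CLI on the OSD host with suitable privileges. It does not cover the "safe" fail/replace process mentioned above, which the thread leaves undocumented.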