On Fri, 10 Feb 2012, Henry C Chang wrote:
> On 2012/2/9 11:14, Josh Durgin <josh.durgin@xxxxxxxxxxxxx> wrote:
> > Detecting missing objects on startup is possible by looking at
> > the pg log and comparing it to the objects on disk, but this can
> > be a pretty expensive operation. The osd might also be out of
>
> Yeah. It can be pretty expensive, but we only do it once on startup.
> Also, since the osd has not yet joined the cluster, it shouldn't
> affect the cluster performance.
>
> > date, so its log might be useless (for example it could have
> > divergent history that was not acked). It can't know how many
> > current objects that should be there aren't until it goes through
> > peering (to get an up to date and authoritative log) and
> > recovery (to get missing data the logs say should be there). This
> > is why scrub skips pgs that aren't active+clean. More details of
> > peering can be found at http://ceph.newdream.net/docs/latest/dev/peering/.
>
> Since peering only compares logs, I was thinking at least the osd should
> check the existence of the objects the log claims to have. Then, we
> would have the chance to recover the object before the pg goes active.
>
> Also, I like the idea of storing crc/hash alongside the object as Tv said.
> With that, we can even prevent the client from reading the corrupt data
> by checking the crc/hash on each read. (Though, the read performance
> will surely degrade.)

It would be nice. There would be a fair bit of additional complexity to
do it, though. We'd need crcs for smallish blocks, for example, to
minimize reading adjacent data when we modify things. It's also sort of
frustrating that btrfs is doing exactly this one layer down.

On a semi-related note, we should be using hashes for scrub to avoid
shipping a lot of metadata over the wire for comparison.

sage
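
To make the "crcs for smallish blocks" point concrete, here is a minimal
Python sketch. It is not Ceph code: the ChecksummedObject class, the 4 KB
default block size, and the choice of zlib.crc32 are all assumptions made
only for illustration. It keeps one crc per fixed-size block, so a small
overwrite only re-checksums the blocks it touches, and a read verifies
only the blocks it covers.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class ChecksummedObject:
    """Toy in-memory object that keeps one CRC32 per fixed-size block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.data = bytearray()
        self.crcs = []  # crcs[i] covers data[i*block_size:(i+1)*block_size]

    def _block_range(self, offset, length):
        # Indices of the blocks overlapped by [offset, offset + length).
        if length <= 0:
            return range(0)
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        return range(first, last + 1)

    def _block(self, i):
        start = i * self.block_size
        return bytes(self.data[start:start + self.block_size])

    def write(self, offset, buf):
        """Write buf at offset; return the indices of re-checksummed blocks."""
        if not buf:
            return []
        old_len = len(self.data)
        end = offset + len(buf)
        if end > old_len:
            # Zero-fill any gap between the old end and the new end.
            self.data.extend(bytes(end - old_len))
        self.data[offset:end] = buf
        # Blocks changed by the write itself or by the zero-fill.
        start = min(offset, old_len)
        touched = list(self._block_range(start, end - start))
        n_blocks = -(-len(self.data) // self.block_size)  # ceiling division
        self.crcs.extend([0] * (n_blocks - len(self.crcs)))
        for i in touched:
            self.crcs[i] = zlib.crc32(self._block(i))
        return touched

    def read(self, offset, length):
        """Return the requested bytes after verifying only the covered blocks."""
        length = max(0, min(length, len(self.data) - offset))
        for i in self._block_range(offset, length):
            if zlib.crc32(self._block(i)) != self.crcs[i]:
                raise IOError(f"crc mismatch in block {i}")
        return bytes(self.data[offset:offset + length])
```

With a single crc over the whole object, the one-byte overwrite would
instead force a read of the entire object to recompute the checksum, which
is the adjacent-data cost that smaller blocks are meant to limit.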