Thanks Sage.

I opened an issue in the tracker to follow up on this potential enhancement: http://tracker.ceph.com/issues/9943.

Thanks,
Guang

----------------------------------------
> Date: Wed, 29 Oct 2014 08:11:01 -0700
> From: sage@xxxxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: OSD crashed due to filestore EIO
>
> On Wed, 29 Oct 2014, GuangYang wrote:
>> Recently we observed an OSD crash due to file corruption in the filesystem,
>> which leads to an assertion failure at FileStore::read as EIO is not
>> tolerated. As file corruption is normal in a large deployment, I am
>> wondering if that behavior is too aggressive, especially for an EC pool.
>>
>> After searching, I found that this flag might help: filestore_fail_eio,
>> which can make the OSD survive an EIO failure; it is true by default
>> though. I haven't tested it yet.
>
> That will remove the immediate assert. Currently, for an object being read
> by a client, it will just pass EIO back to the client, though, which is
> clearly not what we want.
>
>> Does it make sense to adjust the behavior a little bit: if the filestore
>> read fails due to file corruption, return the failure and at the
>> same time mark the PG as inconsistent. Due to the redundancy (replication
>> or EC), the request can still be served, and at the same time, we can
>> get an alert saying there is an inconsistency and manually trigger a PG
>> repair?
>
> That would be ideal, yeah. I think that initially it makes sense to do
> *just that read* via a replica but let the admin trigger the repair.
> This most closely mirrors what scrub currently does on EIO (mark
> inconsistent but let admin repair). Later, when we support automatic
> repair, that option can affect both scrub and client-triggered EIOs?
>
> We just need to be careful that any EIO on *metadata* still triggers a
> failure as we need to be especially careful about handling that. IIRC
> there is a flag passed to read indicating whether EIO is okay; we should
> probably use that so that EIO-ok vs EIO-notok cases are still clearly
> annotated.
>
> sage
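
For reference, the behavior discussed in this thread could be sketched roughly as follows. This is a toy Python model and not Ceph code: the names `Copy`, `PG`, `client_read`, `eio_ok` and `MetadataReadError` are all hypothetical, it only models plain replication (not EC), and it assumes the "EIO is okay" flag is passed per read, as Sage recalls. On a tolerated EIO it marks the PG inconsistent and serves just that read from another copy, leaving repair to the admin; when EIO is not tolerated (e.g. metadata) it still fails.

```python
import errno
from dataclasses import dataclass


class MetadataReadError(Exception):
    """Hypothetical: raised when EIO hits a read that must not tolerate it."""


@dataclass
class Copy:
    """One copy of an object on one (simulated) OSD."""
    data: bytes
    corrupt: bool = False

    def read(self) -> bytes:
        if self.corrupt:
            # Simulate the filestore returning EIO for a corrupted file.
            raise OSError(errno.EIO, "simulated filestore EIO")
        return self.data


@dataclass
class PG:
    """Toy placement group: one primary copy plus replicas."""
    primary: Copy
    replicas: list
    inconsistent: bool = False

    def client_read(self, eio_ok: bool = True) -> bytes:
        try:
            return self.primary.read()
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            if not eio_ok:
                # EIO-notok case (e.g. metadata): keep failing hard.
                raise MetadataReadError("EIO on a read that must not fail") from e
            # EIO-ok case: flag the PG so the admin can trigger a repair,
            # then serve just this read from another copy.
            self.inconsistent = True
            for replica in self.replicas:
                try:
                    return replica.read()
                except OSError:
                    continue
            raise OSError(errno.EIO, "no readable copy left") from e


pg = PG(primary=Copy(b"payload", corrupt=True), replicas=[Copy(b"payload")])
result = pg.client_read()
```

After the call, `result` holds the data read from the replica and `pg.inconsistent` is set, which is the point where an alert would be raised and a manual PG repair could follow. Calling `client_read(eio_ok=False)` on a corrupt primary raises instead of falling back.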