RE: OSD crashed due to filestore EIO

GuangYang <yguang11@xxxxxxxxxxx> · Thu, 30 Oct 2014 11:39:01 +0000

Thanks Sage. I opened an issue in the tracker to follow up on this potential enhancement: http://tracker.ceph.com/issues/9943.

Thanks,
Guang

----------------------------------------
> Date: Wed, 29 Oct 2014 08:11:01 -0700
> From: sage@xxxxxxxxxxxx
> To: yguang11@xxxxxxxxxxx
> CC: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: OSD crashed due to filestore EIO
>
> On Wed, 29 Oct 2014, GuangYang wrote:
>> Recently we observed an OSD crash due to file corruption in the
>> filesystem, which led to an assertion failure in FileStore::read because
>> EIO is not tolerated. As file corruption is normal in large deployments,
>> I wonder whether that behavior is too aggressive, especially for EC pools.
>>
>> After searching, I found a flag that might help: filestore_fail_eio.
>> Disabling it can make the OSD survive an EIO failure; it is true by
>> default, though. I haven't tested it yet.
>
> That will remove the immediate assert. Currently, for an object being read
> by a client, it will just pass EIO back to the client, though, which is
> clearly not what we want.
>
>> Does it make sense to adjust the behavior a little: if the filestore
>> read fails due to file corruption, return the failure and at the same
>> time mark the PG as inconsistent? Thanks to the redundancy (replication
>> or EC), the request can still be served, and at the same time we get an
>> alert saying there is an inconsistency and can manually trigger a PG
>> repair.
>
> That would be ideal, yeah. I think that initially it makes sense to do
> *just that read* via a replica but let the admin trigger the repair.
> This most closely mirrors what scrub currently does on EIO (mark
> inconsistent but let admin repair). Later, when we support automatic
> repair, that option can affect both scrub and client-triggered EIOs?
>
> We just need to be careful that any EIO on *metadata* still triggers a
> failure as we need to be especially careful about handling that. IIRC
> there is a flag passed to read indicating whether EIO is okay; we should
> probably use that so that EIO-ok vs EIO-notok cases are still clearly
> annotated.
>
> sage
>
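
The behavior discussed in the quoted thread (on a data EIO, mark the PG inconsistent and serve just that read from a replica; on a metadata EIO, still fail) could be sketched roughly as below. This is only an illustration under stated assumptions: every name here (ReadKind, Copy, PG, read_object, allow_eio) is hypothetical and is not the FileStore or OSD API, and the "replica" is an in-memory stand-in rather than a real peer OSD.

```cpp
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only; none of these are Ceph APIs.
enum class ReadKind { Data, Metadata };

struct ReadResult {
    int rc;               // 0 on success, negative errno on failure
    std::string payload;  // object contents when rc == 0
};

// Stand-in for one copy of an object (the local store or a replica).
struct Copy {
    bool corrupt;
    std::string payload;
    ReadResult read() const {
        if (corrupt) return {-EIO, ""};
        return {0, payload};
    }
};

// Stand-in for the placement group state an admin would be alerted on.
struct PG {
    bool inconsistent = false;
};

// allow_eio plays the role of the "is EIO okay" flag mentioned in the thread,
// so that EIO-ok and EIO-not-ok call sites stay clearly annotated.
ReadResult read_object(PG& pg, const Copy& local,
                       const std::vector<Copy>& replicas,
                       ReadKind kind, bool allow_eio)
{
    ReadResult r = local.read();
    if (r.rc != -EIO) return r;

    // Metadata EIO, or a caller that did not opt in, still fails hard.
    if (kind == ReadKind::Metadata || !allow_eio) {
        throw std::runtime_error("EIO not tolerated for this read");
    }

    // Flag the PG so the admin can trigger a repair later; no auto-repair.
    pg.inconsistent = true;

    // Serve just this read from the first healthy replica.
    for (const Copy& c : replicas) {
        ReadResult rr = c.read();
        if (rr.rc == 0) return rr;
    }
    return r;  // no healthy copy found: EIO goes back to the caller
}
```

In this sketch the throw stands in for the existing assert, and the repair itself is deliberately left out, matching the suggestion that the admin triggers it, as scrub does today.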