Re: corrupted rbd filesystems since jewel


 



Am 16.05.2017 um 21:45 schrieb Jason Dillaman:
> On Tue, May 16, 2017 at 3:37 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> We've enabled the op tracker for performance reasons while using SSD
>> only storage ;-(
> 
> Disabled you mean?
Sorry, yes, disabled.

>> Can I enable the op tracker using ceph osd tell, then reproduce the
>> problem and check what got stuck again? Or should I generate an rbd log
>> from the client?
> 
> From a super-quick glance at the code, it looks like that isn't a
> dynamic setting. Of course, it's possible that if you restart OSD 46
> to enable the op tracker, the stuck op will clear itself and the VM
> will resume.
Yes, I already tested this some time ago; restarting the OSD resumes all I/O.
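For reference, whether the op tracker can be toggled at runtime can be checked directly against the daemon. The commands below are a sketch assuming a Jewel-era cluster and OSD 46 as in the thread; on Jewel, injectargs will report if the option cannot be changed without a restart:

```shell
# Query the current value via the OSD's admin socket (run on the OSD host)
ceph daemon osd.46 config get osd_enable_op_tracker

# Attempt a runtime change; Jewel prints a warning if the option
# is not changeable at runtime
ceph tell osd.46 injectargs '--osd_enable_op_tracker=true'

# If a restart is required, set osd_enable_op_tracker = true in
# ceph.conf first, then restart the daemon (resumes the stuck op):
systemctl restart ceph-osd@46
```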

> You could attempt to generate a gcore of OSD 46 to see if
> information on that op could be extracted via the debugger, but no
> guarantees.

Sorry, no idea how to do that.
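For what it's worth, capturing a core of the running OSD without killing it looks roughly like this (gdb must be installed, and debug symbols for ceph-osd are needed to get useful backtraces; the process-matching pattern is an assumption and may differ by init system):

```shell
# Find the PID of the ceph-osd process serving OSD 46
# (the argument may be "--id 46" or "-i 46" depending on the version)
pid=$(pgrep -f 'ceph-osd.*(--id|-i) 46' | head -n1)

# Dump a core file without stopping the process
gcore -o /tmp/osd46.core "$pid"

# Later, inspect all thread backtraces from the core
gdb /usr/bin/ceph-osd "/tmp/osd46.core.$pid" -batch -ex 'thread apply all bt'
```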

> You might want to verify that the stuck client and OSD 46 have an
> actual established TCP connection as well before doing any further
> actions.

I can check that when I reproduce the issue.
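Checking for an established TCP connection between the stuck client and OSD 46 can be done with ss on both ends. The port range and IP below are placeholders; OSDs normally listen in the 6800-7300 range, and "ceph osd find" shows the OSD's actual address:

```shell
# On the OSD host: show the OSD's address, then list established
# connections on the usual OSD port range
ceph osd find 46
ss -tnp state established '( sport >= :6800 and sport <= :7300 )'

# On the client host: filter by the OSD's address (placeholder IP)
ss -tn state established dst 192.0.2.10
```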

Greets,
Stefan


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


