Re: reproducible rbd-nbd crashes

On 08/14/2019 06:55 PM, Mike Christie wrote:
> On 08/14/2019 02:09 PM, Mike Christie wrote:
>> On 08/14/2019 07:35 AM, Marc Schöchlin wrote:
>>>>> 3. I wonder if we are hitting a bug with PF_MEMALLOC Ilya hit with krbd.
>>>>> He removed that code from the krbd. I will ping him on that.
>>>
>>> Interesting. I activated Coredumps for that processes - probably we can
>>> find something interesting here...
>>>
>>
>> Can you replicate the problem with timeout=0 on a 4.4 kernel? (The ceph
>> version does not matter as long as it's known to hit the problem.) When
>> you start to see IO hang and it gets jammed up, can you do:
>>
>> dmesg -c; echo w >/proc/sysrq-trigger; dmesg -c >waiting-tasks.txt
>>
>> and give me the waiting-tasks.txt so I can check if we are stuck in the
>> kernel waiting for memory.
> 
> Don't waste your time. I found a way to replicate it now.
> 

Just a quick update.

Looks like we are trying to allocate memory in the IO path in a way that
can swing back on us: the allocation can trigger reclaim that ends up
waiting on IO to the same device, so we can end up locking up. You are
probably not hitting this with krbd in your setup because normally it's
preallocating structs, using flags like GFP_NOIO, etc. For rbd-nbd, we
cannot preallocate some structs and cannot control the allocation flags
for some operations initiated from userspace, so it's possible to hit
this on every IO. I can now replicate this within a second just by doing
a cp -r.
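
For anyone who wants to check their own capture for the same pattern,
here is a minimal sketch that scans the waiting-tasks.txt file from the
dmesg/sysrq command quoted above and lists blocked tasks whose call
trace mentions a page allocation or reclaim symbol. The header-line
layout and the symbol list are assumptions for illustration (the exact
output differs between kernel versions), and the function name
tasks_stuck_in_reclaim is hypothetical.

```python
import re

# Kernel symbols that suggest a task is blocked in page allocation or
# reclaim. Illustrative assumption only, not an exhaustive list.
RECLAIM_HINTS = (
    "__alloc_pages",
    "try_to_free_pages",
    "shrink_",
    "balance_dirty_pages",
    "congestion_wait",
)

# Assumed shape of a task header line in "echo w > /proc/sysrq-trigger"
# output, with an optional dmesg timestamp in front, e.g.
# "[ 100.000003] rbd-nbd   D ffff880035a7b8e8   0  1234   1 0x00000000"
TASK_HEADER = re.compile(r"^\s*(?:\[[^\]]*\]\s*)?(\S+)\s+D\s")


def tasks_stuck_in_reclaim(text):
    """Return names of D-state tasks whose trace mentions a reclaim hint."""
    stuck = []
    current = None  # name of the task whose call trace we are reading
    for line in text.splitlines():
        header = TASK_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        if current and any(hint in line for hint in RECLAIM_HINTS):
            if current not in stuck:
                stuck.append(current)
    return stuck
```

Calling tasks_stuck_in_reclaim(open("waiting-tasks.txt").read()) returns
the matching task names in the order they appear. A hit only means the
task was inside an allocation or reclaim path when the snapshot was
taken; it is a hint for where to look, not proof of a deadlock.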

It's not going to be a simple fix. We have had a similar issue for
storage daemons like iscsid and multipathd since they were created. It's
less likely to hit with them because you only hit the paths where they
cannot control memory allocation behavior during recovery.

I am looking into some things now.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



