Re: cephfs kernel client hangs

Hi John,

thanks for the advice, it's greatly appreciated.

We have 45 x 8TB OSDs and 128GB RAM per node, which is roughly 35% of the
amount recommended by the usual 1GB-per-TB rule of thumb, so our OOM
problems are predictable.
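
For anyone checking the figure, the back-of-the-envelope maths (using the
1GB-per-TB rule of thumb John mentions below) is simply:

    45 OSDs x 8TB    = 360TB raw per node
    recommended RAM  ~ 360GB (1GB per TB of OSD storage)
    actual RAM       = 128GB, i.e. 128/360 ~ 35%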

I'll increase the RAM on one node to 256GB and see whether it handles OSD
fault conditions without the bluestore cache limits.

again, many thanks

Jake


On 08/08/18 17:11, John Spray wrote:
> On Wed, Aug 8, 2018 at 4:46 PM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi John,
>>
>> With regard to memory pressure: does the cephfs fuse client also cause
>> a deadlock, or is this just the kernel client?
> 
> TBH, I'm not expert enough on the kernel-side implementation of fuse
> to say.  Ceph does have the fuse_disable_pagecache option, which might
> reduce the probability of issues if you're committed to running clients
> and servers on the same node.
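> 
> If you do try it, it's a client config option; a minimal ceph.conf sketch
> of how I'd expect it to look on the client side (double-check the exact
> name and section against the docs for your release):
> 
>     [client]
>         fuse_disable_pagecache = true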
> 
>> We run the fuse client on ten OSD nodes, and use parsync (parallel
>> rsync) to backup two beegfs systems (~1PB).
>>
>> Ordinarily fuse works OK, but any OSD problem can trigger out-of-memory
>> kills of other OSD daemons on the node during recovery, e.g.:
>>
>> kernel: [<ffffffff9cf98906>] out_of_memory+0x4b6/0x4f0
>> kernel: Out of memory: Kill process 1927903 (ceph-osd) score 27 or
>> sacrifice child
>>
>> Limiting bluestore_cache (as follows) prevents the OOM error, and allows
>> us to run the cephfs fuse client reliably:
>>
>> bluestore_cache_size = 209715200
>> bluestore_cache_kv_max = 134217728
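>>
>> As a sanity check, the value actually in use can be read back over the
>> admin socket on the OSD node, something along these lines (osd.0 is just
>> an example id):
>>
>>     ceph daemon osd.0 config get bluestore_cache_size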
>>
>> We have 45 OSDs per box, 128GB RAM, dual E5-2620 v4 CPUs, running
>> Mimic 13.2.1. A load average of around 16 is normal...
>>
>> Could our OOM errors (with a default config) be caused by us running
>> cephfs fuse on the OSD servers?
> 
> I wouldn't rule it out, but this is also a pretty high density of OSDs
> per node to begin with.  If each OSD is at least a few terabytes,
> you're the wrong side of the rule of thumb on resources (1GB RAM per
> TB of OSD storage).  I'd also be concerned about having only one
> quarter of a CPU core for each OSD.  Sounds like you've got your
> settings tuned to something that's working in practice though, so I
> wouldn't mess with it :-)
> 
> John
> 
>>
>> many thanks!
>>
>> Jake
>>
>> On 07/08/18 20:36, John Spray wrote:
>>> On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>>>>
>>>> This is the first I am hearing about this as well.
>>>
>>> This is not a Ceph-specific thing -- it can also affect similar
>>> systems like Lustre.
>>>
>>> The classic case is when under some memory pressure, the kernel tries
>>> to free memory by flushing the client's page cache, but doing the
>>> flush means allocating more memory on the server, making the memory
>>> pressure worse, until the whole thing just seizes up.
>>>
>>> John
>>>
>>>> Granted, I am using ceph-fuse rather than the kernel client at this point, but that isn’t etched in stone.
>>>>
>>>> Curious if there is more to share.
>>>>
>>>> Reed
>>>>
>>>> On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
>>>>
>>>>
>>>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 7:51 PM:
>>>>>
>>>>> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>>>>> This can cause a memory deadlock; you should avoid doing this.
>>>>>
>>>>>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 19:12:
>>>>>>>
>>>>>>> did you mount cephfs on the same machines that run ceph-osd?
>>>>>>>
>>>>
>>>>
>>>> I didn't know about this. I run this setup in production. :P
>>>>
>>>> Regards,
>>>>
>>>> Webert Lima
>>>> DevOps Engineer at MAV Tecnologia
>>>> Belo Horizonte - Brasil
>>>> IRC NICK - WebertRLZ
>>>>
>>>
>>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



