Re: cephfs kernel client hangs

Hi John,

With regard to memory pressure: does the cephfs fuse client also cause
a deadlock, or is this just the kernel client?

We run the fuse client on ten OSD nodes and use parsync (parallel
rsync) to back up two BeeGFS systems (~1 PB).

Ordinarily fuse works OK, but any OSD problem can cause out-of-memory
errors on the other OSD daemons as they recover, e.g.:

kernel: [<ffffffff9cf98906>] out_of_memory+0x4b6/0x4f0
kernel: Out of memory: Kill process 1927903 (ceph-osd) score 27 or
sacrifice child

Limiting the bluestore cache (as follows) prevents the OOM errors and
allows us to run the cephfs fuse client reliably:

bluestore_cache_size = 209715200
bluestore_cache_kv_max = 134217728
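
For reference, those two values work out to 200 MiB and 128 MiB. A
minimal sketch of how this might look in ceph.conf, assuming the
options are set under the [osd] section (adjust to wherever your
config keeps OSD options):

[osd]
# 209715200 bytes = 200 MiB of bluestore cache per OSD
bluestore_cache_size = 209715200
# 134217728 bytes = 128 MiB cap on the kv (rocksdb) share of that cache
bluestore_cache_kv_max = 134217728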

We have 45 OSDs per box, 128 GB RAM, and dual E5-2620 v4 CPUs, running
Mimic 13.2.1; a load average of around 16 is normal...

Could our OOM errors (with a default config) be caused by running the
cephfs fuse client on the OSD servers?
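
A rough way to check whether the two are competing (just a sketch;
exact fields vary between kernels and distros) is to watch per-process
RSS and writeback on an OSD node while a backup is running:

# resident memory of the OSD daemons and the fuse client on this node
ps -o pid,rss,comm -C ceph-osd,ceph-fuse

# dirty/writeback pages and what the kernel still considers available
grep -E 'MemAvailable|Dirty|Writeback' /proc/meminfo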

many thanks!

Jake

On 07/08/18 20:36, John Spray wrote:
> On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>>
>> This is the first I am hearing about this as well.
> 
> This is not a Ceph-specific thing -- it can also affect similar
> systems like Lustre.
> 
> The classic case is when under some memory pressure, the kernel tries
> to free memory by flushing the client's page cache, but doing the
> flush means allocating more memory on the server, making the memory
> pressure worse, until the whole thing just seizes up.
> 
> John
> 
>> Granted, I am using ceph-fuse rather than the kernel client at this point, but that isn’t etched in stone.
>>
>> Curious if there is more to share.
>>
>> Reed
>>
>> On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
>>
>>
>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tuesday, Aug 7, 2018 at 7:51 PM:
>>>
>>> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>>> This can cause a memory deadlock. You should avoid doing this.
>>>
>>>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tuesday, Aug 7, 2018 at 19:12:
>>>>>
>>>>> Did you mount cephfs on the same machines that run ceph-osd?
>>>>>
>>
>>
>> I didn't know about this. I run this setup in production. :P
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> Belo Horizonte - Brasil
>> IRC NICK - WebertRLZ
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



