Hi John,

With regard to memory pressure: does the CephFS FUSE client also cause a deadlock, or is this just the kernel client?

We run the FUSE client on ten OSD nodes and use parsync (parallel rsync) to back up two BeeGFS systems (~1 PB).

Ordinarily FUSE works fine, but any OSD problem can trigger an out-of-memory error in other OSD threads as they recover, e.g.:

  kernel: [<ffffffff9cf98906>] out_of_memory+0x4b6/0x4f0
  kernel: Out of memory: Kill process 1927903 (ceph-osd) score 27 or sacrifice child

Limiting the BlueStore cache (as follows) prevents the OOM error and allows us to run the CephFS FUSE client reliably:

  bluestore_cache_size = 209715200
  bluestore_cache_kv_max = 134217728

We have 45 OSDs per box, 128 GB RAM, dual E5-2620 v4, running Mimic 13.2.1. A load average of 16 or so is normal.

Could our OOM errors (with a default config) be caused by us running the CephFS FUSE client on the OSD servers?

many thanks!

Jake

On 07/08/18 20:36, John Spray wrote:
> On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>>
>> This is the first I am hearing about this as well.
>
> This is not a Ceph-specific thing -- it can also affect similar
> systems like Lustre.
>
> The classic case is when, under some memory pressure, the kernel tries
> to free memory by flushing the client's page cache, but doing the
> flush means allocating more memory on the server, making the memory
> pressure worse, until the whole thing just seizes up.
>
> John
>
>> Granted, I am using ceph-fuse rather than the kernel client at this point, but that isn't etched in stone.
>>
>> Curious if there is more to share.
>>
>> Reed
>>
>> On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote:
>>
>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 7:51 PM:
>>>
>>> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>>> this can cause memory deadlock. you should avoid doing this
>>>
>>>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 7:12 PM:
>>>>>
>>>>> did you mount cephfs on the same machines that run ceph-osd?
>>>>>
>>
>> I didn't know about this. I run this setup in production. :P
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> Belo Horizonte - Brasil
>> IRC NICK - WebertRLZ
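
[For reference, a minimal sketch of how the cache limits quoted above might be set in ceph.conf. The values are the ones from the message; placing them in the [osd] section on each OSD host, and restarting the OSDs afterwards, is an assumption about the poster's setup, not something stated in the thread.]

  # /etc/ceph/ceph.conf -- sketch only; assumes these OSD-side options
  # live under [osd] on the hosts that also run ceph-fuse
  [osd]
  # cap the BlueStore cache at ~200 MiB per OSD
  # (the 13.2.x defaults are on the order of 1-3 GiB per OSD, depending on media)
  bluestore_cache_size = 209715200
  # cap the RocksDB (kv) share of that cache at ~128 MiB
  bluestore_cache_kv_max = 134217728

  # the OSDs need a restart to pick up the new limits, e.g.:
  #   systemctl restart ceph-osd.target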