On Wed, Aug 8, 2018 at 4:46 PM Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi John,
>
> With regard to memory pressure: does the cephfs fuse client also cause
> a deadlock, or is this just the kernel client?

TBH, I'm not expert enough on the kernel-side implementation of fuse to
say. Ceph does have the fuse_disable_pagecache option, which might
reduce the probability of issues if you're committed to running clients
and servers on the same node.

> We run the fuse client on ten OSD nodes, and use parsync (parallel
> rsync) to back up two beegfs systems (~1PB).
>
> Ordinarily fuse works OK, but any OSD problems can cause an
> out-of-memory error on other osd threads as they recover, e.g.:
>
> kernel: [<ffffffff9cf98906>] out_of_memory+0x4b6/0x4f0
> kernel: Out of memory: Kill process 1927903 (ceph-osd) score 27 or
> sacrifice child
>
> Limiting the bluestore cache (as follows) prevents the OOM error, and
> allows us to run the cephfs fuse client reliably:
>
> bluestore_cache_size = 209715200
> bluestore_cache_kv_max = 134217728
>
> We have 45 OSDs per box, 128GB RAM, dual E5-2620 v4,
> mimic 13.2.1. A load average of 16 or so is normal...
>
> Could our OOM errors (with a default config) be caused by us running
> cephfs fuse on the osd servers?

I wouldn't rule it out, but this is also a pretty high density of OSDs
per node to begin with. If each OSD is at least a few terabytes, you're
on the wrong side of the rule of thumb on resources (1GB of RAM per TB
of OSD storage). I'd also be concerned about having only one quarter of
a CPU core for each OSD.

Sounds like you've got your settings tuned to something that's working
in practice though, so I wouldn't mess with it :-)

John

> many thanks!
>
> Jake
>
> On 07/08/18 20:36, John Spray wrote:
> > On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> >>
> >> This is the first I am hearing about this as well.
> >
> > This is not a Ceph-specific thing -- it can also affect similar
> > systems like Lustre.
> >
> > The classic case is when, under some memory pressure, the kernel
> > tries to free memory by flushing the client's page cache, but doing
> > the flush means allocating more memory on the server, making the
> > memory pressure worse, until the whole thing just seizes up.
> >
> > John
> >
> >> Granted, I am using ceph-fuse rather than the kernel client at this
> >> point, but that isn’t etched in stone.
> >>
> >> Curious if there is more to share.
> >>
> >> Reed
> >>
> >> On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima
> >> <webert.boss@xxxxxxxxx> wrote:
> >>
> >> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 7:51 PM:
> >>>
> >>> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
> >>> this can cause memory deadlock. you should avoid doing this
> >>>
> >>>> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Tue, Aug 7, 2018 at 19:12:
> >>>>>
> >>>>> did you mount cephfs on the same machines that run ceph-osd?
> >>>>>
> >>
> >> I didn't know about this. I run this setup in production. :P
> >>
> >> Regards,
> >>
> >> Webert Lima
> >> DevOps Engineer at MAV Tecnologia
> >> Belo Horizonte - Brasil
> >> IRC NICK - WebertRLZ
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
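
[Editor's worked example] To make the resource arithmetic in the thread concrete, here is a small back-of-the-envelope sketch. The OSD count, RAM, CPU model, and cache values come from the messages above; the per-OSD capacity of 8 TB is a hypothetical assumption (the thread only says "at least a few terabytes"), so treat the resulting RAM figure as illustrative, not a recommendation.

```python
# Back-of-the-envelope check of the node sizing discussed in this thread.
# osds_per_node, installed_ram_gb, the CPU model, and the cache byte
# values are from Jake's message; tb_per_osd is a hypothetical assumption.

osds_per_node = 45                  # from the thread
installed_ram_gb = 128              # from the thread
cores_per_node = 2 * 8              # dual E5-2620 v4: 2 sockets x 8 cores

tb_per_osd = 8                      # hypothetical assumption

# Rule of thumb John cites: ~1 GB of RAM per TB of OSD storage.
recommended_ram_gb = osds_per_node * tb_per_osd * 1

# Jake's tuned cache limits, converted from bytes to MiB for readability.
bluestore_cache_size_mib = 209715200 // (1024 * 1024)    # 200 MiB
bluestore_cache_kv_max_mib = 134217728 // (1024 * 1024)  # 128 MiB

print(f"recommended RAM: {recommended_ram_gb} GB "
      f"(installed: {installed_ram_gb} GB)")
print(f"physical cores per OSD: {cores_per_node / osds_per_node:.2f}")
print(f"cache limits: {bluestore_cache_size_mib} MiB total, "
      f"{bluestore_cache_kv_max_mib} MiB kv max")
```

Under that assumed OSD size, the rule of thumb would suggest roughly 360 GB of RAM against the 128 GB installed, and about a third of a physical core per OSD, which matches the concern John raises above.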