Re: mount cephfs on ceph servers

Both, in my case (since, on this host, both local services and the NFS export
use the CephFS mount). I use the in-kernel NFS server (not nfs-ganesha).
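
Roughly, the moving parts look like this (paths, monitor names and the
export options below are only illustrative, not a recommendation):

    # /etc/fstab - kernel CephFS mount on the host
    mon1,mon2,mon3:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0

    # /etc/exports - re-export the mount via the in-kernel NFS server (knfsd)
    # fsid= is commonly required when re-exporting a network filesystem
    /mnt/cephfs  192.168.0.0/24(rw,no_subtree_check,fsid=100)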

On 13/03/2019 04.55, David C wrote:
> Out of curiosity, are you guys re-exporting the fs to clients over
> something like nfs or running applications directly on the OSD nodes? 
> 
> On Tue, 12 Mar 2019, 18:28 Paul Emmerich, <paul.emmerich@xxxxxxxx> wrote:
> 
>     Mounting kernel CephFS on an OSD node works fine with recent kernels
>     (4.14+) and enough RAM in the servers.
> 
>     We did encounter problems with older kernels, though.
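> 
>     A quick, purely illustrative way to check both preconditions:
> 
>         # CephFS kernel client: ideally 4.14 or newer
>         uname -r
> 
>         # confirm there is memory headroom beyond what the OSDs need
>         free -g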
> 
> 
>     Paul
> 
>     -- 
>     Paul Emmerich
> 
>     Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
>     croit GmbH
>     Freseniusstr. 31h
>     81247 München
>     www.croit.io
>     Tel: +49 89 1896585 90
> 
>     On Tue, Mar 12, 2019 at 10:07 AM Hector Martin
>     <hector@xxxxxxxxxxxxxx> wrote:
>     >
>     > It's worth noting that most containerized deployments can effectively
>     > limit RAM for containers (cgroups), and the kernel has limits on how
>     > many dirty pages it can keep around.
>     >
>     > In particular, /proc/sys/vm/dirty_ratio (default: 20) means at most
>     > 20% of your total RAM can be dirty FS pages. If you set up your
>     > containers such that the cumulative memory usage is capped below,
>     > say, 70% of RAM, then this might effectively guarantee that you will
>     > never hit this issue.
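>     >
>     > As a rough sketch of that combination (the limit values and the unit
>     > name below are only illustrative):
>     >
>     >     # host-wide: dirty page cache capped at 20% of RAM (the default)
>     >     sysctl vm.dirty_ratio=20
>     >
>     >     # per-service cgroup memory cap, e.g. via systemd (unit name varies)
>     >     systemctl set-property ceph-osd@0.service MemoryMax=24G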
>     >
>     > On 08/03/2019 02:17, Tony Lill wrote:
>     > > AFAIR the issue is that under memory pressure, the kernel will ask
>     > > cephfs to flush pages, but that this in turn causes the osd (mds?)
>     > > to require more memory to complete the flush (for network buffers,
>     > > etc). As long as cephfs and the OSDs are feeding from the same
>     > > kernel mempool, you are susceptible. Containers don't protect you,
>     > > but a full VM (xen or kvm?) would.
>     > >
>     > > So if you don't hit the low memory situation, you will not see the
>     > > deadlock, and you can run like this for years without a problem. I
>     > > have. But you are most likely to run out of memory during recovery,
>     > > so this could compound your problems.
>     > >
>     > > On 3/7/19 3:56 AM, Marc Roos wrote:
>     > >>
>     > >>
>     > >> Container = same kernel; the problem is with processes using the
>     > >> same kernel.
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >> -----Original Message-----
>     > >> From: Daniele Riccucci [mailto:devster@xxxxxxxxxx]
>     > >> Sent: 07 March 2019 00:18
>     > >> To: ceph-users@xxxxxxxxxxxxxx
>     > >> Subject: Re:  mount cephfs on ceph servers
>     > >>
>     > >> Hello,
>     > >> is the deadlock risk still an issue in containerized deployments?
>     > >> For example with OSD daemons in containers and mounting the
>     > >> filesystem on the host machine?
>     > >> Thank you.
>     > >>
>     > >> Daniele
>     > >>
>     > >> On 06/03/19 16:40, Jake Grimmett wrote:
>     > >>> Just to add "+1" on this datapoint: based on one month of usage
>     > >>> on Mimic 13.2.4, essentially "it works great for us".
>     > >>>
>     > >>> Prior to this, we had issues with the kernel driver on 12.2.2.
>     > >>> This could have been due to limited RAM on the osd nodes (128GB /
>     > >>> 45 OSDs), and an older kernel.
>     > >>>
>     > >>> Upgrading the RAM to 256GB and using a RHEL 7.6 derived kernel
>     > >>> has allowed us to reliably use the kernel driver.
>     > >>>
>     > >>> We keep 30 snapshots (one per day), have one active metadata
>     > >>> server, and change several TB daily - it's much, *much* faster
>     > >>> than with fuse.
>     > >>>
>     > >>> Cluster has 10 OSD nodes, currently storing 2PB, using EC 8:2
>     > >>> coding.
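>     > >>>
>     > >>> For the curious, a rough sketch of those two pieces (profile/pool
>     > >>> names and the snapshot path are just examples, not our exact setup):
>     > >>>
>     > >>>     # 8+2 erasure-coded data pool for CephFS
>     > >>>     ceph osd erasure-code-profile set ec82 k=8 m=2 crush-failure-domain=host
>     > >>>     ceph osd pool create cephfs_data_ec 1024 1024 erasure ec82
>     > >>>     ceph osd pool set cephfs_data_ec allow_ec_overwrites true
>     > >>>
>     > >>>     # daily snapshot via the .snap directory on a mounted client
>     > >>>     mkdir /mnt/cephfs/.snap/daily-$(date +%F)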
>     > >>>
>     > >>> ta ta
>     > >>>
>     > >>> Jake
>     > >>>
>     > >>>
>     > >>>
>     > >>>
>     > >>> On 3/6/19 11:10 AM, Hector Martin wrote:
>     > >>>> On 06/03/2019 12:07, Zhenshi Zhou wrote:
>     > >>>>> Hi,
>     > >>>>>
>     > >>>>> I'm gonna mount cephfs from my ceph servers for some reason,
>     > >>>>> including monitors, metadata servers and osd servers. I know
>     > >>>>> it's not a best practice. But what is the exact potential
>     > >>>>> danger if I mount cephfs from its own server?
>     > >>>>
>     > >>>> As a datapoint, I have been doing this on two machines
>     > >>>> (single-host Ceph clusters) for months with no ill effects. The
>     > >>>> FUSE client performs a lot worse than the kernel client, so I
>     > >>>> switched to the latter, and it's been working well with no
>     > >>>> deadlocks.
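>     > >>>>
>     > >>>> For reference, the two mount styles being compared (monitor
>     > >>>> address, credentials and paths below are placeholders):
>     > >>>>
>     > >>>>     # kernel client
>     > >>>>     mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
>     > >>>>
>     > >>>>     # FUSE client
>     > >>>>     ceph-fuse -n client.admin /mnt/cephfs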
>     > >>>>
>     > >
>     > >
>     >
>     > --
>     > Hector Martin (hector@xxxxxxxxxxxxxx)
>     > Public Key: https://mrcn.st/pub
> 


-- 
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



