On Tue, Mar 12, 2019 at 8:56 PM David C <dcsysengineer@xxxxxxxxx> wrote:
>
> Out of curiosity, are you guys re-exporting the fs to clients over something like NFS or running applications directly on the OSD nodes?

Kernel NFS + kernel CephFS can fall apart and deadlock itself in exciting ways... nfs-ganesha is so much better.

Paul

>
> On Tue, 12 Mar 2019, 18:28 Paul Emmerich, <paul.emmerich@xxxxxxxx> wrote:
>>
>> Mounting kernel CephFS on an OSD node works fine with recent kernels (4.14+) and enough RAM in the servers.
>>
>> We did encounter problems with older kernels, though.
>>
>> Paul
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>>
>> On Tue, Mar 12, 2019 at 10:07 AM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>> >
>> > It's worth noting that most containerized deployments can effectively limit RAM for containers (cgroups), and the kernel has limits on how many dirty pages it can keep around.
>> >
>> > In particular, /proc/sys/vm/dirty_ratio (default: 20) means at most 20% of your total RAM can be dirty FS pages. If you set up your containers such that their cumulative memory usage is capped below, say, 70% of RAM, then this might effectively guarantee that you will never hit this issue.
>> >
>> > On 08/03/2019 02:17, Tony Lill wrote:
>> > > AFAIR the issue is that under memory pressure, the kernel will ask cephfs to flush pages, but this in turn causes the OSD (MDS?) to require more memory to complete the flush (for network buffers, etc.). As long as cephfs and the OSDs are feeding from the same kernel mempool, you are susceptible. Containers don't protect you, but a full VM, such as Xen or KVM, would.
>> > >
>> > > So if you don't hit a low-memory situation, you will not see the deadlock, and you can run like this for years without a problem. I have. But you are most likely to run out of memory during recovery, so this could compound your problems.
>> > >
>> > > On 3/7/19 3:56 AM, Marc Roos wrote:
>> > >>
>> > >> A container uses the same kernel; the problem is with processes sharing that kernel.
>> > >>
>> > >> -----Original Message-----
>> > >> From: Daniele Riccucci [mailto:devster@xxxxxxxxxx]
>> > >> Sent: 07 March 2019 00:18
>> > >> To: ceph-users@xxxxxxxxxxxxxx
>> > >> Subject: Re: mount cephfs on ceph servers
>> > >>
>> > >> Hello,
>> > >> is the deadlock risk still an issue in containerized deployments? For example, with OSD daemons in containers and the filesystem mounted on the host machine?
>> > >> Thank you.
>> > >>
>> > >> Daniele
>> > >>
>> > >> On 06/03/19 16:40, Jake Grimmett wrote:
>> > >>> Just to add a "+1" to this datapoint, based on one month's usage on Mimic 13.2.4: essentially, "it works great for us".
>> > >>>
>> > >>> Prior to this, we had issues with the kernel driver on 12.2.2. This could have been due to limited RAM on the OSD nodes (128GB / 45 OSDs) and an older kernel.
>> > >>>
>> > >>> Upgrading the RAM to 256GB and using a RHEL 7.6-derived kernel has allowed us to reliably use the kernel driver.
>> > >>>
>> > >>> We keep 30 snapshots (one per day), have one active metadata server, and change several TB daily - it's much, *much* faster than with FUSE.
>> > >>>
>> > >>> The cluster has 10 OSD nodes, currently storing 2PB, using EC 8:2 coding.
>> > >>>
>> > >>> ta ta
>> > >>>
>> > >>> Jake
>> > >>>
>> > >>> On 3/6/19 11:10 AM, Hector Martin wrote:
>> > >>>> On 06/03/2019 12:07, Zhenshi Zhou wrote:
>> > >>>>> Hi,
>> > >>>>>
>> > >>>>> I'm going to mount CephFS from my Ceph servers for some reason, including the monitors, metadata servers and OSD servers. I know it's not a best practice, but what is the exact potential danger if I mount CephFS on its own servers?
>> > >>>>
>> > >>>> As a datapoint, I have been doing this on two machines (single-host Ceph clusters) for months with no ill effects. The FUSE client performs a lot worse than the kernel client, so I switched to the latter, and it's been working well with no deadlocks.
>> > >>>
>> >
>> > --
>> > Hector Martin (hector@xxxxxxxxxxxxxx)
>> > Public Key: https://mrcn.st/pub
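
For anyone who wants to sanity-check Hector's dirty_ratio / container-cap arithmetic on their own nodes, a rough sketch follows. It only reads MemTotal from /proc/meminfo and /proc/sys/vm/dirty_ratio; the 70% cumulative container cap is just the example figure from the discussion above, not a recommendation, and the check ignores swap, cgroup accounting details and any vm.dirty_bytes override - treat it as a back-of-the-envelope estimate, not a guarantee.

#!/usr/bin/env python3
"""Back-of-the-envelope headroom check for co-locating kernel CephFS
clients with containerized Ceph daemons. Linux only; the 70% container
cap is an assumed example figure, not a recommendation."""


def read_mem_total_kib():
    # MemTotal in /proc/meminfo is reported in kiB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise RuntimeError("MemTotal not found in /proc/meminfo")


def read_dirty_ratio():
    # Percentage of RAM that may be filled with dirty page cache (default 20).
    with open("/proc/sys/vm/dirty_ratio") as f:
        return int(f.read().strip())


def headroom_report(container_cap_fraction=0.70):
    # container_cap_fraction: assumed cumulative cgroup memory limit for
    # all Ceph daemon containers, as a fraction of total RAM.
    total_kib = read_mem_total_kib()
    dirty_ratio = read_dirty_ratio()

    dirty_cap_kib = total_kib * dirty_ratio // 100
    container_cap_kib = int(total_kib * container_cap_fraction)
    headroom_kib = total_kib - dirty_cap_kib - container_cap_kib

    print("Total RAM:            %6d MiB" % (total_kib // 1024))
    print("vm.dirty_ratio:       %d%% -> up to %d MiB of dirty pages"
          % (dirty_ratio, dirty_cap_kib // 1024))
    print("Container memory cap: %6d MiB (%.0f%% of RAM, assumed)"
          % (container_cap_kib // 1024, container_cap_fraction * 100))
    if headroom_kib < 0:
        print("WARNING: dirty-page ceiling plus container cap exceeds RAM;")
        print("a flush under memory pressure could get this node into trouble.")
    else:
        print("Remaining headroom:   %6d MiB" % (headroom_kib // 1024))


if __name__ == "__main__":
    headroom_report()

On a 256GB node with the default dirty_ratio of 20 and the containers capped at 70% of RAM, that leaves roughly 10% of RAM (about 25GB) of slack for everything else.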