Re: k8s kernel clients: reasonable number of mounts per host, and limiting num client sessions

On Mon, Apr 5, 2021 at 8:33 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Thu, 2021-04-01 at 11:04 +0200, Dan van der Ster wrote:
> > Hi,
> >
> > Context: one of our users is mounting 350 ceph kernel PVCs per 30GB VM
> > and they notice "memory pressure".
> >
>
> Manifested how?

Our users lost their monitoring data, so we are going to try to reproduce
the issue to get more details.
Do you know of any way to see how much memory is used by the kernel
clients? (Aside from the ceph_inode_info and ceph_dentry_info objects
which I see in slabtop.)
I see that the osd_client keeps just one copy of the osdmap, so that's
going to be only ~256kB * num_clients on this particular cluster.
Do we also need to kmalloc something the size of the pg map? That
would be ~4MB * num_clients here.
Are there any other large data structures, even for idle mounts?
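
To put rough numbers on that, here is a quick back-of-envelope sketch in
Python -- purely an assumption-driven estimate, taking the ~256kB osdmap
and ~4MB pg map figures above and assuming nothing is shared between the
350 per-VM mounts, which is exactly what I'd like to confirm:

    # Worst-case map memory per host, if every PVC mount ends up as its
    # own client instance holding private copies of the osdmap and pg map.
    num_mounts = 350                 # PVCs per 30GB VM
    osdmap_bytes = 256 * 1024        # ~256kB osdmap on this cluster
    pgmap_bytes = 4 * 1024 * 1024    # ~4MB pg map, if duplicated per client

    total = num_mounts * (osdmap_bytes + pgmap_bytes)
    print("worst case: %.0f MiB just for maps" % (total / 2.0 ** 20))

That prints roughly 1500 MiB, so if the pg map really were duplicated per
client, the maps alone could approach 1.5GB on these VMs.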

> > When planning for k8s hosts, what would be a reasonable limit on the
> > number of ceph kernel PVCs to mount per host?
> >
>
> This seems like a really difficult thing to gauge. It depends on a
> number of different factors including amount of RAM and CPUs on the box,
> mount options, workload and applications, etc...
>
> > If one kernel mounts the
> > same cephfs several times (with different prefixes), we observed that
> > this is a unique client session. But does the ceph module globally
> > share a single copy of cluster metadata, e.g. osdmaps, or is that all
> > duplicated per session?
> >
>
> One copy per-cluster client, which should generally be shared between
> mounts to the same cluster, provided that you're using similar-enough
> mount options for the kernel to do that.

As Sage suspected, we have a unique cephx user per mounted PVC.
We're using the Manila CSI driver, which indeed invokes mgr/volumes to
create the shares. They look like this, for reference:

        "client_metadata": {
            "features": "0x0000000000007bff",
            "entity_id": "pvc-691d1f23-da81-4a08-a6e7-d16f44e5f2a0",
            "hostname": "paas-standard-avz-b-6qvn6",
            "kernel_version": "5.10.19-200.fc33.x86_64",
            "root": "/volumes/_nogroup/dbe3dbbf-e8d6-4f13-aac4-7a116d9a6772"
        }

It's good to know that by reusing the same cephx user, we could
consolidate the mounts on a given host into fewer client instances.
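
For what it's worth, one way I'm planning to check how many distinct
client instances a host actually ends up with (and therefore how much map
state gets duplicated) is to count the per-client directories the kernel
exposes under debugfs. A rough sketch, assuming debugfs is mounted at
/sys/kernel/debug and running as root:

    # Each ceph kernel client instance gets its own directory under
    # /sys/kernel/debug/ceph, named <fsid>.client<global_id>.  Mounts that
    # share a client instance share one directory (and one set of maps).
    import os

    debug_dir = "/sys/kernel/debug/ceph"
    clients = [d for d in os.listdir(debug_dir)
               if os.path.isdir(os.path.join(debug_dir, d))]
    print("%d ceph client instances:" % len(clients))
    for c in sorted(clients):
        print("  " + c)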

> > Also, k8s makes it trivial for a user to mount a single PVC from
> > hundreds or thousands of clients. Suppose we wanted to be able to
> > limit the number of clients per PVC -- Do you think a new
> > `max_sessions=N` cephx cap would be the best approach for this?
> >
>
> Why do you want to limit the number of clients per PVC? I'm not sure
> that would really solve anything.

Mounting a share from a huge number of clients can easily overload the
MDSs, but Manila only lets us hand out CephFS quotas by rbytes or by the
number of shares. So if we could similarly limit the number of sessions
per cephx user (i.e. per share), then we could prevent these overloads.
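
To be concrete, I'm imagining something along these lines -- purely
hypothetical, since no such cap exists today and the syntax is only a
sketch:

    # hypothetical sketch: max_sessions is not a real cap today,
    # and the entity's other caps are omitted for brevity
    ceph auth caps client.pvc-691d1f23-da81-4a08-a6e7-d16f44e5f2a0 \
        mds 'allow rw path=/volumes/_nogroup/dbe3dbbf-e8d6-4f13-aac4-7a116d9a6772, max_sessions=100'

so that the MDS would refuse to open more than 100 concurrent sessions
for that entity.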

Cheers, Dan


>
> FWIW, I'm not a fan of solutions that end up with clients pooping
> themselves because they get back some esoteric error due to exceeding a
> limit when trying to mount or something.
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


