On Mon, Apr 5, 2021 at 8:33 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Thu, 2021-04-01 at 11:04 +0200, Dan van der Ster wrote:
> > Hi,
> >
> > Context: one of our users is mounting 350 ceph kernel PVCs per 30GB VM
> > and they notice "memory pressure".
> >
>
> Manifested how?

Our users lost their monitoring data, so we are going to try to
reproduce the issue to get more details.

Do you know any way to see how much memory is used by the kernel
clients? (Aside from the ceph_inode_info and ceph_dentry_info which I
see in slabtop.)

I see that the osd_client keeps just one copy of the osdmap, so that's
going to be only ~256kB * num_clients on this particular cluster. Do
we also need to kmalloc something the size of the pg map? That would
be ~4MB * num_clients here. Are there any other large data structures,
even for idle mounts? (I've put a rough back-of-envelope sketch at the
end of this mail.)

> > When planning for k8s hosts, what would be a reasonable limit on the
> > number of ceph kernel PVCs to mount per host?
> >
>
> This seems like a really difficult thing to gauge. It depends on a
> number of different factors including amount of RAM and CPUs on the box,
> mount options, workload and applications, etc...
>
> > If one kernel mounts the
> > same cephfs several times (with different prefixes), we observed that
> > this is a unique client session. But does the ceph module globally
> > share a single copy of cluster metadata, e.g. osdmaps, or is that all
> > duplicated per session?
> >
>
> One copy per-cluster client, which should generally be shared between
> mounts to the same cluster, provided that you're using similar-enough
> mount options for the kernel to do that.

As Sage suspected, we have a unique cephx user per PVC mounted. We're
using the Manila CSI driver, which indeed invokes mgr/volumes to create
the shares. They look like this, for reference:

    "client_metadata": {
        "features": "0x0000000000007bff",
        "entity_id": "pvc-691d1f23-da81-4a08-a6e7-d16f44e5f2a0",
        "hostname": "paas-standard-avz-b-6qvn6",
        "kernel_version": "5.10.19-200.fc33.x86_64",
        "root": "/volumes/_nogroup/dbe3dbbf-e8d6-4f13-aac4-7a116d9a6772"
    }

It's good to know that by using the same cephx user across mounts, we
could optimize the clients on a given host.

> > Also, k8s makes it trivial for a user to mount a single PVC from
> > hundreds or thousands of clients. Suppose we wanted to be able to
> > limit the number of clients per PVC -- Do you think a new
> > `max_sessions=N` cephx cap would be the best approach for this?
> >
>
> Why do you want to limit the number of clients per PVC? I'm not sure
> that would really solve anything.

Mounting from a huge number of clients can easily overload the MDSs,
but Manila only lets us hand out CephFS quotas by rbytes or number of
shares. If we could similarly limit the number of sessions per cephx
user (i.e. per share), then we could prevent these overloads.

Cheers, Dan

> FWIW, I'm not a fan of solutions that end up with clients pooping
> themselves because they get back some esoteric error due to exceeding a
> limit when trying to mount or something.
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
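
P.S. for anyone who wants to repeat the back-of-envelope math mentioned
above: here is a quick python sketch that totals the ceph* slab caches
from /proc/slabinfo and multiplies out the per-client osdmap / pg map
sizes. The 256 kB and 4 MB per-client figures are only my estimates for
this cluster (the pg map one is a guess, per my question), and the 350
client count is just our particular setup, so treat the numbers as
illustrative rather than authoritative.

#!/usr/bin/env python3
# Rough back-of-envelope sketch only -- not a proper accounting tool.
# The osdmap and pg map sizes below are my estimates for this cluster,
# not numbers taken from the kernel sources, and NUM_CLIENTS is just
# our 350-mounts-per-VM case.

NUM_CLIENTS = 350                  # kernel cephfs client instances per VM (our case)
OSDMAP_BYTES = 256 * 1024          # ~256 kB osdmap copy per client (estimate)
PGMAP_BYTES = 4 * 1024 * 1024      # ~4 MB if a pg-map-sized allocation also exists (guess)

def ceph_slab_bytes(path="/proc/slabinfo"):
    """Approximate memory held in ceph* slab caches as num_objs * objsize.

    Needs root, ignores per-slab overhead, and slab merging can hide
    some ceph caches inside the generic kmalloc caches.
    """
    total = 0
    with open(path) as f:
        for line in f:
            if line.startswith("ceph"):
                fields = line.split()
                num_objs, objsize = int(fields[2]), int(fields[3])
                total += num_objs * objsize
    return total

if __name__ == "__main__":
    mib = 2 ** 20
    print(f"ceph* slab caches:   {ceph_slab_bytes() / mib:7.1f} MiB")
    print(f"osdmap copies:       {NUM_CLIENTS * OSDMAP_BYTES / mib:7.1f} MiB")  # ~87.5 MiB
    print(f"pg-map-sized copies: {NUM_CLIENTS * PGMAP_BYTES / mib:7.1f} MiB")   # ~1400 MiB

If something pg-map-sized really is kept per client instance, that
alone would be on the order of 1.4 GB for 350 mounts on a 30 GB VM,
which would go some way toward explaining the pressure the users saw.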