On Tue, 2021-04-06 at 12:32 +0200, Dan van der Ster wrote:
> On Mon, Apr 5, 2021 at 8:33 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Thu, 2021-04-01 at 11:04 +0200, Dan van der Ster wrote:
> > > Hi,
> > >
> > > Context: one of our users is mounting 350 ceph kernel PVCs per 30GB VM
> > > and they notice "memory pressure".
> > >
> >
> > Manifested how?
>
> Our users lost the monitoring, so we are going to try to reproduce to
> get more details.
> Do you know any way to see how much memory is used by the kernel
> clients? (Aside from the ceph_inode_info and ceph_dentry_info which I
> see in slabtop).

Nothing simple, I'm afraid, and even those don't tell you the full
picture. ceph_dentry_info is a separate allocation from the actual
dentry.

> I see that the osd_client keeps just one copy of the osdmap, so that's
> going to be only ~256kB * num_clients on this particular cluster.
> Do we also need to kmalloc something the size of the pg map? That
> would be ~4MB * num_clients here.
> Are there any other large data structures, even for idle mounts?
>

Almost certainly, but it's not trivial to measure them. You might start
by looking at net/ceph/osdmap.c in the kernel sources and consider
instrumenting it to report how large its allocations are. We simply
don't keep those sorts of detailed stats of allocations that the client
does.
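If it's useful as a starting point, here's one rough way to get at it
from userspace without instrumenting the kernel: just add up the ceph_*
slab caches from /proc/slabinfo. To be clear about the assumptions:
reading that file needs root, the cache-name filter is only a guess
based on what shows up in slabtop, and the per-client map figures at
the end are simply the numbers you quoted above, not anything the
client reports itself.

#!/usr/bin/env python3
# Sketch: approximate the memory held in ceph-related slab caches by
# summing num_objs * objsize for every /proc/slabinfo line whose cache
# name starts with "ceph_". kmalloc'd buffers (osdmaps, messages, etc.)
# do not appear here, so treat the result as a lower bound.

def ceph_slab_usage(path="/proc/slabinfo"):
    total = 0
    with open(path) as f:
        for line in f:
            if not line.startswith("ceph_"):
                continue
            fields = line.split()
            name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
            size = num_objs * objsize
            print(f"{name:<24} {num_objs:>8} objs  ~{size // 1024} KiB")
            total += size
    return total

if __name__ == "__main__":
    total = ceph_slab_usage()
    print(f"ceph slab caches total: ~{total / 1048576:.1f} MiB")

    # Back-of-envelope guess for the per-cluster-client maps, using the
    # figures quoted in this thread (~256 kB osdmap plus a ~4 MB
    # pg-map-sized buffer, one cluster client per cephx user/PVC).
    clients = 350
    print(f"map overhead guess: ~{clients * (256 + 4096) // 1024} MiB")

Bear in mind that slab cache merging can fold some of those caches in
with others of the same size, so not everything will necessarily show
up under its own ceph_* name (slabtop has the same limitation).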
> > > When planning for k8s hosts, what would be a reasonable limit on the
> > > number of ceph kernel PVCs to mount per host?
> > >
> >
> > This seems like a really difficult thing to gauge. It depends on a
> > number of different factors including amount of RAM and CPUs on the box,
> > mount options, workload and applications, etc...
> >
> > > If one kernel mounts the
> > > same cephfs several times (with different prefixes), we observed that
> > > this is a unique client session. But does the ceph module globally
> > > share a single copy of cluster metadata, e.g. osdmaps, or is that all
> > > duplicated per session?
> > >
> >
> > One copy per-cluster client, which should generally be shared between
> > mounts to the same cluster, provided that you're using similar-enough
> > mount options for the kernel to do that.
>
> As Sage suspected, we have a unique cephx user per PVC mounted.
> We're using the manila csi, which indeed invokes mgr/volumes to create
> the shares. They look like this, for reference:
>
>     "client_metadata": {
>         "features": "0x0000000000007bff",
>         "entity_id": "pvc-691d1f23-da81-4a08-a6e7-d16f44e5f2a0",
>         "hostname": "paas-standard-avz-b-6qvn6",
>         "kernel_version": "5.10.19-200.fc33.x86_64",
>         "root": "/volumes/_nogroup/dbe3dbbf-e8d6-4f13-aac4-7a116d9a6772"
>     }
>
> It's good to know that by using the same cephx users, we could
> optimize the clients on a given host.
>
> > > Also, k8s makes it trivial for a user to mount a single PVC from
> > > hundreds or thousands of clients. Suppose we wanted to be able to
> > > limit the number of clients per PVC -- Do you think a new
> > > `max_sessions=N` cephx cap would be the best approach for this?
> > >
> >
> > Why do you want to limit the number of clients per PVC? I'm not sure
> > that would really solve anything.
>
> Mounting from a huge number of clients can easily overload the MDSs.
> But Manila only lets us hand out CephFS quotas by rbytes or # shares.
> So if we could similarly limit the number of sessions per cephx user
> (i.e. per share), then we can prevent these overloads.
>

The problem there is that you'll end up with clients that just start
suddenly failing to mount because you hit your arbitrary capacity
limits, and it'll almost certainly be first-come/first-served. This is
a different matter than applying quotas because it potentially affects
you at mount time.

> >
> > FWIW, I'm not a fan of solutions that end up with clients pooping
> > themselves because they get back some esoteric error due to exceeding a
> > limit when trying to mount or something.
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>
>

--
Jeff Layton <jlayton@xxxxxxxxxx>