Re: Determine client/inode/dnode source of massive explosion in CephFS metadata pool usage (Red Hat Nautilus CephFS)

I just read your message again; you only mention newly created files, not new clients. So my suggestion probably won't help in your case, but it might help others. :-)

Quoting Eugen Block <eblock@xxxxxx>:

Hi Paul,

I don't really have a good answer to your question, but maybe this approach can help track down the clients.

Each client session has an "uptime" metric stored in the MDS:

storage01:~ # ceph tell mds.cephfs.storage04.uxkclk session ls
...
        "id": 409348719,
...
        "uptime": 844831.115640342,
...
            "entity_id": "nova-mount",
            "hostname": "FQDN",
            "kernel_version": "5.4.0-125-generic",
            "root": "/openstack-cluster/nova-instances"
...

This client has the shortest uptime (9 days); it is a compute node that was integrated into OpenStack 9 days ago. I don't know your CephFS directory structure, but maybe this can help identify the client in your case?
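
For example, something along these lines should print all sessions sorted by uptime, so the most recently connected clients come first. This is only a sketch; it assumes jq is available and that your session ls output has the same JSON layout as above:

storage01:~ # ceph tell mds.cephfs.storage04.uxkclk session ls | \
                jq -r 'sort_by(.uptime) | .[] |
                       [(.uptime|floor), .client_metadata.entity_id,
                        .client_metadata.hostname, .client_metadata.root] | @tsv'

That prints one line per session with uptime (in seconds), entity_id, hostname and mount root, which should make it easier to spot clients that connected around the time the growth started.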

Regards,
Eugen


Quoting Paul Browne <pfb29@xxxxxxxxx>:

Hello Ceph users,

We've recently seen a massive uptick in the stored capacity of our CephFS metadata pool: roughly 150x the raw capacity that was previously used, within a very short timeframe of only 48 hours or so. The number of stored objects rose by ~1.5 million in that window (the attached PNG shows the increase).

What I'd really like to be able to do, but haven't yet figured out how, is to map these newly stored objects (over this limited time window) to inodes/dentries in the filesystem, and from there to the individual namespaces in use on the filesystem.
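
For illustration, the closest I've got to a brute-force approach is something like the sketch below; it's untested, it assumes our metadata pool is called cephfs_metadata and that directory objects are named <inode-in-hex>.<fragment>, and it would be painfully slow on a pool with millions of objects:

# list every object in the metadata pool, then stat each one for its mtime,
# so objects created/updated inside the 48-hour window can be picked out
# ("cephfs_metadata" is a placeholder for our actual metadata pool name)
rados -p cephfs_metadata ls > /tmp/metadata_objects
while read -r obj; do
    rados -p cephfs_metadata stat "$obj"    # prints the object's mtime and size
done < /tmp/metadata_objects > /tmp/metadata_object_mtimes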

This should then allow me to track the increased usage back to specific projects using the filesystem for research data storage, and give them a gentle warning about possibly exhausting the available metadata pool capacity.
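
Continuing that sketch, once an interesting object turns up I'd hope to map its hex inode prefix back to a path, either by converting it to decimal and searching a client mount for that inode number, or by listing the object's omap keys to see its dentries. The object naming, the /mnt/cephfs mount point, and whether any of this behaves the same on RHCS 4 are all assumptions on my part:

# e.g. for a directory object named 10000000000.00000000 (hex inode + fragment):
printf '%d\n' 0x10000000000                   # -> 1099511627776 (decimal inode number)
find /mnt/cephfs -xdev -inum 1099511627776    # locate that directory via a client mount

# the object's omap keys should be the dentry names it holds:
rados -p cephfs_metadata listomapkeys 10000000000.00000000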

Would anyone know whether there's any capability in CephFS to do something like this, specifically in Nautilus (run here as Red Hat Ceph Storage 4)?

We've scheduled upgrades to later RHCS releases, but I'd like the cluster and CephFS state to be in a better place first if possible.

Thanks,
Paul Browne


*******************
Paul Browne
Research Computing Platforms
University Information Services
Roger Needham Building
JJ Thompson Avenue
University of Cambridge
Cambridge
United Kingdom
E-Mail: pfb29@xxxxxxxxx<mailto:pfb29@xxxxxxxxx>
Tel: 0044-1223-746548
*******************


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


