cephfs automatic data pool cleanup

Hi *,

over the last few weeks we noticed some strange behavior of our CephFS data pool (not the metadata pool). Since things eventually worked themselves out, I'm mainly asking here so that I can better understand what to look out for in the future.

This is on a three-node Ceph Luminous (12.2.1) cluster with one active MDS and one standby MDS. We have a range of machines mounting that single CephFS via kernel mounts, using different versions of Linux kernels (all at least 4.4, with vendor backports).

We observed an ever-increasing number of objects and an ever-growing space allocation on the (HDD-based, replicated) CephFS data pool, although the actual file system usage didn't grow over that period and in fact decreased significantly. The pool allocation went above all warn and crit levels, forcing us to add new OSDs (our first three BlueStore OSDs - all others are FileStore-based) to relieve the pressure, if only for some time.
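
The kind of comparison we kept making boils down to something like the following rough sketch (not our exact tooling; the pool name "cephfs_data" and mount point "/mnt/cephfs" are placeholders for our setup, and the "ceph df" JSON field names are as we saw them on Luminous and may differ on other releases):

    #!/usr/bin/env python
    # Rough sketch: compare the CephFS data pool usage reported by "ceph df"
    # with what the file system itself reports on a kernel mount.
    # Assumptions: pool name "cephfs_data" and mount point "/mnt/cephfs" are
    # placeholders; JSON field names as seen on Luminous, may differ elsewhere.
    import json, os, subprocess

    POOL = "cephfs_data"    # placeholder: our CephFS data pool
    MOUNT = "/mnt/cephfs"   # placeholder: kernel mount of the CephFS

    df = json.loads(subprocess.check_output(
        ["ceph", "df", "--format", "json"]).decode())
    pool = next(p for p in df["pools"] if p["name"] == POOL)
    used = pool["stats"]["bytes_used"]
    objects = pool["stats"]["objects"]

    st = os.statvfs(MOUNT)
    fs_used = (st.f_blocks - st.f_bfree) * st.f_frsize

    print("data pool : %d objects, %.1f GiB used" % (objects, used / 2.0 ** 30))
    print("cephfs    : %.1f GiB used per statvfs" % (fs_used / 2.0 ** 30))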

Part of the growth seems to be related to a large nightly compile job that used CephFS via a (kernel-based) NFS server re-exporting the kernel-mounted CephFS to many nodes: once we stopped that job, pool allocation growth slowed significantly (but didn't stop).

Further diagnosis hinted that the data pool contained many orphan objects, that is, objects belonging to inodes we could not locate in the live CephFS.
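
Our orphan check was roughly along the lines of the sketch below (not the exact script we used; pool name and mount point are again placeholders, listing a large pool this way is slow and I/O heavy, and objects belonging to snapshots or to unlinked-but-still-open files will show up as false positives):

    #!/usr/bin/env python
    # Rough sketch of the orphan check: CephFS data objects are named
    # "<inode-hex>.<object-index>", so compare the inode prefixes present in
    # the data pool with the inodes visible in the live (mounted) file system.
    # Assumptions: pool "cephfs_data", kernel mount at "/mnt/cephfs".
    import os, subprocess

    POOL = "cephfs_data"    # placeholder
    MOUNT = "/mnt/cephfs"   # placeholder

    # Inode prefixes of all objects currently in the data pool.
    pool_inodes = set()
    out = subprocess.check_output(["rados", "-p", POOL, "ls"]).decode()
    for name in out.splitlines():
        pool_inodes.add(name.split(".")[0])

    # Inodes reachable in the mounted file system, in the same hex notation.
    live_inodes = set()
    for root, dirs, files in os.walk(MOUNT):
        for entry in dirs + files:
            try:
                live_inodes.add("%x" % os.lstat(os.path.join(root, entry)).st_ino)
            except OSError:
                pass

    orphans = pool_inodes - live_inodes
    print("%d inode prefixes in the pool without a matching live inode" % len(orphans))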

Throughout all of this, we did not notice any significant growth of the (SSD-based) metadata pool, nor any obvious errors in the logs (cluster log, MDSs, OSDs). Except for the fill levels, the cluster was healthy. Restarting the MDSs did not help.
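
In case it helps anyone hitting the same thing: the stray/purge counters of the active MDS can be checked with something like the sketch below (the daemon name "mds.a" is a placeholder for the local daemon, and the exact counter names under "mds_cache" are as we understand them on Luminous and may differ on other versions):

    #!/usr/bin/env python
    # Rough sketch: dump the stray-related perf counters of an MDS via its
    # admin socket. "mds.a" is a placeholder for the local daemon name; this
    # has to run on the MDS host. Counter names (num_strays, strays_created,
    # ...) as seen on Luminous, possibly different on other releases.
    import json, subprocess

    MDS_NAME = "mds.a"   # placeholder

    perf = json.loads(subprocess.check_output(
        ["ceph", "daemon", MDS_NAME, "perf", "dump"]).decode())
    for key, value in sorted(perf.get("mds_cache", {}).items()):
        if "stray" in key:
            print("%-24s %s" % (key, value))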

Then one of the nodes crashed due to lack of memory (the MDS was using > 12 GB, plus the new BlueStore OSD and probably the 12.2.1 BlueStore memory leak).

We brought the node back online; at first the MDS reported an inconsistent file system, though no other errors were logged. Once we restarted the other MDS (by then the active MDS, running on another node), that problem went away too, and we were back online. We did not restart any clients, neither CephFS mounts nor RBD clients.

The following day we noticed an ongoing, significant decrease in the number of objects in the CephFS data pool. As we couldn't spot any actual problems with the content of the CephFS (which was rather stable at the time), we sat back and watched. After some hours the pool stabilized at a total size somewhat closer to the actual CephFS content than before (FS size around 630 GB per "df" output, data pool now about 1100 GB, down from a peak of around 1.3 TB before the mass deletion).

What might it have been that we were watching - some form of garbage collection triggered by the node outage? Is this something we could have triggered manually earlier, to avoid the free-space problems we faced? Or is it something that should have happened auto-magically and much more often, but for some reason didn't occur in our environment?

Thank you for any ideas and/or pointers you may share.

Regards,
J

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


