Hi *,
over the last few weeks we noticed some strange behavior of our CephFS
data pool (not the metadata pool). Since things have worked themselves
out over time, I'm mainly asking here so that I can better understand
what to look out for in the future.
This is on a three-node Ceph Luminous (12.2.1) cluster with one active
MDS and one standby MDS. A range of machines mount that single CephFS
via kernel mounts, running various Linux kernel versions (all at least
4.4, with vendor backports).
We observed an ever-increasing number of objects and an ever-growing
space allocation on the (HDD-based, replicated) CephFS data pool, even
though actual file system usage didn't grow over that period and in
fact decreased significantly. The pool allocation went above all warn
and crit levels, forcing us to add new OSDs (our first three BlueStore
OSDs; all others are FileStore-based) to relieve the pressure, if only
for a while.
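For anyone wanting to watch the same thing, a minimal sketch for
sampling the pool usage over time with the python-rados bindings could
look like the following (the pool name 'cephfs_data' and the conffile
path are assumptions, adjust to your environment):

#!/usr/bin/env python
# Minimal sketch: sample object count and bytes of the CephFS data pool
# so its growth can be graphed over time. Pool name and conffile path
# are assumptions, adjust to your environment.
import time
import rados

POOL = 'cephfs_data'  # assumed name of the CephFS data pool

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
try:
    stats = ioctx.get_stats()  # per-pool statistics
    print('%s %s objects=%d bytes=%d' % (
        time.strftime('%Y-%m-%d %H:%M:%S'), POOL,
        stats['num_objects'], stats['num_bytes']))
finally:
    ioctx.close()
    cluster.shutdown()

Run from cron, something like this gives a simple time series to put
next to the "df" numbers of the file system.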
Part of the growth seems to be related to a large nightly compile job
that used CephFS through a kernel NFS server re-exporting the
kernel-mounted CephFS to many nodes: once we stopped that job, pool
allocation growth slowed significantly (but didn't stop).
Further diagnosis hinted that the data pool contained many orphan
objects, i.e. objects belonging to inodes we could not locate in the
live CephFS.
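The basic idea of that cross-check is to compare the inode prefixes of
the data pool objects (CephFS names its data objects
"<inode-hex>.<block-hex>") with the inodes visible in the mounted file
system. A sketch of that, not our exact procedure, is below; it assumes
the data pool is called 'cephfs_data' and the FS is kernel-mounted at
/mnt/cephfs, and since the object listing and the FS walk are not
atomic, files created or deleted during the run show up as false
positives:

#!/usr/bin/env python
# Sketch of an orphan check: collect the inode prefixes of all objects
# in the data pool and compare them against the inodes visible in the
# mounted file system. Pool name and mount point are assumptions; the
# listing is not atomic, so treat the result as a hint, not proof.
import os
import rados

POOL = 'cephfs_data'   # assumed data pool name
MOUNT = '/mnt/cephfs'  # assumed kernel mount point

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

# Inode numbers that back at least one object in the data pool.
pool_inodes = set()
for obj in ioctx.list_objects():
    pool_inodes.add(obj.key.split('.')[0])

ioctx.close()
cluster.shutdown()

# Inode numbers reachable through the live file system, in the same
# lower-case hex notation used in the object names.
fs_inodes = set()
for root, dirs, files in os.walk(MOUNT):
    for name in files:
        try:
            fs_inodes.add(format(
                os.lstat(os.path.join(root, name)).st_ino, 'x'))
        except OSError:
            pass

orphans = pool_inodes - fs_inodes
print('%d inodes in pool, %d in FS, %d candidate orphans'
      % (len(pool_inodes), len(fs_inodes), len(orphans)))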
Throughout all of this, we did not notice any significant growth of
the (SSD-based) metadata pool, nor any obvious errors in the logs
(MONs, MDSs, OSDs). Except for the fill levels, the cluster was
healthy. Restarting the MDSs did not help.
Then one of the nodes crashed due to lack of memory (the MDS was using
more than 12 GB, on top of the new BlueStore OSD and probably the
12.2.1 BlueStore memory leak).
We brought the node back online, and at first the MDS reported an
inconsistent file system, though no other errors were reported. Once
we restarted the other MDS (by then the active MDS, on another node),
that problem went away too and we were back online. We did not restart
any clients, neither CephFS mounts nor RBD clients.
The following day we noticed an ongoing, significant decrease in the
number of objects in the CephFS data pool. As we couldn't spot any
actual problems with the content of the CephFS (which was rather
stable at the time), we sat back and watched. After some hours the
pool stabilized at a total size somewhat closer to the actual CephFS
content than before the mass deletion (FS size around 630 GB per "df"
output, current data pool size about 1100 GB, peak size around 1.3 TB
before the mass deletion).
What might we have been watching there: some form of garbage
collection triggered by the node outage? Is this something we could
have triggered manually earlier, to avoid the free-space problems we
faced? Or is this something unexpected that should have happened
automatically and much more often, but for some reason didn't occur in
our environment?
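In case the answer involves the MDS stray/purge machinery: next time
we would watch those counters with something along the lines of the
sketch below (it assumes access to the admin socket on the node
running the active MDS; "a" is a placeholder for the MDS name, and the
exact counter names may differ between releases):

#!/usr/bin/env python
# Sketch: read stray/purge-related counters from the local MDS admin
# socket. Run on the node hosting the active MDS; the MDS name is a
# placeholder and counter names may vary by release.
import json
import subprocess

MDS_NAME = 'a'  # placeholder; use the local MDS daemon name

out = subprocess.check_output(
    ['ceph', 'daemon', 'mds.%s' % MDS_NAME, 'perf', 'dump'])
perf = json.loads(out)

for section in ('mds_cache', 'purge_queue'):
    for key, value in sorted(perf.get(section, {}).items()):
        if 'stray' in key or key.startswith('pq_'):
            print('%s.%s = %s' % (section, key, value))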
Thank you for any ideas and/or pointers you may share.
Regards,
J