Hi Patrick,
On 30.11.23 03:58, Patrick Donnelly wrote:
> I've not yet fully reviewed the logs but it seems there is a bug in
> the detection logic which causes a spurious abort. This does not
> appear to be actually new damage.
We are accessing the metadata (read-only) daily. The issue only popped
up after updating to 17.2.7. Of course, this does not mean that there
was no damage there before, only that it was not detected.
> Are you using postgres?
Not on top of CephFS, no. We do use postgres on some RBD volumes.
> If you can share details about your snapshot
> workflow and general workloads that would be helpful (privately if
> desired).
Our CephFS root looks like this:
/archive
/homes
/no-snapshot
/other-snapshot
/scratch
We are running snapshots on /homes and /other-snapshot with the same
schedule. We mount the filesystem with a kernel client on one of the
Ceph hosts (not running the MDS) and mkdir / rmdir in the .snap
directories as needed (see the sketch after the list below):
- daily between 06:00 and 19:45 UTC (inclusive): create a snapshot every
15 minutes; one hour later, delete it unless it is an hourly (xx:00) one
- daily on the full hour: create a snapshot and delete the snapshot from
24 hours earlier unless it is the midnight one
- daily at midnight: delete the snapshot from 14 days ago unless it is a
Sunday
- every Sunday at midnight: delete the snapshot from 8 weeks ago
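Roughly, the rotation logic looks like the following Python sketch (the
mount point, timestamp-based snapshot names and the cron driver are
simplifications for illustration, not our exact script; CephFS exposes
snapshot creation/removal as mkdir/rmdir inside the hidden .snap
directory):

#!/usr/bin/env python3
# Sketch of the mkdir/rmdir snapshot rotation described above, intended
# to be run from cron every 15 minutes. Paths and names are assumptions.
from datetime import datetime, timedelta, timezone
from pathlib import Path

MOUNT = Path("/mnt/cephfs")               # assumed kernel-client mount point
SNAP_TREES = ["homes", "other-snapshot"]  # both trees use the same schedule


def snap_name(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d_%H%M")


def rmdir_if_exists(path: Path) -> None:
    if path.is_dir():
        path.rmdir()


def rotate(now: datetime) -> None:
    for tree in SNAP_TREES:
        snap_root = MOUNT / tree / ".snap"

        # Create a snapshot every 15 minutes between 06:00 and 19:45 UTC,
        # plus one on every full hour.
        if 6 <= now.hour <= 19 or now.minute == 0:
            (snap_root / snap_name(now)).mkdir()

        # One hour later, drop the quarter-hourly snapshot unless it is an
        # hourly (xx:00) one.
        prev = now - timedelta(hours=1)
        if prev.minute != 0:
            rmdir_if_exists(snap_root / snap_name(prev))

        # On the full hour, drop the snapshot from 24 hours ago unless it
        # is the midnight one.
        if now.minute == 0 and now.hour != 0:
            rmdir_if_exists(snap_root / snap_name(now - timedelta(hours=24)))

        # At midnight, drop the daily snapshot from 14 days ago unless it
        # was taken on a Sunday; every Sunday also drop the 8-week-old one.
        if now.hour == 0 and now.minute == 0:
            two_weeks = now - timedelta(days=14)
            if two_weeks.weekday() != 6:  # 6 == Sunday
                rmdir_if_exists(snap_root / snap_name(two_weeks))
            if now.weekday() == 6:
                rmdir_if_exists(snap_root / snap_name(now - timedelta(weeks=8)))


if __name__ == "__main__":
    rotate(datetime.now(timezone.utc).replace(second=0, microsecond=0))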
Workload is two main Samba servers (one of them only shares a
subdirectory which is generally not accessed through the other). Client
access to those servers is limited to 1 GBit/s each. Until Tuesday, we
also had a mail server with Dovecot running on top of CephFS. It was
migrated to an RBD volume on Tuesday because we had issues with hanging
access to some files / directories (interestingly only in the main tree;
access via the snapshots was fine). Additionally, we have a Nextcloud
instance with ~200 active users storing data in CephFS, as well as some
other kernel clients with little / sporadic traffic: some running Samba,
some NFS, some interactive SSH / x2go servers with direct user access,
and some specialised web applications (notably OMERO).
We run daily incremental backups of most of the CephFS content with
Bareos running on a dedicated server which has the whole CephFS tree
mounted read-only. For most data, a full backup is performed every two
months; for some data, only every six months. The affected area is
contained in this "every six months" full-backup portion of the file
system tree.
Two weeks ago we deleted a folder structure of about 6 TB with an
average file size in the range of 1 GB. The structure was under
/other-snapshot as well. This led to severe load on the MDS, especially
starting at midnight. In conjunction with the Ubuntu kernel mount, we
also had issues with unreleased capabilities preventing read access to
the /other-snapshot part.
To combat these lingering problems, we deleted all snapshots in
/other-snapshot, which led to half a dozen PGs stuck in the snaptrim
state (and a few hundred in snaptrim_wait). Updating from 17.2.6 to
17.2.7 solved that issue quickly; the affected PGs became unstuck and
the whole cluster was active+clean a few hours later.
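We watched the snaptrim backlog simply by counting PGs per state;
something along the lines of this quick sketch (it assumes the ceph CLI
is available and that `ceph pg ls -f json` reports a per-PG "state"
field; the exact JSON layout differs slightly between releases, so both
a plain list and a {"pg_stats": [...]} wrapper are handled):

#!/usr/bin/env python3
# Count placement groups by state, e.g. to watch snaptrim/snaptrim_wait.
import json
import subprocess
from collections import Counter


def pg_state_counts() -> Counter:
    out = subprocess.run(
        ["ceph", "pg", "ls", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    doc = json.loads(out)
    # Newer releases wrap the list in {"pg_stats": [...]}.
    pgs = doc.get("pg_stats", doc) if isinstance(doc, dict) else doc
    return Counter(pg["state"] for pg in pgs)


if __name__ == "__main__":
    for state, count in sorted(pg_state_counts().items()):
        print(f"{count:6d}  {state}")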
For now, I'll hold off on running first-damage.py to try to remove the
affected files / inodes. Ultimately, however, this seems to be the most
sensible solution to me, at least with regard to cluster downtime.
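As far as I understand, first-damage.py scans the dentry omap entries in
the metadata pool. Purely for illustration, this is the kind of data it
looks at; pool and object names here are placeholders, not values from
our cluster, and the actual check/removal will of course be done with
first-damage.py itself:

#!/usr/bin/env python3
# Illustrative only: list the dentry omap keys of one directory object
# in the CephFS metadata pool via librados.
import rados

POOL = "cephfs_metadata"             # assumed metadata pool name
DIR_OBJECT = "10000000000.00000000"  # hypothetical <dir inode>.<frag> object

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        with rados.ReadOpCtx() as read_op:
            # Fetch up to 1000 dentry keys (e.g. "filename_head") and
            # their encoded values for this directory fragment.
            it, _ = ioctx.get_omap_vals(read_op, "", "", 1000)
            ioctx.operate_read_op(read_op, DIR_OBJECT)
            for key, value in it:
                print(key, len(value), "bytes")
    finally:
        ioctx.close()
finally:
    cluster.shutdown()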
> Please give me another day to review then feel free to use
> first-damage.py to cleanup. If you see new damage please upload the
> logs.
We are in no hurry and will probably run first-damage.py sometime next
week. I will report new damage if it comes in.
Cheers
Sebastian
--
Dr. Sebastian Knust | Bielefeld University
IT Administrator | Faculty of Physics
Office: D2-110 | Universitätsstr. 25
Phone: +49 521 106 5234 | 33615 Bielefeld
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx