Hi Dmitry
On 15/08/2023 03:14, Dmitry Melekhov wrote:
15.08.2023 04:17, Patrick Donnelly wrote:
On Mon, Aug 14, 2023 at 2:44 AM Dmitry Melekhov <dm@xxxxxxxxxx> wrote:
Hello!
There is note here https://docs.ceph.com/en/reef/dev/cephfs-snapshots/
about multiple filesystems and snapshots:
If each FS gets its own pool things probably work, but this isn’t tested
and may not be true.
Is it still untested?
There is no reason to expect it to not work. I think the documentation
is too cautious.
Thank you!
Maybe somebody can share how it works in production?
Did you make any progress with snapshots on multiple filesystems with
separate pools? I ask as I'm seeing some odd behaviour, and think this
may be the root cause. I've just found the same warning as you.
On one cluster, we have more than 20 filesystems+pools, with 160
subdirectories between them. The backup system uses 16 compute nodes
in parallel; each backup process is assigned a subdirectory, makes a
snapshot, syncs from that snapshot, deletes the snapshot, and repeats.
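To make that concrete, here's a rough sketch of what each backup
process does for its subdirectory (this is a minimal sketch; the
Python wrapper, paths, snapshot name and rsync flags are illustrative,
not our actual tooling):

    #!/usr/bin/env python3
    # Rough sketch of one backup worker's per-subdirectory cycle.
    # Names and flags are illustrative, not our real backup system.
    import os
    import subprocess

    def backup_subdir(src_subdir: str, backup_target: str) -> None:
        # CephFS snapshots are created/removed with mkdir/rmdir
        # under the magic .snap directory.
        snap_path = os.path.join(src_subdir, ".snap", "backup")
        os.mkdir(snap_path)
        try:
            # Sync from the point-in-time snapshot to the target.
            # If the snapshot content ever appeared empty, --delete
            # would remove everything at the target, which matches
            # the symptom below.
            subprocess.run(
                ["rsync", "-a", "--delete",
                 snap_path + "/", backup_target + "/"],
                check=True,
            )
        finally:
            os.rmdir(snap_path)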
What I'm seeing, randomly (a couple of times a week?), is that one of
the backup processes deletes its entire backup target directory, as if
the source didn't exist. My hunch [now] is that the snapshot it was
using had a snapid clash and was affected by a different backup
process rmdir'ing its own snapshot.
This happens on both of our multifs clusters: "ceph version 17.2.7" and
"MDS version: ceph version 19.2.0".
Our third cluster is a single filesystem+pool on 17.2.7; it utilises
the same backup system but does not exhibit this issue.
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
https://www.mrc-lmb.cam.ac.uk/scicomp/