Hi again Alexander,
Thanks for taking the time.
> Subvolumes exist to implement a notion of managed mountable
directories with a given maximum size, as required, e.g., by
Kubernetes RWX Persistent Volumes.
I highly doubt that this was the main reason, since (afaik) the
snapshots feature predates Kubernetes.
> However, if Ceph permitted snapshots at arbitrary points within
the volume, a malicious pod could have created a snapshot, deleted
everything (not for real, "thanks" to the snapshot), written new
files, and thus evaded the quota.
Then the admin can opt not to enable allow_new_snaps on that FS, or
provide a key without the "s" flag so the client would be unable to
create snapshots.
I am not proficient in C++, but it looks like even the MDS code has
special handling of snapshots in subvolumes, including
quota-restricted ones. Then again, this might be an oversight and a
conflict of features on the MDS side.
> And yes, the documentation you mention does need to be corrected.
Could you point me to the right place to report this documentation
issue? I would have designed my implementation completely differently
if this had not been implied.
Thanks again,
Gürkan
On 02/03/2025 02.10, Alexander Patrakov wrote:
Hello Gürkan,
Let me clarify and correct my answer.
I incorrectly assumed that you were using Kubernetes, because its CSI
driver is, by far, the main consumer of subvolumes. Still, let me
explain this use case, as the limitations you observe naturally follow
from it.
Subvolumes exist to implement a notion of managed mountable
directories with a given maximum size, as required, e.g., by
Kubernetes RWX Persistent Volumes. In Kubernetes, the CSI driver, when
it needs to, creates a subvolume, sets a quota on it, creates a dummy
subdirectory, and mounts it in the pod that needs the Persistent
Volume. As such, the pod has no access to the top directory of the
persistent volume and thus cannot increase the quota by changing the
xattr. However, if Ceph permitted snapshots at arbitrary points within
the subvolume, a malicious pod could create a snapshot, delete
everything (not for real, "thanks" to the snapshot), write new files,
and thus evade the quota. Thus, snapshots are allowed only at the top
directory of the subvolume, where the CSI driver, and not the pod, can
create them.
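In rough CLI terms, the lifecycle looks something like this (the
driver actually goes through the mgr API, and the names here are made
up, but the commands themselves exist):

    # Create a subvolume with a size limit (the quota):
    ceph fs subvolume create cephfs csivol --size 10737418240

    # Resolve the path that actually gets mounted into the pod:
    ceph fs subvolume getpath cephfs csivol

    # Snapshots happen only at the subvolume's top directory:
    ceph fs subvolume snapshot create cephfs csivol snap1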
Therefore, the answer to your original question is: if you want
client-managed snapshots, do not use subvolumes; they are the wrong
abstraction for you. Just create plain old directories outside of the
/volumes path and mount them on the client.
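For example (paths and names here are only for illustration, and the
client key needs the "s" flag for the last step):

    # A plain directory on a normally mounted CephFS:
    mkdir /mnt/cephfs/mydir

    # A quota can still be set on it via the xattr:
    setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/cephfs/mydir

    # Snapshots then work in any directory under it, via .snap:
    mkdir /mnt/cephfs/mydir/.snap/snap1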
And yes, the documentation you mention does need to be corrected.
On Sun, Mar 2, 2025 at 5:44 AM Gürkan G <ceph@xxxxxxxxx> wrote:
Hi,

> This is deliberate, as otherwise they would become a mechanism for
> quota evasion.

This... does not make much sense. If I give the setfattr command,
everything works fine. Plus, the documentation says the following:

> Arbitrary subtrees. Snapshots are created within any directory you
> choose, and cover all data in the file system under that directory.

Ref: https://docs.ceph.com/en/squid/dev/cephfs-snapshots/

> In any case, please also try asking in Kubernetes forums. On the
> Ceph side, unfortunately, everything works as intended.

I am also not using Kubernetes. This is a deployment over Debian
bookworm VMs. I never mentioned a pod; the client is another Debian VM.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx