Adding to this: I remember being surprised that with an mv on cephfs between directories linked to different pools, only some meta(?)data was moved/changed, while some of the data stayed in the old pool. I am not sure whether this is still the case in newer ceph versions, but I would rather see the data being moved completely. That is what everyone expects, even if the move then takes more time because it crosses pools.
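If someone wants to check this on a current cluster: as far as I understand, a file's data pool is fixed by its layout when the file is created, so after a plain mv the file layout xattr should still point at the old pool. Something like the following should show it (the directories below are just placeholders for two directories with different layouts set):

# getfattr -n ceph.dir.layout.pool /mnt/cephfs/dir_on_pool_a
# getfattr -n ceph.dir.layout.pool /mnt/cephfs/dir_on_pool_b
# getfattr -n ceph.file.layout.pool /mnt/cephfs/dir_on_pool_a/somefile
# mv /mnt/cephfs/dir_on_pool_a/somefile /mnt/cephfs/dir_on_pool_b/
# getfattr -n ceph.file.layout.pool /mnt/cephfs/dir_on_pool_b/somefile

If the last command still reports the old data pool, the objects were not migrated by the mv and only the directory entry moved.

> -----Original Message-----
> From: Frank Schilder <frans@xxxxxx>
> Sent: Thursday, 24 June 2021 17:34
> To: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: ceph fs mv does copy, not move
>
> Dear Patrick,
>
> thanks for letting me know.
>
> Could you please consider making this a ceph client mount option, for
> example '-o fast_move', that enables a code path which enforces an mv
> to be a proper atomic mv, with the risk that in some corner cases the
> target quota is overrun? With this option enabled, a move should either
> be a move or fail outright with "out of disk quota" (no partial move,
> no cp+rm at all). The failure should only occur if it is absolutely
> obvious that the target quota will be exceeded. Any corner cases are
> the responsibility of the operator. Application crashes due to
> incorrect error handling are acceptable.
>
> Reasoning:
>
> From a user's/operator's side, the preferred behaviour is that in cases
> where a definite quota overrun can securely be detected in advance, the
> move should actually fail with "out of disk quota" instead of resorting
> to cp+rm, which can lead to partial moves and a total mess for
> users/operators to clean up. In any other case, the quota should simply
> be ignored and the move should be a complete atomic move, with the risk
> of exceeding the target quota and of IO stalling. A temporary stall or
> failure of IO until the operator increases the quota again is, in my
> opinion and use case, highly preferable over the alternative of cp+rm.
> A quota or a crashed job is fast to fix, a partial move is not.
>
> Some background:
>
> We use ceph fs as an HPC home file system and as a back-end store.
> Being able to move data quickly across the entire file system is
> essential, because users re-factor their directory structures
> containing huge amounts of data quite often for various reasons.
>
> On our system, we set file system quotas mainly for psychological
> reasons. We run a cron job that adjusts the quotas every day to show
> between 20% and 30% free capacity on the mount points. The
> psychological side here is to give users an incentive to clean up
> temporary data. It is not intended to limit usage seriously, only to
> limit what can be done in between cron job runs as a safeguard. The
> pool quotas set the real hard limits.
>
> I'm in the process of migrating 100+ TB right now and am really happy
> that I still have a client where I can do an O(1) move. It would be a
> disaster if I now had to use rsync or similar, which would take weeks.
>
> Please, in such situations where developers seem to have to make a
> definite choice, consider offering operators the possibility to choose
> the alternative that suits their use case best. Adding further options
> seems far better than limiting functionality in a way that becomes a
> terrible burden in certain, if not many, use cases.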
> In ceph fs there have been many such decisions that allow for different
> answers from a user/operator perspective. For example, I would prefer
> to be able to get rid of the attempted higher POSIX compliance level of
> ceph fs compared with Lustre, just disable all the client caps and
> cache-coherence management, and turn it into an awesome scale-out
> parallel file system. The attempt at POSIX-compliant handling of
> simultaneous writes to files offers nothing to us, but comes at a huge
> performance cost and forces users to move away from perfectly
> reasonable HPC work flows. Also, that it takes a TTL to expire before
> changes on one client become visible on another (unless direct_io is
> used for all IO) is perfectly acceptable for us, given the potential
> performance gain due to simpler client-MDS communication.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Sent: 24 June 2021 05:29:45
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: ceph fs mv does copy, not move
>
> Hello Frank,
>
> On Tue, Jun 22, 2021 at 2:16 AM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Dear all,
> >
> > some time ago I reported that the kernel client resorts to a copy
> > instead of a move when moving a file across quota domains. I was told
> > that the fuse client does not have this problem. If enough space is
> > available, a move should be a move, not a copy.
> >
> > Today, I tried to move a large file across quota domains, testing
> > both the kernel and the fuse client. Both still resort to a copy,
> > even though this issue was addressed quite a while ago
> > (https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/44AEIHNEGKV4VGCARRTARGFZ264CR4T7/#XY7ZCE3KWHI4QSUNZHDWL3QZQFOHXRQW).
> > The versions I'm using are (CentOS 7):
> >
> > # yum list installed | grep ceph-fuse
> > ceph-fuse.x86_64    2:13.2.10-0.el7    @ceph
> >
> > # uname -r
> > 3.10.0-1160.31.1.el7.x86_64
> >
> > Any suggestions on how to get this to work? I have to move
> > directories containing 100+ TB.
>
> ceph-fuse reverted this behavior in:
> https://tracker.ceph.com/issues/48203
> The kernel had a patch around that time too.
>
> In summary, it was not possible to accurately account for the quota
> usage prior to doing the rename. Rather than allow a quota to
> potentially be massively overrun, we fell back to the old behavior of
> not allowing it.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
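PS: a crude way to check whether a given client still does a real rename across quota domains, rather than silently falling back to copy+delete (the paths below are only placeholders for two directories with different quotas set):

# stat -c 'inode=%i size=%s' /mnt/cephfs/quota_a/bigfile
# time mv /mnt/cephfs/quota_a/bigfile /mnt/cephfs/quota_b/
# stat -c 'inode=%i size=%s' /mnt/cephfs/quota_b/bigfile

A real rename returns almost immediately and keeps the inode number; the copy+delete fall-back takes time proportional to the file size and leaves the file with a new inode.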