Re: CephFS "move" operation

Am 25.05.2018 um 15:39 schrieb Sage Weil:
> On Fri, 25 May 2018, Oliver Freyermuth wrote:
>> Dear Ric,
>>
>> I played around a bit - the common denominator seems to be: when moving
>> within a directory subtree below a directory for which max_bytes /
>> max_files quota settings are set, things work fine. When moving to another
>> directory tree without quota settings / with different quota settings,
>> rename() returns EXDEV.
> 
> Aha, yes, this is the issue.
> 
> When you set a quota you force subvolume-like behavior.  This is done 
> because hard links across this quota boundary won't correctly account for 
> utilization (only one of the file links will accrue usage).  The 
> expectation is that quotas are usually set in locations that aren't 
> frequently renamed across.

Understood, that explains it. That's indeed also true for our application in most cases -
but sometimes users want to migrate their data to group storage, or vice versa.
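
For such migrations, mv(1) then silently degrades from a metadata-only rename in the MDS to a full data copy through the client. A minimal sketch of that fallback logic, with hypothetical paths and the copy helper elided (this is not the actual coreutils code):

#include <errno.h>
#include <stdio.h>

/* Try the cheap rename(2) first; only fall back to copy+unlink when the
 * filesystem reports EXDEV (here: a CephFS quota/subvolume boundary). */
static int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;                       /* metadata-only rename succeeded */

    if (errno != EXDEV) {               /* some other error: give up */
        perror("rename");
        return -1;
    }

    /* EXDEV: crossing a quota boundary, so the data has to be copied
     * byte by byte through the client, then the source unlinked. */
    fprintf(stderr, "rename: EXDEV, falling back to copy+unlink\n");
    /* copy_file(src, dst); unlink(src);  -- hypothetical helper, elided */
    return 1;
}

int main(void)
{
    /* hypothetical paths crossing a quota boundary on the same CephFS */
    return move_file("/cephfs/user/alice/data", "/cephfs/group/physics/data");
}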

> 
> It might be possible to allow rename(2) to proceed in cases where 
> nlink==1, but the behavior will probably seem inconsistent (some files get 
> EXDEV, some don't).

I believe even this would be extremely helpful, performance-wise. At least in our case, hard links are seldom used;
it's more about data movement between user, group and scratch areas.
For files with nlink > 1, it's more or less expected that a copy has to be performed when crossing quota boundaries (I think).
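
If I understand the proposal correctly, the check could conceptually look like the sketch below - purely illustrative pseudo-C on a stand-in struct, not actual Ceph MDS code:

#include <stdbool.h>

/* Stand-in for the inode metadata the MDS would consult. */
struct inode_info {
    unsigned int nlink;   /* number of hard links to the inode */
};

/* Relax the EXDEV behavior only for single-link inodes, where quota
 * accounting stays correct after a cross-boundary rename. Renames of
 * inodes with nlink > 1 would keep returning EXDEV, since usage would
 * be accrued under only one of the links' quota roots. */
static bool allow_cross_quota_rename(const struct inode_info *in)
{
    return in->nlink == 1;
}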

Cheers,
	Oliver

> 
> sage
> 
> 
> 
>>
>> Cheers, Oliver
>>
>>
>> Am 25.05.2018 um 15:18 schrieb Ric Wheeler:
>>> That seems to be the issue - we need to understand why rename sees them as different.
>>>
>>> Ric
>>>
>>>
>>> On Fri, May 25, 2018, 9:15 AM Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>     Mhhhm... that's funny, I checked an mv with an strace now. I get:
>>>     ---------------------------------------------------------------------------------
>>>     access("/cephfs/some_folder/file", W_OK) = 0
>>>     rename("foo", "/cephfs/some_folder/file") = -1 EXDEV (Invalid cross-device link)
>>>     unlink("/cephfs/some_folder/file") = 0
>>>     lgetxattr("foo", "security.selinux", "system_u:object_r:fusefs_t:s0", 255) = 30
>>>     ---------------------------------------------------------------------------------
>>>     But I can assure you it's only a single filesystem, and a single ceph-fuse client running.
>>>
>>>     Same happens when using absolute paths.
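>>>
>>>     One way to double-check that both paths live on the same filesystem is to
>>>     compare st_dev from stat(2). A quick illustrative check (hypothetical
>>>     paths, error handling elided):
>>>
>>>     #include <stdio.h>
>>>     #include <sys/stat.h>
>>>
>>>     int main(void)
>>>     {
>>>         struct stat a, b;
>>>         stat("foo", &a);                    /* source, relative path  */
>>>         stat("/cephfs/some_folder", &b);    /* destination directory  */
>>>         /* rename(2) can still return EXDEV even when these match,
>>>          * because the quota boundary is enforced inside CephFS itself. */
>>>         printf("same st_dev: %s\n", a.st_dev == b.st_dev ? "yes" : "no");
>>>         return 0;
>>>     }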
>>>
>>>     Cheers,
>>>             Oliver
>>>
>>>     Am 25.05.2018 um 15:06 schrieb Ric Wheeler:
>>>     > We should look at what mv uses to see if it thinks the directories are on different file systems.
>>>     >
>>>     > If the fstat or whatever it looks at is confused, that might explain it.
>>>     >
>>>     > Ric
>>>     >
>>>     >
>>>     > On Fri, May 25, 2018, 9:04 AM Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>     >
>>>     >     Am 25.05.2018 um 14:57 schrieb Ric Wheeler:
>>>     >     > Is this move between directories on the same file system?
>>>     >
>>>     >     It is, we only have a single CephFS in use. There's also only a single ceph-fuse client running.
>>>     >
>>>     >     What's different, though, are the ACLs set on source and target directory, and the owner / group,
>>>     >     but I hope that should not matter.
>>>     >
>>>     >     All the best,
>>>     >     Oliver
>>>     >
>>>     >     > Rename as a system call only works within a file system.
>>>     >     >
>>>     >     > The user-space mv command falls back to a copy when source and destination are not on the same file system.
>>>     >     >
>>>     >     > Regards,
>>>     >     >
>>>     >     > Ric
>>>     >     >
>>>     >     >
>>>     >     > On Fri, May 25, 2018, 8:51 AM John Spray <jspray@xxxxxxxxxx> wrote:
>>>     >     >
>>>     >     >     On Fri, May 25, 2018 at 1:10 PM, Oliver Freyermuth
>>>     >     >     <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>>     >     >     > Dear Cephalopodians,
>>>     >     >     >
>>>     >     >     > I was wondering why a simple "mv" takes extraordinarily long on CephFS, and I have to note that,
>>>     >     >     > at least with the fuse client (12.2.5) and when moving a file from one directory to another,
>>>     >     >     > the file appears to be copied first (byte by byte, with traffic going through the client?) before the original file is deleted.
>>>     >     >     >
>>>     >     >     > Is this true, or am I missing something?
>>>     >     >
>>>     >     >     A mv should not involve copying a file through the client -- it's
>>>     >     >     implemented in the MDS as a rename from one location to another.
>>>     >     >     What's the observation that's making it seem like the data is going
>>>     >     >     through the client?
>>>     >     >
>>>     >     >     John
>>>     >     >
>>>     >     >     >
>>>     >     >     > For large files, this might be rather time-consuming,
>>>     >     >     > and we should certainly advise all our users not to move files around needlessly if this is the case.
>>>     >     >     >
>>>     >     >     > Cheers,
>>>     >     >     >         Oliver
>>>     >     >     >
>>>     >     >     >
>>>     >     >
>>>     >
>>>
>>>
>>>
>>
>>



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
