Re: cephfs - modifying the ceph.file.layout of existing files

Thanks a lot, I will give it a try; I plan to use it in a very controlled environment anyway.

Best regards,
Andrej

On 2020-05-28 12:21, Luis Henriques wrote:
Andrej Filipcic <andrej.filipcic@xxxxxx> writes:

Hi,

I have two directories, cache_fast and cache_slow, and I would like to move the
least used files from fast to slow, aka, user side tiering. cache_fast is pinned
to fast_data ssd pool, while cache_slow to hdd cephfs_data pool.

$ getfattr -n ceph.dir.layout /ceph/grid/cache_fast
getfattr: Removing leading '/' from absolute path names
# file: ceph/grid/cache_fast
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
pool=fast_data"

$ getfattr -n ceph.dir.layout /ceph/grid/cache_slow
getfattr: Removing leading '/' from absolute path names
# file: ceph/grid/cache_slow
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
pool=cephfs_data"
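
The pinning itself is just the standard setfattr on the ceph.dir.layout.pool
xattr, e.g. something along these lines (assuming the pools already exist):

$ setfattr -n ceph.dir.layout.pool -v fast_data /ceph/grid/cache_fast
$ setfattr -n ceph.dir.layout.pool -v cephfs_data /ceph/grid/cache_slow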



"mv" from cache_fast dir to cache_slow dir only renames the file in mds, but
does not involve migration to a different pool and changing the file layout.
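
For example, with a test file (the name here is just illustrative), the old
layout is still reported after the rename:

$ mv /ceph/grid/cache_fast/somefile /ceph/grid/cache_slow/
$ getfattr -n ceph.file.layout /ceph/grid/cache_slow/somefile
# still reports pool=fast_data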

The only option I see at this point is to "cp" the file to a new dir and
remove it from the old one, but that involves client-side operations and can
be very slow.

Is there any better way that would work on the ceph server side?

I'm afraid I can't give you a full solution for your problem; there's
probably an easy way for achieving what you want, but I just wanted to let
you know a solution that *may* help in very specific scenarios, namely if:

1. you're using a (recent) kernel client to mount the filesystem
2. your files are all bigger than the object_size and, ideally, their
    sizes are multiples of the object_size
3. stripe_count is *always* 1

Assuming all the above are true, you can (partially) offload the files
copy to the OSDs by using the copy_file_range(2) syscall.  I don't think
tools such as 'cp' will use this syscall so you may need to find
alternatives.  Here's a simple example using xfs_io:

   # create a file with 4 objects (4 * 4194304 bytes)
   xfs_io -f -c "pwrite 0 16777216" /ceph/grid/cache_fast/oldfile
   # copy the 4-object file; the new file picks up the cache_slow layout
   xfs_io -f -c "copy_range -s 0 -d 0 -l 16777216 \
       /ceph/grid/cache_fast/oldfile" /ceph/grid/cache_slow/newfile
   rm /ceph/grid/cache_fast/oldfile

What will happen in this example is that the file data for oldfile will
never be read into the client.  Instead, the client will be sending
'object copy' requests to the OSDs.  Also, because this is effectively a
copy, the new file will have the layout you expect.
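
The result can be double-checked with the file layout xattr: the new file
should now report the slow pool, something like

   getfattr -n ceph.file.layout /ceph/grid/cache_slow/newfile
   # ... pool=cephfs_data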

Finally, because the copy_file_range syscall in cephfs is disabled by
default, you'll need to have the filesystem mounted with the "copyfrom"
mount option.
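
With the kernel client that's just an extra mount option, for example (the
monitor address and credentials below are placeholders for your own setup):

   mount -t ceph mon1:6789:/ /ceph -o name=admin,copyfrom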

So, as I said, not a real solution, but a way to eventually implement
one.

Cheers,


--
_____________________________________________________________
   prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-425-7074
-------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


