Re: cephfs - modifying the ceph.file.layout of existing files


 



Andrej Filipcic <andrej.filipcic@xxxxxx> writes:

> Hi,
>
> I have two directories, cache_fast and cache_slow, and I would like to move the 
> least used files from fast to slow, aka, user side tiering. cache_fast is pinned
> to fast_data ssd pool, while cache_slow to hdd cephfs_data pool.
>
> $ getfattr -n ceph.dir.layout /ceph/grid/cache_fast
> getfattr: Removing leading '/' from absolute path names
> # file: ceph/grid/cache_fast
> ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
> pool=fast_data"
>
> $ getfattr -n ceph.dir.layout /ceph/grid/cache_slow
> getfattr: Removing leading '/' from absolute path names
> # file: ceph/grid/cache_slow
> ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304
> pool=cephfs_data"
>
>
>
> "mv" from cache_fast dir to cache_slow dir only renames the file in mds, but
> does not involve migration to a different pool and changing the file layout.
>
> The only option I see at this point is to "cp" the file to a new dir and
> removing it from the old one, but this would involve client side operations and
> can be very slow.
>
> Is there any better way, that would work ceph server side?

I'm afraid I can't give you a full solution to your problem; there's
probably an easier way of achieving what you want, but I just wanted to
let you know about a solution that *may* help in very specific scenarios,
namely if:

1. you're using a (recent) kernel client to mount the filesystem
2. your files are all bigger than the object_size and, ideally, their
   sizes are multiples of the object_size
3. stripe_count is *always* 1
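
If it helps, here's one way you could sanity-check conditions 2. and 3.
for a given file (the path below is just an example):

  # example path only; adapt to your tree
  F=/ceph/grid/cache_fast/somefile
  # the file layout shows stripe_count and object_size
  getfattr -n ceph.file.layout "$F"
  # compare the file size against object_size (4194304 in your layouts)
  stat -c %s "$F"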

Assuming all of the above are true, you can (partially) offload the file
copies to the OSDs by using the copy_file_range(2) syscall.  I don't think
tools such as 'cp' will use this syscall, so you may need to find
alternatives.  Here's a simple example using xfs_io:

  # create a file with 4 objects (4 * 4194304)
  xfs_io -f -c "pwrite 0 16777216" /ceph/grid/cache_fast/oldfile
  # copy the 4-object file; the new file gets the destination dir's layout
  xfs_io -f -c \
      "copy_range -s 0 -d 0 -l 16777216 /ceph/grid/cache_fast/oldfile" \
      /ceph/grid/cache_slow/newfile
  rm /ceph/grid/cache_fast/oldfile

What will happen in this example is that the file data for oldfile will
never be read into the client.  Instead, the client will be sending
'object copy' requests to the OSDs.  Also, because this is effectively a
copy, the new file will have the layout you expect.
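
Just to sketch how this could be scripted for your tiering case (this is
untested, the paths and the atime threshold are only examples, and it
assumes file names without spaces or newlines):

  # untested sketch: move files not read for 30 days from fast to slow,
  # keeping the actual data movement on the OSDs via copy_range
  cd /ceph/grid/cache_fast
  find . -type f -atime +30 | while read -r f; do
      len=$(stat -c %s "$f")                  # full file length
      mkdir -p "/ceph/grid/cache_slow/$(dirname "$f")"
      xfs_io -f -c "copy_range -s 0 -d 0 -l $len $f" \
          "/ceph/grid/cache_slow/$f"          # server-side object copies
      rm -f "$f"                              # drop the copy in the fast pool
  done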

Finally, because support for the copy_file_range syscall is disabled by
default in the cephfs kernel client, you'll need to have the filesystem
mounted with the "copyfrom" mount option.
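
For the kernel client that would be something like the following (the
monitor address, client name and secret file are just placeholders):

  # example only: enable copy_file_range support at mount time
  mount -t ceph 192.168.0.1:6789:/ /ceph \
      -o name=admin,secretfile=/etc/ceph/admin.secret,copyfrom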

So, as I said, not a real solution, but a way to eventually implement
one.

Cheers,
-- 
Luis
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



