I don't have much useful to say here (unless Zheng wants me to look
carefully at the use of copy-get), but I'm excited to see this getting
done! :)

One thing I will note is that it might be a good idea if at least one of
the system admin or the Ceph cluster admin can disable this behavior,
just in case bugs turn up with the copy-from op.  I don't expect any, as
this is a pretty friendly use-case for it (protected by FS caps,
hurray!), but it is the first non-cache-tiering user of copy-from that
I'm aware of turning up in the wild.
-Greg

On Thu, Sep 6, 2018 at 9:06 AM, Luis Henriques <lhenriques@xxxxxxxx> wrote:
> Changes since v2:
>
> - File size checks are now done after we have all the required caps
>
> Here are the main changes since v1, after Zheng's review:
>
> 1. ceph_osdc_copy_from() now receives source and destination snapids
>    instead of ceph_vino structs
>
> 2. Also get FILE_RD capabilities in ceph_copy_file_range() for the
>    source file, as other clients may have dirty data in their cache.
>
> 3. Fall back to the default VFS copy_file_range implementation if
>    we're copying beyond the source file's EOF
>
> Note that 2. required an extra patch modifying ceph_try_get_caps() so
> that it can perform a non-blocking attempt at getting the
> CEPH_CAP_FILE_RD capabilities.
>
> And here's the original (v1) RFC cover letter, just for reference:
>
> This series is my initial attempt at getting a copy_file_range syscall
> implementation into the kernel cephfs client, using the 'copy-from'
> RADOS operation.
>
> The idea of getting this implemented came from Greg -- or, at least,
> he created a feature in the tracker [1].  I just decided to give it a
> try as the feature wasn't assigned to anyone ;-)
>
> I've had this patchset sitting on my laptop for a while already,
> waiting for me to revisit it and review some of its TODOs... but I
> finally decided to send it out as-is instead, to get some early
> feedback.
>
> The first patch implements the copy-from operation in the libceph
> module.  Unfortunately, the documentation for this operation is
> nonexistent and I had to do a lot of digging to figure out the details
> (and I probably missed something!).  For example, initially I was
> hoping that this operation could be used to copy more than one object
> at a time.  Doing an OSD request per object copy is not ideal, but
> unfortunately it seems to be the only way.  Anyway, my expectation is
> that this new operation will be useful for other features in the
> future.
>
> The 2nd patch is where copy_file_range is implemented, and it could
> probably be optimised, but I didn't bother with that for now.  The
> important bit is that we may still need to do some manual copies if
> the offsets aren't object-aligned or if the length is smaller than the
> object size.  I'm using do_splice_direct() for the manual copies, as
> it was the easiest way to get a PoC running, but maybe there are
> better ways.
>
> I've done some functional testing on this PoC, and it also passes the
> generic xfstests suite, in particular the copy_file_range-specific
> tests (430-434).  But I haven't done any benchmarks to measure the
> performance impact of using this syscall.
>
> Any feedback is welcome, especially regarding the TODOs in the code.
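
As a side note, the object-boundary splitting described above (a
manually-copied head up to the first object boundary, whole objects via
copy-from ops, then a manually-copied tail) boils down to arithmetic
like the standalone sketch below.  The 4M object size and all names
here are illustrative, not the actual fs/ceph/file.c code; the real
implementation also has to verify that the source and destination
offsets share the same object-relative alignment before whole-object
copies are possible at all.

    /* Sketch: split a copy range into a manual head, full objects
     * eligible for copy-from, and a manual tail. */
    #include <stdint.h>
    #include <stdio.h>

    #define OBJECT_SIZE (4ULL * 1024 * 1024)  /* assumed 4M objects */

    static void plan_copy(uint64_t off, uint64_t len)
    {
            /* Bytes needed to reach the next object boundary. */
            uint64_t head = (OBJECT_SIZE - off % OBJECT_SIZE) % OBJECT_SIZE;

            if (head > len)
                    head = len;
            len -= head;

            uint64_t objects = len / OBJECT_SIZE; /* copy-from ops */
            uint64_t tail = len % OBJECT_SIZE;    /* partial object */

            printf("head %llu, copy-from ops %llu, tail %llu\n",
                   (unsigned long long)head,
                   (unsigned long long)objects,
                   (unsigned long long)tail);
    }

    int main(void)
    {
            /* 10M starting at offset 1M: 3M head, 1 op, 3M tail. */
            plan_copy(1024 * 1024, 10ULL * 1024 * 1024);
            return 0;
    }
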
>
> [1] https://tracker.ceph.com/issues/21944
>
> Luis Henriques (3):
>   ceph: add non-blocking parameter to ceph_try_get_caps()
>   ceph: support the RADOS copy-from operation
>   ceph: support copy_file_range file operation
>
>  fs/ceph/addr.c                  |   2 +-
>  fs/ceph/caps.c                  |   7 +-
>  fs/ceph/file.c                  | 221 ++++++++++++++++++++++++++++++++
>  fs/ceph/super.h                 |   2 +-
>  include/linux/ceph/osd_client.h |  17 +++
>  include/linux/ceph/rados.h      |  19 +++
>  net/ceph/osd_client.c           |  72 +++++++++++
>  7 files changed, 335 insertions(+), 5 deletions(-)
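
For anyone who wants to exercise the new code path, a minimal userspace
test along these lines is enough when run against files on a cephfs
mount.  It assumes the glibc >= 2.27 copy_file_range() wrapper (on
older glibc you'd have to go through syscall()); the file names are
whatever you pass in:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            if (argc != 3) {
                    fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                    return 1;
            }

            int in = open(argv[1], O_RDONLY);
            int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (in < 0 || out < 0) {
                    perror("open");
                    return 1;
            }

            off_t left = lseek(in, 0, SEEK_END); /* whole-file copy */
            lseek(in, 0, SEEK_SET);

            while (left > 0) {
                    /* NULL offsets: the kernel advances both file
                     * positions itself. */
                    ssize_t n = copy_file_range(in, NULL, out, NULL,
                                                left, 0);
                    if (n < 0) {
                            perror("copy_file_range");
                            return 1;
                    }
                    if (n == 0) /* unexpected short copy */
                            break;
                    left -= n;
            }

            close(in);
            close(out);
            return 0;
    }

With the series applied, the object-aligned portion of such a copy
should show up as copy-from ops on the OSDs rather than as client-side
reads and writes.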