On Fri, Sep 7, 2018 at 2:11 AM, Luis Henriques <lhenriques@xxxxxxxx> wrote: > Gregory Farnum <gfarnum@xxxxxxxxxx> writes: > >> I don't have much useful to say here (unless Zheng wants me to look >> carefully at the use of copy-get), but I'm excited to see this getting >> done! :) >> >> One thing I will note is that it might be a good idea if at least one >> of the system admin or the Ceph cluster admin can disable this >> behavior, just in case bugs turn up with the copy-from op. I don't >> expect any as this is a pretty friendly use-case for it (protected by >> FS caps, hurray!) but it is the first non-cache-tiering user that will >> turn up in the wild I'm aware of. > > If I understand you correctly, you're talking about adding a knob so > that the OSDs could simply return an error when requested to perform a > 'copy-from' op, right? That shouldn't be too difficult, I believe. Oh heavens no. I mean a switch on the client side to not invoke the copy-from op at all. Probably that would just mean doing a full read of the source and write of the destination, but perhaps you could also just unhook the copy_file_range implementation from the VFS? I would assume this is pretty trivial but if it's complicated then don't worry about it — I bring it up mostly because Josh thought there were some outstanding issues with this RADOS op but on deeper inspection he was wrong and conflating some other things. :) -Greg > > Returning an error in PrimaryLogPG::do_osd_ops if that knob is set > should do the job, but I really don't feel comfortable touching the OSDs > code. If you really think this is something useful, I can try to spend > some time looking into that, but it will take me some time :) > > Cheers, > -- > Luis > >> -Greg >> >> On Thu, Sep 6, 2018 at 9:06 AM, Luis Henriques <lhenriques@xxxxxxxx> wrote: >>> Changes since v2: >>> >>> - Files size checks are now done after we have all the required caps >>> >>> Here's the main changes since v1, after Zheng's review: >>> >>> 1. ceph_osdc_copy_from() now receives source and destination snapids >>> instead of ceph_vino structs >>> >>> 2. Also get FILE_RD capabilities in ceph_copy_file_range() for source >>> file as other clients may have dirty data in their cache. >>> >>> 3. Fallback to VFS copy_file_range default implementation if we're >>> copying beyond source file EOF >>> >>> Note that 2. required an extra patch modifying ceph_try_get_caps() so >>> that it could perform a non-blocking attempt at getting CEPH_CAP_FILE_RD >>> capabilities. >>> >>> And here's the original (v1) RFC cover letter just for reference: >>> >>> This series is my initial attempt at getting a copy_file_range syscall >>> implementation in the kernel cephfs client using the 'copy-from' RADOS >>> operation. >>> >>> The idea of getting this implemented was from Greg -- or, at least, he >>> created a feature in the tracker [1]. I just decided to give it a try >>> as the feature wasn't assigned to anyone ;-) >>> >>> I have this patchset sitting on my laptop for a while already, waiting >>> for me to revisit it, review some of its TODOs... but I finally decided >>> to send it out as-is instead, to get some early feedback. >>> >>> The first patch implements the copy-from operation in the libceph >>> module. Unfortunately, the documentation for this operation is >>> nonexistent and I had to do a lot of digging to figure out the details >>> (and I probably I missed something!). For example, initially I was >>> hoping that this operation could be used to copy more than one object at >>> the time. Doing an OSD request per object copy is not ideal, but >>> unfortunately it seems to be the only way. Anyway, my expectations are >>> that this new operation will be useful for other features in the future. >>> >>> The 2nd patch is where the copy_file_range is implemented and could >>> probably be optimised, but I didn't bother with that for now. The >>> important bit is that we still may need to do some manual copies if the >>> offsets aren't object aligned or if the length is smaller than the >>> object size. I'm using do_splice_direct() for the manual copies as it >>> was the easiest way to get a PoC running, but maybe there are better >>> ways. >>> >>> I've done some functional testing on this PoC. And it also passes the >>> generic xfstest suite, in particular the copy_file_range specific tests >>> (430-434). But I haven't done any benchmarks to measure any performance >>> changes in using this syscall. >>> >>> Any feedback is welcome, specially regarding the TODOs on the code. >>> >>> [1] https://tracker.ceph.com/issues/21944 >>> >>> >>> Luis Henriques (3): >>> ceph: add non-blocking parameter to ceph_try_get_caps() >>> ceph: support the RADOS copy-from operation >>> ceph: support copy_file_range file operation >>> >>> fs/ceph/addr.c | 2 +- >>> fs/ceph/caps.c | 7 +- >>> fs/ceph/file.c | 221 ++++++++++++++++++++++++++++++++ >>> fs/ceph/super.h | 2 +- >>> include/linux/ceph/osd_client.h | 17 +++ >>> include/linux/ceph/rados.h | 19 +++ >>> net/ceph/osd_client.c | 72 +++++++++++ >>> 7 files changed, 335 insertions(+), 5 deletions(-) >>> >>