On 03/18/2015 02:55 PM, J. Bruce Fields wrote:
> On Wed, Mar 18, 2015 at 02:16:29PM -0400, Anna Schumaker wrote:
>> On 03/17/2015 05:36 PM, J. Bruce Fields wrote:
>>> On Tue, Mar 17, 2015 at 04:07:38PM -0400, J. Bruce Fields wrote:
>>>> On Tue, Mar 17, 2015 at 03:56:33PM -0400, J. Bruce Fields wrote:
>>>>> On Mon, Mar 16, 2015 at 05:18:08PM -0400, Anna Schumaker wrote:
>>>>>> This patch implements sending an array of segments back to the client.
>>>>>> Clients should be prepared to handle multiple segment reads to make this
>>>>>> useful. We try to splice the first data segment into the XDR result,
>>>>>> and remaining segments are encoded directly.
>>>>>
>>>>> I'm still interested in what would happen if we started with an
>>>>> implementation like:
>>>>>
>>>>>     - if the entire requested range falls within a hole, return that
>>>>>       single hole.
>>>>>     - otherwise, just treat the thing as one big data segment.
>>>>>
>>>>> That would provide a benefit in the case there are large-ish holes
>>>>> with minimal impact otherwise.
>>>>>
>>>>> (Though patches for full support are still useful even if only for
>>>>> client-testing purposes.)
>>>>
>>>> Also, looks like
>>>>
>>>>     xfs_io -c "fiemap -v" <file>
>>>>
>>>> will give hole sizes for a given <file>. (Thanks, esandeen.) Running
>>>> that on a few of my test vm images shows a fair number of large
>>>> (hundreds of megs) files, which suggests identifying only >=rwsize holes
>>>> might still be useful.
>>>
>>> Just for fun.... I wrote the following test program and ran it on my
>>> collection of testing vm's. Some looked like this:
>>>
>>>     f21-1.qcow2
>>>     144784 -rw-------. 1 qemu qemu 8591507456 Mar 16 10:13 f21-1.qcow2
>>>     total hole bytes:       8443252736 (98%)
>>>     in aligned 1MB chunks:  8428453888 (98%)
>>>
>>> So, basically, read_plus would save transferring most of the data even
>>> when only handling 1MB holes.
>>>
>>> But some looked like this:
>>>
>>>     501524 -rw-------. 1 qemu qemu 8589934592 May 20 2014 rhel6-1-1.img
>>>     total hole bytes:       8077516800 (94%)
>>>     in aligned 1MB chunks:  0 (0%)
>>>
>>> So the READ_PLUS that caught every hole might save a lot, the one that
>>> only caught 1MB holes wouldn't help at all.
>>>
>>> And there were lots of examples in between those two extremes.
>>
>> I tested with three different 512 MB files: 100% data, 100% hole, and alternating every megabyte. The results were surprising:
>>
>>        |  v4.1  |  v4.2
>> -------+--------+---------
>>  data  | 0.685s |  0.714s
>>  hole  | 0.485s | 15.547s
>>  mixed | 1.283s |  0.448s
>>
>> From what I can tell, the 100% hole case takes so long because of the
>> SEEK_DATA call in nfsd4_encode_read_plus_hole(). I took this out to
>> trick the function into thinking that the entire file was already a
>> hole, and runtime dropped to the levels of v4.1 and v4.2.
>
> Wait, that 15s is due to just one SEEK_DATA?

The server is returning a larger hole than the client can read at once, so there are several SEEK_DATA calls made to verify that there are no data segments before the end of the file.

>
>> I wonder
>> if this is filesystem dependent? My server is exporting ext4.
>
> Sounds like just a bug. I've been doing lots of lseek(.,.,SEEK_DATA) on
> both ext4 and xfs without seeing anything that weird.

It looks like something weird on ext4. I switched my exported filesystem to xfs:

      |  v4.1  |  v4.2
------+--------+--------
 data | 0.764s | 1.343s
 hole | 0.572s | 0.205s
mixed | 0.634s | 0.472s

I bumped up the test to 1G files:

      |  v4.1  |  v4.2
------+--------+--------
 data | 1.578s | 1.743s
 hole | 1.241s | 0.443s
mixed | 1.884s | 0.913s

Let me know if I should test anything larger!
Anna

>
> I believe it does return -ENXIO in the case SEEK_DATA is called at an
> offset beyond which there's no more data. At least that's what I saw in
> userspace. So maybe your code just isn't handling that case correctly?
>
> --b.
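For reference, a minimal userspace sketch of the measurements discussed in this thread, assuming Linux lseek() with SEEK_HOLE/SEEK_DATA. The test program Bruce mentions is not shown here, so struct hole_stats, aligned_bytes(), count_holes() and range_is_hole() are hypothetical names, and "in aligned 1MB chunks" is read (as an assumption) as hole bytes inside chunk-aligned blocks that lie entirely within a hole. range_is_hole() mirrors the simpler policy proposed above (one hole only if the whole requested range is a hole, otherwise one data segment) and shows the ENXIO case: SEEK_DATA fails with ENXIO when no data exists at or after the offset, which means "hole through EOF" rather than a real error. This is not the nfsd code path.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA and SEEK_HOLE in glibc's <unistd.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for the hole survey. */
struct hole_stats {
    off_t total;   /* every hole byte before EOF */
    off_t aligned; /* hole bytes in chunk-aligned blocks lying wholly inside a hole */
};

/* Bytes of [start, end) covered by whole, chunk-aligned blocks. */
static off_t aligned_bytes(off_t start, off_t end, off_t chunk)
{
    off_t first = (start + chunk - 1) / chunk * chunk; /* round start up */
    off_t last = end / chunk * chunk;                  /* round end down */
    return last > first ? last - first : 0;
}

/* Walk fd with SEEK_HOLE/SEEK_DATA. Returns 0 on success, -1 on error. */
static int count_holes(int fd, off_t chunk, struct hole_stats *st)
{
    struct stat sb;
    off_t pos = 0;

    assert(chunk > 0);
    if (fstat(fd, &sb) < 0)
        return -1;
    st->total = 0;
    st->aligned = 0;

    while (pos < sb.st_size) {
        off_t hole = lseek(fd, pos, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole >= sb.st_size)
            break; /* only the implicit hole at EOF is left */

        off_t data = lseek(fd, hole, SEEK_DATA);
        if (data < 0) {
            if (errno != ENXIO)
                return -1;
            data = sb.st_size; /* ENXIO: no data after 'hole', so it runs to EOF */
        }
        st->total += data - hole;
        st->aligned += aligned_bytes(hole, data, chunk);
        pos = data;
    }
    return 0;
}

/*
 * Sketch of the "single hole or one big data segment" policy:
 * 1 if [offset, offset + count) holds no data, 0 if it does, -1 on error.
 */
static int range_is_hole(int fd, off_t offset, off_t count)
{
    off_t data = lseek(fd, offset, SEEK_DATA);
    if (data < 0)
        return errno == ENXIO ? 1 : -1; /* ENXIO: no data at or after offset */
    return data >= offset + count;
}
```

With this policy each READ_PLUS request costs a single SEEK_DATA, no matter how far the hole extends past the requested range. On filesystems without hole tracking, Linux lseek() reports the whole file as data with an implicit hole at EOF, so count_holes() simply reports zero hole bytes there.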
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html