Re: [PATCH v3 3/3] NFSD: Add support for encoding multiple segments

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Wed, 18 Mar 2015 16:55:54 -0400

On Wed, Mar 18, 2015 at 04:39:24PM -0400, Anna Schumaker wrote:
> On 03/18/2015 02:55 PM, J. Bruce Fields wrote:
> > On Wed, Mar 18, 2015 at 02:16:29PM -0400, Anna Schumaker wrote:
> >> On 03/17/2015 05:36 PM, J. Bruce Fields wrote:
> >>> On Tue, Mar 17, 2015 at 04:07:38PM -0400, J. Bruce Fields wrote:
> >>>> On Tue, Mar 17, 2015 at 03:56:33PM -0400, J. Bruce Fields wrote:
> >>>>> On Mon, Mar 16, 2015 at 05:18:08PM -0400, Anna Schumaker wrote:
> >>>>>> This patch implements sending an array of segments back to the client.
> >>>>>> Clients should be prepared to handle multiple segment reads to make this
> >>>>>> useful.  We try to splice the first data segment into the XDR result,
> >>>>>> and remaining segments are encoded directly.
> >>>>>
> >>>>> I'm still interested in what would happen if we started with an
> >>>>> implementation like:
> >>>>>
> >>>>> 	- if the entire requested range falls within a hole, return that
> >>>>> 	  single hole.
> >>>>> 	- otherwise, just treat the thing as one big data segment.
> >>>>>
> >>>>> That would provide a benefit in the case there are large-ish holes
> >>>>> with minimal impact otherwise.
> >>>>>
> >>>>> (Though patches for full support are still useful even if only for
> >>>>> client-testing purposes.)
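
For reference, the two-rule policy quoted above can be sketched in userspace terms. This is only an illustration, not the nfsd code: next_data() and classify_range() are hypothetical names, and lseek(SEEK_DATA) stands in for whatever the server would actually call.

```python
import errno
import os


def next_data(fd, offset):
    """Return the offset of the next data at or after `offset`, or None.

    lseek(SEEK_DATA) fails with ENXIO when there is no data between
    `offset` and end of file (or when `offset` is past end of file).
    """
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return None
        raise


def classify_range(offset, count, data_start):
    """Apply the two-rule policy to the byte range [offset, offset + count).

    `data_start` is the result of next_data(fd, offset).  Returns a single
    (kind, offset, count) tuple, never a list of mixed segments.
    """
    if data_start is None or data_start >= offset + count:
        return ("hole", offset, count)   # whole range lies inside a hole
    return ("data", offset, count)       # anything else: one data segment
```

A caller would do classify_range(off, cnt, next_data(fd, off)) once per read, so the cost is a single SEEK_DATA per request.
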
> >>>>
> >>>> Also, looks like
> >>>>
> >>>> 	xfs_io -c "fiemap -v" <file>
> >>>>
> >>>> will give hole sizes for a given <file>.  (Thanks, esandeen.)  Running
> >>>> that on a few of my test vm images shows a fair number of large
> >>>> (hundreds of megs) holes, which suggests identifying only >=rwsize holes
> >>>> might still be useful.
> >>>
> >>> Just for fun.... I wrote the following test program and ran it on my
> >>> collection of testing vm's.  Some looked like this:
> >>>
> >>> 	f21-1.qcow2
> >>> 	144784 -rw-------. 1 qemu qemu 8591507456 Mar 16 10:13 f21-1.qcow2
> >>> 	total hole bytes:      8443252736 (98%)
> >>> 	in aligned 1MB chunks: 8428453888 (98%)
> >>>
> >>> So, basically, read_plus would save transferring most of the data even
> >>> when only handling 1MB holes.
> >>>
> >>> But some looked like this:
> >>>
> >>> 	501524 -rw-------. 1 qemu qemu 8589934592 May 20  2014 rhel6-1-1.img
> >>> 	total hole bytes:      8077516800 (94%)
> >>> 	in aligned 1MB chunks: 0 (0%)
> >>>
> >>> So the READ_PLUS that caught every hole might save a lot, the one that
> >>> only caught 1MB holes wouldn't help at all.
> >>>
> >>> And there were lots of examples in between those two extremes.
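
A rough sketch of how the two numbers above ("total hole bytes" and "in aligned 1MB chunks") could be computed. This is not the test program referred to in the quoted message; the function names are made up for illustration, and it assumes the filesystem implements SEEK_HOLE/SEEK_DATA properly.

```python
import errno
import os

MB = 1024 * 1024


def find_holes(fd):
    """Return a list of (start, end) hole extents, end exclusive.

    Filesystems without real SEEK_HOLE support report only the implicit
    hole at end of file, in which case this returns an empty list.
    """
    size = os.fstat(fd).st_size
    holes = []
    pos = 0
    while pos < size:
        hole = os.lseek(fd, pos, os.SEEK_HOLE)   # at worst, returns size
        if hole >= size:
            break
        try:
            data = os.lseek(fd, hole, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            data = size                          # hole runs to end of file
        holes.append((hole, data))
        pos = data
    return holes


def hole_bytes(holes):
    """Total number of bytes covered by the given hole extents."""
    return sum(end - start for start, end in holes)


def aligned_bytes(holes, chunk=MB):
    """Count hole bytes that form whole, chunk-aligned blocks."""
    total = 0
    for start, end in holes:
        first = -(-start // chunk) * chunk   # round start up to a boundary
        last = (end // chunk) * chunk        # round end down to a boundary
        if last > first:
            total += last - first
    return total
```

A file full of small or unaligned holes gives a large hole_bytes() but an aligned_bytes() near zero, which is the second case shown above.
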
> >>
> >> I tested with three different 512 MB files:  100% data, 100% hole, and alternating every megabyte.  The results were surprising:
> >>
> >>       |  v4.1  |  v4.2
> >> ------+--------+--------
> >> data  | 0.685s |  0.714s
> >> hole  | 0.485s | 15.547s
> >> mixed | 1.283s |  0.448s
> >>
> >> From what I can tell, the 100% hole case takes so long because of the
> >> SEEK_DATA call in nfsd4_encode_read_plus_hole().  I took this out to
> >> trick the function into thinking that the entire file was already a
> >> hole, and runtime dropped to the levels of v4.1 and v4.2.
> > 
> > Wait, that 15s is due to just one SEEK_DATA?
> 
> The server is returning a larger hole than the client can read at once, so there are several SEEK_DATA calls made to verify that there are no data segments before the end of the file.
> 
> > 
> >> I wonder if this is filesystem dependent?  My server is exporting ext4.
> > 
> > Sounds like just a bug.  I've been doing lots of lseek(.,.,SEEK_DATA) on
> > both ext4 and xfs without seeing anything that weird.
> 
> It looks like something weird on ext4.  I switched my exported filesystem to xfs:

Huh.  Maybe we should report a bug....
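
Something like the following might help reproduce it locally, without NFS in the picture. It is only a sketch: time_seek_data() is a made-up helper, and the 1MB step is an assumption standing in for the client's read size.

```python
import errno
import os
import time


def time_seek_data(path, step=1024 * 1024):
    """Call lseek(SEEK_DATA) from each `step`-aligned offset in `path`.

    Returns (calls, seconds).  On a file that is one big hole every call
    is expected to fail with ENXIO, which is counted rather than raised.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        calls = 0
        start = time.perf_counter()
        for offset in range(0, size, step):
            try:
                os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
            calls += 1
        return calls, time.perf_counter() - start
    finally:
        os.close(fd)
```

Running it against the same all-hole file on ext4 and on xfs should show whether the per-call cost differs between the two.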

> 
>       |  v4.1  |  v4.2
> ------+--------+-------
> data  | 0.764s | 1.343s

That's too bad.  Non-sparse files are surely still a common case and
we'd like to not see a slowdown there....  I wonder if we can figure out
where it's coming from?

> hole  | 0.572s | 0.205s
> mixed | 0.634s | 0.472s
> 
> 
> I bumped up the test to 1G files:
> 
>       |  v4.1  |  v4.2
> ------+--------+-------
> data  | 1.578s | 1.743s
> hole  | 1.241s | 0.443s
> mixed | 1.884s | 0.913s
> 
> Let me know if I should test anything larger!

The other thing I'd be interested in would be a "mixed" case that
alternates every 4k.  That will test the worst case where we do a 1MB
read and get back only a 4k hole.  Aligned 1MB holes are somewhat of a
best case.
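
A minimal sketch of how such a file could be generated, in case it is useful. layout() and make_mixed_file() are hypothetical names, and whether the 4k gaps really end up as holes depends on the filesystem block size.

```python
import os


def layout(size, granularity, first="data"):
    """Return [(kind, offset, length), ...] alternating data and hole."""
    segments = []
    kind = first
    for offset in range(0, size, granularity):
        segments.append((kind, offset, min(granularity, size - offset)))
        kind = "hole" if kind == "data" else "data"
    return segments


def make_mixed_file(path, size, granularity=4096):
    """Create `path` with data and holes alternating every `granularity` bytes."""
    with open(path, "wb") as f:
        f.truncate(size)                      # sparse file of the final size
        for kind, offset, length in layout(size, granularity):
            if kind == "data":
                f.seek(offset)
                f.write(b"\xff" * length)     # non-zero bytes in data ranges
```

Calling make_mixed_file(path, 512 * 1024 * 1024) would give a 4k-alternating counterpart to the 512 MB files used above; passing granularity=1024 * 1024 would recreate the existing "mixed" case.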

--b.