Re: [PATCH v3 3/3] NFSD: Add support for encoding multiple segments

On 03/18/2015 02:55 PM, J. Bruce Fields wrote:
> On Wed, Mar 18, 2015 at 02:16:29PM -0400, Anna Schumaker wrote:
>> On 03/17/2015 05:36 PM, J. Bruce Fields wrote:
>>> On Tue, Mar 17, 2015 at 04:07:38PM -0400, J. Bruce Fields wrote:
>>>> On Tue, Mar 17, 2015 at 03:56:33PM -0400, J. Bruce Fields wrote:
>>>>> On Mon, Mar 16, 2015 at 05:18:08PM -0400, Anna Schumaker wrote:
>>>>>> This patch implements sending an array of segments back to the client.
>>>>>> Clients should be prepared to handle multiple segment reads to make this
>>>>>> useful.  We try to splice the first data segment into the XDR result,
>>>>>> and remaining segments are encoded directly.
>>>>>
>>>>> I'm still interested in what would happen if we started with an
>>>>> implementation like:
>>>>>
>>>>> 	- if the entire requested range falls within a hole, return that
>>>>> 	  single hole.
>>>>> 	- otherwise, just treat the thing as one big data segment.
>>>>>
>>>>> That would provide a benefit in the case there are large-ish holes
>>>>> with minimal impact otherwise.
>>>>>
>>>>> (Though patches for full support are still useful even if only for
>>>>> client-testing purposes.)
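
[For illustration only: the "minimal" strategy described above boils down to a
single SEEK_DATA probe per READ_PLUS.  This is a userspace sketch, not the nfsd
patch itself, and range_is_hole() is just a made-up helper name.]

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * The whole requested range is a hole iff the next data at or after
 * 'offset' starts at or beyond offset + count, or there is no data at
 * all (lseek fails with ENXIO).  Otherwise fall back to sending one
 * big DATA segment.
 */
static int range_is_hole(int fd, off_t offset, off_t count)
{
	off_t next_data = lseek(fd, offset, SEEK_DATA);

	if (next_data == (off_t)-1)
		return errno == ENXIO;
	return next_data >= offset + count;
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <file> <offset> <count>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	off_t offset = atoll(argv[2]);
	off_t count = atoll(argv[3]);

	printf("%s\n", range_is_hole(fd, offset, count) ?
	       "encode a single HOLE segment" : "encode one DATA segment");
	return 0;
}
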
>>>>
>>>> Also, looks like
>>>>
>>>> 	xfs_io -c "fiemap -v" <file>
>>>>
>>>> will give hole sizes for a given <file>.  (Thanks, esandeen.)  Running
>>>> that on a few of my test vm images shows a fair number of large
>>>> (hundreds of megs) holes, which suggests identifying only >=rwsize holes
>>>> might still be useful.
>>>
>>> Just for fun.... I wrote the following test program and ran it on my
>>> collection of testing vm's.  Some looked like this:
>>>
>>> 	f21-1.qcow2
>>> 	144784 -rw-------. 1 qemu qemu 8591507456 Mar 16 10:13 f21-1.qcow2
>>> 	total hole bytes:      8443252736 (98%)
>>> 	in aligned 1MB chunks: 8428453888 (98%)
>>>
>>> So, basically, read_plus would save transferring most of the data even
>>> when only handling 1MB holes.
>>>
>>> But some looked like this:
>>>
>>> 	501524 -rw-------. 1 qemu qemu 8589934592 May 20  2014 rhel6-1-1.img
>>> 	total hole bytes:      8077516800 (94%)
>>> 	in aligned 1MB chunks: 0 (0%)
>>>
>>> So the READ_PLUS that caught every hole might save a lot, while the one
>>> that only caught 1MB holes wouldn't help at all.
>>>
>>> And there were lots of examples in between those two extremes.
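
[Bruce's actual test program isn't reproduced in this excerpt.  A rough sketch
of that kind of measurement, using SEEK_HOLE/SEEK_DATA (his program may have
used fiemap or differed in other details), would be something like:]

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	off_t end = lseek(fd, 0, SEEK_END);
	off_t pos = 0, hole_bytes = 0, aligned_bytes = 0;

	while (pos < end) {
		/* start of the next hole at or after pos (EOF counts as a hole) */
		off_t hole = lseek(fd, pos, SEEK_HOLE);
		if (hole == (off_t)-1)
			break;
		/* first data after that hole; ENXIO means the hole runs to EOF */
		off_t data = lseek(fd, hole, SEEK_DATA);
		if (data == (off_t)-1 && errno == ENXIO)
			data = end;

		hole_bytes += data - hole;

		/* portion of this hole made up of aligned 1MB chunks */
		off_t first = (hole + CHUNK - 1) / CHUNK * CHUNK;
		off_t last = data / CHUNK * CHUNK;
		if (last > first)
			aligned_bytes += last - first;

		pos = data;
	}

	printf("total hole bytes:      %lld (%lld%%)\n", (long long)hole_bytes,
	       end ? (long long)(hole_bytes * 100 / end) : 0);
	printf("in aligned 1MB chunks: %lld (%lld%%)\n", (long long)aligned_bytes,
	       end ? (long long)(aligned_bytes * 100 / end) : 0);
	return 0;
}
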
>>
>> I tested with three different 512 MB files:  100% data, 100% hole, and alternating every megabyte.  The results were surprising:
>>
>>       |  v4.1  |  v4.2
>> ------+--------+-------
>> data  | 0.685s |  0.714s
>> hole  | 0.485s | 15.547s
>> mixed | 1.283s |  0.448s
>>
>> From what I can tell, the 100% hole case takes so long because of the
>> SEEK_DATA call in nfsd4_encode_read_plus_hole().  I took this out to
>> trick the function into thinking that the entire file was already a
>> hole, and runtime dropped to the levels of v4.1 and v4.2.
> 
> Wait, that 15s is due to just one SEEK_DATA?

The server is returning a larger hole than the client can read at once, so there are several SEEK_DATA calls made to verify that there are no data segments before the end of the file.
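
For reference, the probe that ends up being repeated boils down to something
like the fragment below (a userspace sketch with a made-up hole_length()
helper, not the actual nfsd4_encode_read_plus_hole() code).  The interesting
part is the ENXIO case: if that isn't short-circuited, each client read of the
remaining hole could trigger another SEEK_DATA over the same tail of the file.

#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * How long is the hole starting at 'offset'?  lseek(SEEK_DATA) returns
 * the offset of the next data; if it fails with ENXIO there is no data
 * left, i.e. the hole runs all the way to the end of the file, so there
 * is no point probing again on the next client read.
 * (Other error handling omitted for brevity.)
 */
static off_t hole_length(int fd, off_t offset, off_t file_size)
{
	off_t next_data = lseek(fd, offset, SEEK_DATA);

	if (next_data == (off_t)-1 && errno == ENXIO)
		next_data = file_size;

	return next_data - offset;
}
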

> 
>> I wonder
>> if this is filesystem dependent?  My server is exporting ext4.
> 
> Sounds like just a bug.  I've been doing lots of lseek(.,.,SEEK_DATA) on
> both ext4 and xfs without seeing anything that weird.

It looks like something weird on ext4.  I switched my exported filesystem to xfs:

      |  v4.1  |  v4.2
------+--------+-------
data  | 0.764s | 1.343s
hole  | 0.572s | 0.205s
mixed | 0.634s | 0.472s


I bumped up the test to 1G files:

      |  v4.1  |  v4.2
------+--------+-------
data  | 1.578s | 1.743s
hole  | 1.241s | 0.443s
mixed | 1.884s | 0.913s

Let me know if I should test anything larger!
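
(In case anyone wants to reproduce these numbers: sparse test files along
these lines can be generated with something like the sketch below.  It's not
necessarily the exact method used, the file names are arbitrary, and error
checking is omitted for brevity.)

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MB (1024 * 1024)
#define NCHUNKS 512			/* 512 x 1MB = 512MB files */

int main(void)
{
	static char buf[MB];
	int fd, i;

	memset(buf, 0xab, sizeof(buf));

	/* "hole": nothing but a hole -- just set the size */
	fd = open("hole", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	ftruncate(fd, (off_t)NCHUNKS * MB);
	close(fd);

	/* "data": every byte written */
	fd = open("data", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	for (i = 0; i < NCHUNKS; i++)
		write(fd, buf, MB);
	close(fd);

	/* "mixed": alternating 1MB of data and 1MB of hole */
	fd = open("mixed", O_CREAT | O_TRUNC | O_WRONLY, 0644);
	for (i = 0; i < NCHUNKS; i += 2)
		pwrite(fd, buf, MB, (off_t)i * MB);
	ftruncate(fd, (off_t)NCHUNKS * MB);	/* keep the final hole */
	close(fd);

	return 0;
}
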

Anna
> 
> I believe it does return -ENXIO in the case SEEK_DATA is called at an
> offset beyond which there's no more data.  At least that's what I saw in
> userspace.  So maybe your code just isn't handling that case correctly?
> 
> --b.
> 
