On 2016-07-29 03:25, J. Bruce Fields wrote:
> On Wed, Jul 27, 2016 at 10:06:07AM +0900, Seiichi Ikarashi wrote:
>> On 2016-07-27 09:26, J. Bruce Fields wrote:
>>> On Wed, Jul 27, 2016 at 08:57:26AM +0900, Seiichi Ikarashi wrote:
>>>> Hi Bruce,
>>>>
>>>> On 2016-07-27 05:19, J. Bruce Fields wrote:
>>>>> Thanks for the report.
>>>>>
>>>>> On Tue, Jul 26, 2016 at 11:38:11AM +0900, Seiichi Ikarashi wrote:
>>>>>> If over-"RPCSVC_MAXPAGES" pages are sent from file system through pipe_buffer,
>>>>>> nfsd_splice_actor() corrupts struct svc_rqst and results in kernel panic. It
>>>>>> actually occurred with a parallel distributed file system. It needs boundary
>>>>>> checking.
>>>>>
>>>>> This check might be useful as defensive programming, but the bug was
>>>>> elsewhere.
>>>>
>>>> Yah, I think the main factor exists in file system and/or VFS splice sides.
>>>> But I also think that NFS should defend itself against overlimit pages
>>>> because the limit is decided by NFS/SUNRPC, not by file system and others.
>>>>
>>>>>
>>>>> In theory this should be prevented by the "maxcount" calculations in
>>>>> nfsd4_encode_read().
>>>>
>>>> The "maxcount" looks just limiting the read length from the file system.
>>>> Is my understanding correct?
>>>
>>> Right.
>>>
>>>>
>>>> And the number of pages provided from the file system is up to the file system.
>>>> The file system can split the read data into an arbitrary number of pages.
>>>
>>> Oh, so if we ask the filesystem for 3 bytes it might potentially return
>>> those in 3 separate pages?  Is that at all legal?
>>
>> I think the code actually permits such a split though I am not sure that
>> one-single-byte-per-page situation really happens. :-)
>> For example, on 4kB-page architecture, 1kB-data-per-page placement will result
>> in 4 times larger number of pages than expected.
>> I do not know whether it's legal or not.
>>
>> I thought NFS needs to defend itself against such large numbers,
>> or, can somewhere in VFS splice merge such pages into a minimum set of pages?
>> I cannot find such code in VFS.
>>
>> Opinions or suggestions?
>
> I don't know the splice interface well, but suspect the filesystem's in
> the wrong here.
>
> Checking for overflow of the page array isn't sufficient to catch any
> such problems; if the filesystem gives us a smaller number of partial
> pages, I'd guess (not having tested) that nfsd is just going to assume
> the full page should be used, and end up sending uninitialized data back
> to the client.

I see.  Now I understand why nfsd_splice_actor() ignores buf->offset
when rqstp->rq_res.page_len != 0: it assumes that the data arriving
through the pipe_buffers fills the pages consecutively, with no gaps.
I agree that checking only for the overflow is definitely insufficient.

>
> So I wouldn't necessarily be opposed to a simple check for a misbehaving
> filesystem here, but if the required checks are more complicated and if
> we've only ever seen it against an out-of-tree filesystem then I'm less
> interested.

Yah, it was an independent (out-of-tree) file system case.

Looking at the Linus tree, most file systems just use
generic_file_splice_read() as their splice_read method, so the call
sequence is:

  ... => splice_direct_to_actor()
      => do_splice_to()
      => generic_file_splice_read()
      => __generic_file_splice_read()

__generic_file_splice_read() looks like it permits partial pages, so it
might be possible in theory with one of the in-tree file systems as
well, but I'm not sure...

For reference, I have appended below a rough sketch of the kind of
defensive bound check I have in mind for nfsd_splice_actor().

Any comments from file system guys are welcome :-)

Seiichi.
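
The sketch below is illustrative only: it is untested, it is not the
patch from the original report, and the helper name is made up.  It
assumes the 4.x-era field names in struct svc_rqst (rq_pages[],
rq_next_page, rq_res.page_len) and the RPCSVC_MAXPAGES limit from
include/linux/sunrpc/svc.h; the exact placement and error value are
open for discussion.

```c
/* Context: fs/nfsd/vfs.c; needs the usual nfsd/sunrpc headers. */
#include <linux/splice.h>
#include <linux/pipe_fs_i.h>
#include <linux/sunrpc/svc.h>

/*
 * Hypothetical helper, for illustration only: the same splice_actor
 * signature as nfsd_splice_actor(), but with a bound check so that a
 * filesystem handing us more pipe_buffers than rq_pages[] can hold
 * cannot advance rq_next_page past the end of the array and corrupt
 * the rest of struct svc_rqst.
 */
static int nfsd_splice_actor_sketch(struct pipe_inode_info *pipe,
				    struct pipe_buffer *buf,
				    struct splice_desc *sd)
{
	struct svc_rqst *rqstp = sd->u.data;

	/*
	 * rq_pages[] has room for RPCSVC_MAXPAGES page pointers and
	 * rq_next_page points at the next slot to fill; refuse to store
	 * another page once the array is exhausted, aborting the splice
	 * with an error instead of overrunning it.
	 */
	if (rqstp->rq_next_page - rqstp->rq_pages >= RPCSVC_MAXPAGES)
		return -EIO;

	/*
	 * ... the existing nfsd_splice_actor() logic would follow here:
	 * take a reference on buf->page, store it at *rq_next_page, and
	 * add sd->len to rqstp->rq_res.page_len.  Note that once
	 * page_len is non-zero that logic ignores buf->offset, i.e. it
	 * assumes each subsequent page is filled from offset 0, which is
	 * why partial pages would also leak uninitialized data as Bruce
	 * describes above.
	 */
	return sd->len;
}
```

Returning a negative value from the actor should just make the splice
fail for that request, which seems preferable to scribbling past
rq_pages[]; whether -EIO is the right error, and whether the check
belongs here or further up in the splice path, is exactly the kind of
feedback I am hoping for.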