On Wed, 2012-12-12 at 11:20 -0500, Fred Isaman wrote: > On Wed, Dec 12, 2012 at 9:46 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > One of our QA folks found that the attached testcase would segfault > > when run on a recent rhel6 kernel that has a backport of the pnfs dio > > code. I get the same segfault when I run it on a 3.7.0 kernel as well. > > > > I think the problem is that because the buffer we're reading into is on > > the stack, the kernel is scribbling over the rest of the page after the > > read and corrupting it. > > > > The problem, I think is this block in nfs_direct_read_completion(): > > > > -----------------------[snip]----------------------- > > if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) { > > if (bytes > hdr->good_bytes) > > zero_user(page, 0, PAGE_SIZE); > > else if (hdr->good_bytes - bytes < PAGE_SIZE) > > zero_user_segment(page, > > hdr->good_bytes & ~PAGE_MASK, > > PAGE_SIZE); > > } > > -----------------------[snip]----------------------- > > > > If I comment that out, then the test passes and it doesn't scribble > > over memory. I'm not clear on what that block is trying to accomplish. > > If we get a short read in the DIO codepath, I don't think we ought to > > be zeroing out the rest of the page. We should just return the number > > of bytes read and be done with it. > > > > I would say the problem is not zeroing memory, but that the code isn't > taking into account the offsets into the page. The memory region lies after the EOF, so we're not counting it in dreq->count. More importantly, it won't be accounted for in the read() system call return value, so I think we should avoid touching it. > > I'm also suspicious of the "if (!PageCompound(page))" check in that > > function as well. It doesn't seem like we ought to be marking pages > > dirty in the DIO codepaths, should we? > > > > -- > > Jeff Layton <jlayton@xxxxxxxxxx> > > I'm not sure if we should, but code to do so has been around forever. > The exception for PageCompound is from commit 566dd6064e89b "NFS: Make > directIO aware of compound pages", almost 7 years ago. Right. The code for doing this goes back to the original O_DIRECT implementation in 2003. IIRC, Andrew Morton asked us to do it in order to make things like O_DIRECT read into an mmapped memory region work correctly. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥