Re: [PATCH] nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 04, 2011 at 02:01:55PM -0500, Trond Myklebust wrote:
> On Fri, 2011-03-04 at 11:44 -0500, Neil Horman wrote: 
> > This oops was reported recently:
> > 
> >  <IRQ>  [<ffffffff800ca161>] bad_page+0x69/0x91
> >   [<ffffffff8000b8ae>] free_hot_cold_page+0x81/0x144
> >   [<ffffffff8022f28a>] skb_release_data+0x5f/0x98
> >   [<ffffffff80028b88>] __kfree_skb+0x11/0x1a
> >   [<ffffffff800243b8>] tcp_ack+0x6a3/0x1868
> >   [<ffffffff8001bff4>] tcp_rcv_established+0x7a6/0x8b9
> >   [<ffffffff8003b684>] tcp_v4_do_rcv+0x2a/0x2fa
> >   [<ffffffff8002730e>] tcp_v4_rcv+0x9a2/0x9f6
> >   [<ffffffff80099b45>] do_timer+0x2df/0x52c
> >   [<ffffffff8003495c>] ip_local_deliver+0x19d/0x263
> >   [<ffffffff80035ab8>] ip_rcv+0x539/0x57c
> >   [<ffffffff80020ab6>] netif_receive_skb+0x470/0x49f
> >   [<ffffffff883190b5>] :virtio_net:virtnet_poll+0x46b/0x5c5
> >   [<ffffffff8000c979>] net_rx_action+0xac/0x1b3
> >   [<ffffffff80012464>] __do_softirq+0x89/0x133
> >   [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
> >   [<ffffffff8006d5f5>] do_softirq+0x2c/0x7d
> >   [<ffffffff8006d485>] do_IRQ+0xec/0xf5
> >   [<ffffffff8006bdb5>] default_idle+0x0/0x50
> >   [<ffffffff8005d615>] ret_from_intr+0x0/0xa
> >   <EOI>  [<ffffffff8006bdde>] default_idle+0x29/0x50
> >   [<ffffffff800492fd>] cpu_idle+0x95/0xb8
> >   [<ffffffff80461807>] start_kernel+0x220/0x225
> >   [<ffffffff8046122f>] _sinittext+0x22f/0x236
> > 
> > It occurs, because an skb with a fraglist was freed from the tcp retransmit
> > queue when it was acked, but a page on that fraglist had PG_Slab set (indicating
> > it was allocated from the Slab allocator (which means the free path above can't
> > safely free it via put_page)
> > 
> > We tracked this back to an nfsv4 setacl operation, in which the nfs code
> > attempted to fill convert the passed in buffer to an array of pages in
> > __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages.
> > __nfs4_proc_set_acl just converts each page in the buffer to a page struct via
> > virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab
> > bit is set.  We can't create a buffer with kmalloc and free it later in the tcp
> > ack path with put_page, so we need to either:
> > 
> > 1) ensure that when we create the list of pages, no page struct has PG_Slab set
> > or
> > 2) not use a page list to send this data
> > 
> > Given that these buffers can be multiple pages and arbitrarily sized, I think
> > (1) is the right way to go.  I've written the below patch to allocate a page
> > from the buddy allocator directly and copy the data over to it.  This ensures
> > that we have a put_page free-able page for every entry that winds up on an skb
> > frag list, so it can be safely freed when the frame is acked.  We do a put page
> > on each entry after the rpc_call_sync call so as to drop our own reference count
> > to the page, leaving only the ref count taken by tcp_sendpages.  This way the
> > data will be properly freed when the ack comes in
> > 
> > Successfully tested by myself to solve the above oops.
> > 
> > Note, as this is the result of a setacl operation that exceeded a page of data,
> > I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm
> > CCing security on this as well.
> > 
> > Signed-off-by: Neil Horman <nhorman@xxxxxxxxxxxxx>
> > CC: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> > CC: security@xxxxxxxxxx
> > CC: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/nfs/nfs4proc.c |   42 ++++++++++++++++++++++++++++++++++++++++--
> >  1 files changed, 40 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 78936a8..e4bd9b5 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -51,6 +51,7 @@
> >  #include <linux/sunrpc/bc_xprt.h>
> >  #include <linux/xattr.h>
> >  #include <linux/utsname.h>
> > +#include <linux/mm.h>
> >  
> >  #include "nfs4_fs.h"
> >  #include "delegation.h"
> > @@ -3252,6 +3253,36 @@ static void buf_to_pages(const void *buf, size_t buflen,
> >  	}
> >  }
> >  
> > +static int buf_to_pages_noslab(const void *buf, size_t buflen,
> > +		struct page **pages, unsigned int *pgbase)
> > +{
> > +	const void *p = buf;
> > +	struct page *page, *newpage, **spages;
> > +	int rc = -ENOMEM;
> > +
> > +	spages = pages;
> > +	*pgbase = offset_in_page(buf);
> > +	p -= *pgbase;
> > +	while (p < buf + buflen) {
> > +		page = virt_to_page(p);
> > +		newpage = alloc_page(GFP_KERNEL);
> > +		if (!newpage)
> > +			goto unwind;
> > +		memcpy(page_address(newpage), page_address(page),
> > +		       PAGE_CACHE_SIZE);
> 
> Why do we need to keep this byzantian offset_in_page() and
> virt_to_page() logic in order to copying data from a linear buffer into
> a set of pages?
> 
We don't I suppose, but I thought it best to follow the byzantine style of the
function immediately above it. :)  If you have a better suggestion, I'm
listening.
 

> > +		*(pages++) = newpage;
> > +		p += PAGE_CACHE_SIZE;
> > +	}
> > +
> > +	rc = 0;
> > +	return rc;
> > +
> > +unwind:
> > +	while (*spages != NULL)
> > +		__free_page(*(spages++));
> > +	return rc;
> > +}
> > +
> >  struct nfs4_cached_acl {
> >  	int cached;
> >  	size_t len;
> > @@ -3420,13 +3451,20 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
> >  		.rpc_argp	= &arg,
> >  		.rpc_resp	= &res,
> >  	};
> > -	int ret;
> > +	int ret, i;
> >  
> >  	if (!nfs4_server_supports_acls(server))
> >  		return -EOPNOTSUPP;
> > +	memset(pages, 0, sizeof(struct page *) * NFS4ACL_MAXPAGES);
> 
> Why is this initialisation needed? Just return the number of pages that
> were actually allocated in the call below.
> 
Yeah, I suppose I can do that, I'll respin here shortly.
Neil
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux