On Friday 18 April 2008 21:31:20 Andrew Morton wrote:
> On Fri, 18 Apr 2008 14:43:24 +1000 Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
> > +		/* How many pages will this take? */
> > +		npages = 1 + (base + len - 1)/PAGE_SIZE - base/PAGE_SIZE;
>
> Brain hurts.  I hope you got that right.

I tested it when I wrote it, but just wrote a tester again:

 base		len	npages
 0		1	1
 0xfff		1	1
 0x1000		1	1
 0		4096	1
 0x1		4096	2
 0xfff		4096	2
 0x1000		4096	1
 0xfffff000	4096	1
 0xfffff000	4097	4293918722

> > +		if (unlikely(num_pg + npages > MAX_SKB_FRAGS)) {
> > +			err = -ENOSPC;
> > +			goto fail;
> > +		}
> > +		n = get_user_pages(current, current->mm, base, npages,
> > +				   0, 0, pages, NULL);
>
> What is the maximum number of pages which an unprivileged user can
> concurrently pin with this code?

Since only root can open the tun device, it's currently OK.  The old code
kmalloced and copied: is there some mm-fu reason why pinning userspace memory
is worse?

But I actually think it's OK even for non-root, since these become skbs, which
means they either go into an outgoing device queue or a socket queue, which is
accounted for exactly this reason.

> > +		if (unlikely(n < 0)) {
> > +			err = n;
> > +			goto fail;
> > +		}
> > +
> > +		/* Transfer pages to the frag array */
> > +		for (j = 0; j < n; j++) {
> > +			f[num_pg].page = pages[j];
> > +			if (j == 0) {
> > +				f[num_pg].page_offset = offset_in_page(base);
> > +				f[num_pg].size = min(len, PAGE_SIZE -
> > +						     f[num_pg].page_offset);
> > +			} else {
> > +				f[num_pg].page_offset = 0;
> > +				f[num_pg].size = min(len, PAGE_SIZE);
> > +			}
> > +			len -= f[num_pg].size;
> > +			base += f[num_pg].size;
> > +			num_pg++;
> > +		}
>
> This loop is a fancy way of doing
>
> 	num_pg = n;

Damn, you had me reworking this until I realized why.  It's not: we're inside
a loop, doing one iovec array element at a time.

> > +		if (unlikely(n != npages)) {
> > +			err = -EFAULT;
> > +			goto fail;
> > +		}
>
> why not do this immediately after running get_user_pages()?
To simplify the failure path.  Hmm, I would use release_pages here...

> > +fail:
> > +	for (i = 0; i < num_pg; i++)
> > +		put_page(f[i].page);
>
> release_pages() could be a tad more efficient, but it's only error-path.

... but I didn't know that existed.  Had to include pagemap.h, and it's not
exported.  It seems to be a useful interface; see patch.

Cheers,
Rusty.

Subject: Export release_pages; nice undo for get_user_pages.

Andrew Morton suggests tun/tap use release_pages, but it's not
exported.  It's not clear to me why this is in swap.c, but it exists
even without CONFIG_SWAP, so that's OK.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

diff -r abd2ad431e5c mm/swap.c
--- a/mm/swap.c	Sat Apr 19 00:34:54 2008 +1000
+++ b/mm/swap.c	Sat Apr 19 01:11:40 2008 +1000
@@ -346,6 +346,7 @@ void release_pages(struct page **pages,
 
 	pagevec_free(&pages_to_free);
 }
+EXPORT_SYMBOL(release_pages);
 
 /*
  * The pages which we're about to release may be in the deferred lru-addition
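
For anyone who wants to re-check the npages arithmetic discussed above, here is
a minimal userspace sketch of such a tester. It is not the tester mentioned in
the mail (that one was not posted). It assumes 4096-byte pages and 32-bit
unsigned arithmetic, which is what the wrapped value in the last row of the
table suggests. The names SKETCH_PAGE_SIZE, npages_for and print_npages_table
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, consistent with the table in the mail. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Same expression as in the patch, evaluated in 32-bit unsigned
 * arithmetic (an assumption; it reproduces the last table row).
 */
static uint32_t npages_for(uint32_t base, uint32_t len)
{
	uint32_t last = base + len - 1;	/* wraps modulo 2^32 */

	return (uint32_t)(1u + last / SKETCH_PAGE_SIZE
			     - base / SKETCH_PAGE_SIZE);
}

/* Print one row per (base, len) pair used in the mail. */
static void print_npages_table(void)
{
	static const struct { uint32_t base, len; } cases[] = {
		{ 0, 1 }, { 0xfff, 1 }, { 0x1000, 1 },
		{ 0, 4096 }, { 0x1, 4096 }, { 0xfff, 4096 },
		{ 0x1000, 4096 }, { 0xfffff000, 4096 },
		{ 0xfffff000, 4097 },
	};
	size_t i;

	printf("base\t\tlen\tnpages\n");
	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%#lx\t\t%lu\t%lu\n",
		       (unsigned long)cases[i].base,
		       (unsigned long)cases[i].len,
		       (unsigned long)npages_for(cases[i].base, cases[i].len));
}
```

Calling print_npages_table() from a main() prints one row per case. In the
final case base + len runs past the top of the 32-bit address space, so the
sum wraps and the subtraction underflows, which is where the very large
npages value comes from.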