Hi, I'm currently reworking a device driver (unfortunately it's mostly under NDA and I can't share details about it) and am at the point of replacing the old jenga tower of "DIY async read/write via custom ioctls" with something that uses the kernel's AIO infrastructure. Essentially where I'm currently at is taking an iov_iter and produce a DMA scatterlist from it (and I definitely think that THAT should be something readily available, will probably contribute my implementation once it's done). There are a few extra quirks in what I need, but those are not important here. Anyway, in trying to understand what I have to do to this end I was looking at iov_iter_get_pages (and its _alloc variant), which raised more questions than answering. So here's the code as found in linux-4.8 (linux/lib/iov_iter.c +585) ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start) { struct page **p; if (maxsize > i->count) maxsize = i->count; if (!maxsize) return 0; iterate_all_kinds(i, maxsize, v, ({ unsigned long addr = (unsigned long)v.iov_base; size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1)); int n; int res; addr &= ~(PAGE_SIZE - 1); n = DIV_ROUND_UP(len, PAGE_SIZE); p = get_pages_array(n); if (!p) return -ENOMEM; res = get_user_pages_fast(addr, n, (i->type & WRITE) != WRITE, p); if (unlikely(res < 0)) { kvfree(p); return res; } *pages = p; return (res == n ? len : res * PAGE_SIZE) - *start; 0;}),({ /* can't be more than PAGE_SIZE */ *start = v.bv_offset; *pages = p = get_pages_array(1); if (!p) return -ENOMEM; get_page(*p = v.bv_page); return v.bv_len; }),({ return -EFAULT; }) ) return 0; } EXPORT_SYMBOL(iov_iter_get_pages_alloc); Here's the thing I'm dumbfounded about: This thing will never iterate past the first entry of an iov_iter (never mind the superfluous(?) `0;` at the end of the ITER_IOVEC step; I mean whatever that compound statement expression is the r-value of, that return preempts it). The execution path of every kind of step ends in an unconditional return statement, thereby (seemingly?) cutting the whole thing short. In case of the _alloc variant this is probably a good thing, as otherwise the memory allocated for the page array would be leaked in subsequent iterations. Which leaves me with the following doubts: What is the rationale for looking at only the very first element in an iov_iter? If this somehow does give the pages of everything an iov_iter covers, how does this work? If this gives the pages of only the very first element in an iov_iter is this the intended behaviour of the API and what's the recommended way of retrieving the pages of all the elements in an iov_iter? Kind Regards, Wolfgang -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html