Doubts regarding iov_iter_get_pages{,_alloc}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I'm currently reworking a device driver (unfortunately it's mostly
under NDA and I can't share details about it) and am at the point of
replacing the old jenga tower of "DIY async read/write via custom
ioctls" with something that uses the kernel's AIO infrastructure.

Essentially where I'm currently at is taking an iov_iter and produce a
DMA scatterlist from it (and I definitely think that THAT should be
something readily available, will probably contribute my implementation
once it's done). There are a few extra quirks in what I need, but those
are not important here.

Anyway, in trying to understand what I have to do to this end I was
looking at iov_iter_get_pages (and its _alloc variant), which raised
more questions than answering. So here's the code as found in linux-4.8
(linux/lib/iov_iter.c +585)

ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
		   struct page ***pages, size_t maxsize,
		   size_t *start)
{
	struct page **p;

	if (maxsize > i->count)
		maxsize = i->count;

	if (!maxsize)
		return 0;

	iterate_all_kinds(i, maxsize, v, ({
		unsigned long addr = (unsigned long)v.iov_base;
		size_t len = v.iov_len + (*start = addr & (PAGE_SIZE -	1));
		int n;
		int res;

		addr &= ~(PAGE_SIZE - 1);
		n = DIV_ROUND_UP(len, PAGE_SIZE);
		p = get_pages_array(n);
		if (!p)
			return -ENOMEM;
		res = get_user_pages_fast(addr, n, (i->type & WRITE) != WRITE, p);
		if (unlikely(res < 0)) {
			kvfree(p);
			return res;
		}
		*pages = p;
		return (res == n ? len : res * PAGE_SIZE) - *start;
	0;}),({
		/* can't be more than PAGE_SIZE */
		*start = v.bv_offset;
		*pages = p = get_pages_array(1);
		if (!p)
			return -ENOMEM;
		get_page(*p = v.bv_page);
		return v.bv_len;
	}),({
		return -EFAULT;
	})
	)
	return 0;
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc);

Here's the thing I'm dumbfounded about: This thing will never iterate
past the first entry of an iov_iter (never mind the superfluous(?) `0;`
at the end of the ITER_IOVEC step; I mean whatever that compound statement
expression is the r-value of, that return preempts it).
The execution path of every kind of step ends in an unconditional return
statement, thereby (seemingly?) cutting the whole thing short.

In case of the _alloc variant this is probably a good thing, as
otherwise the memory allocated for the page array would be leaked in
subsequent iterations.

Which leaves me with the following doubts:

What is the rationale for looking at only the very first element in an
iov_iter?

If this somehow does give the pages of everything an iov_iter covers,
how does this work?

If this gives the pages of only the very first element in an iov_iter
is this the intended behaviour of the API and what's the recommended
way of retrieving the pages of all the elements in an iov_iter?


Kind Regards,

Wolfgang
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux