Sorry for the delay, I'm finally back from the holiday break :)
(1) The filesystem AIO patchset, attempts to address one part of
the problem
which is to make regular file IO, (without O_DIRECT)
asynchronous (mainly
the case of reads of uncached or partially cached files, and
O_SYNC writes).
One of the properties of the currently implemented EIOCBRETRY aio
path is that ->mm is the only field in current which matches the
submitting task_struct while inside the retry path.
It looks like a retry-based aio write path would be broken because of
this. generic_write_checks() could run in the aio thread and get its
task_struct instead of that of the submitter. The wrong rlimit will
be tested and SIGXFSZ won't be raised. remove_suid() could check the
capabilities of the aio thread instead of those of the submitter.
I don't think EIOCBRETRY is the way to go because of this increased
(and subtle!) complexity. What are the chances that we would have
ever found those bugs outside code review? How do we make sure that
current references don't sneak back in after having initially audited
the paths?
Take the io_cmd_epoll_wait patch..
issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach
Brown with
modifications from Jeff Moyer and me) addresses this problem
for native
linux aio in a simple manner.
It's simple looking, sure. This current flipping didn't even occur
to me while throwing the patch together!
But that patch ends up calling ->poll (and poll_table->qproc) and
writing to userspace (so potentially calling ->nopage) from the aio
threads. Are we sure that none of them will behave surprisingly
because current changed under them?
It might be safe now, but that isn't really the point. I'd rather we
didn't have yet one more subtle invariant to audit and maintain.
At the risk of making myself vulnerable to the charge of mentioning
vapourware, I will admit that I've been working on a (slightly mad)
implementation of async syscalls. I've been quiet about it because I
don't want to whip up complicated discussion without being able to
show code that works, even if barely. I mention it now only to make
it clear that I want to be constructive, not just critical :).
- z
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html