Revert "aio: block exit_aio() until all context requests are completed"

Christian Borntraeger <borntraeger@xxxxxxxxxx> · Fri, 15 May 2015 09:41:47 +0200

I see a significant latency (it can be minutes with 2000 disks and HZ=100)
when exiting a QEMU process that has lots of disk devices via aio. The
process sits idle as a zombie in exit_aio(), waiting for the
completion.

It turns out that commit 6098b45b32 ("aio: block exit_aio() until all
context requests are completed") caused the delay.

The patch description was:

It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in current implementation, so fix
it in the same way as we did in io_destroy.

Now, io_destroy is required to block until everything is cleaned up,
according to the interface description in its man page:
DESCRIPTION
The io_destroy() system call will attempt to cancel all outstanding
asynchronous I/O operations against ctx_id, will block on the completion
of all operations that could not be canceled, and will destroy the ctx_id.

Does process exit require the same full blocking? We might be able to
clean up the process and let the aio data structures be freed lazily.
Opinions or better ideas?
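To make the trade-off concrete, here is a minimal userspace sketch in C of
the two teardown shapes. It is not fs/aio.c: struct completion is
re-implemented with pthreads, and model_ctx, model_kill(),
model_request_done(), model_exit_blocking() and model_exit_lazy() are
hypothetical names. It assumes that tearing down a context without a
waiter (as kill_ioctx(mm, ctx, NULL) does in the patch) still frees the
context once its last in-flight request completes, so only the caller
stops waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical model of an aio context with in-flight requests. */
struct model_ctx {
	pthread_mutex_t lock;
	int inflight;              /* requests not yet completed */
	bool dead;                 /* teardown has been requested */
	struct completion *waiter; /* NULL: nobody blocks on teardown */
};

static atomic_int contexts_released; /* number of contexts freed so far */

static struct model_ctx *model_ctx_new(int inflight)
{
	struct model_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		abort();
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->inflight = inflight;
	return ctx;
}

/* Free the context, then wake the waiter if there is one. Freeing first
 * is chosen only so this model is deterministic. */
static void model_ctx_release(struct model_ctx *ctx, struct completion *waiter)
{
	pthread_mutex_destroy(&ctx->lock);
	free(ctx);
	atomic_fetch_add(&contexts_released, 1);
	if (waiter)
		complete(waiter);
}

/* One request finished; the last one on a dead context frees it. */
static void model_request_done(struct model_ctx *ctx)
{
	struct completion *waiter;
	bool last;

	pthread_mutex_lock(&ctx->lock);
	assert(ctx->inflight > 0);
	last = --ctx->inflight == 0 && ctx->dead;
	waiter = ctx->waiter;
	pthread_mutex_unlock(&ctx->lock);
	if (last)
		model_ctx_release(ctx, waiter);
}

/* Mark the context dead; free it right away if nothing is in flight. */
static void model_kill(struct model_ctx *ctx, struct completion *waiter)
{
	bool idle;

	pthread_mutex_lock(&ctx->lock);
	ctx->dead = true;
	ctx->waiter = waiter;
	idle = ctx->inflight == 0;
	pthread_mutex_unlock(&ctx->lock);
	if (idle)
		model_ctx_release(ctx, waiter);
}

/* Shape of commit 6098b45b32: exit blocks until all I/O is done. */
static void model_exit_blocking(struct model_ctx *ctx)
{
	struct completion requests_done;

	init_completion(&requests_done);
	model_kill(ctx, &requests_done);
	wait_for_completion(&requests_done);
}

/* Shape of the revert: exit returns, last completion frees the context. */
static void model_exit_lazy(struct model_ctx *ctx)
{
	model_kill(ctx, NULL);
}
```

Build with -pthread. model_exit_blocking() returns only after the last
request on the context has completed, which is where the zombie sits when
thousands of disks have I/O outstanding. model_exit_lazy() returns
immediately, and whichever model_request_done() call drops the in-flight
count to zero on the dead context performs the release; the dead flag and
the count are checked under the same lock, so exactly one path frees it.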

Christian

diff --git a/fs/aio.c b/fs/aio.c
index a793f70..1e6bcdb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -820,8 +820,6 @@ void exit_aio(struct mm_struct *mm)

 	for (i = 0; i < table->nr; ++i) {
 		struct kioctx *ctx = table->table[i];
-		struct completion requests_done =
-			COMPLETION_INITIALIZER_ONSTACK(requests_done);

 		if (!ctx)
 			continue;
@@ -833,10 +831,7 @@ void exit_aio(struct mm_struct *mm)
 		 * that it needs to unmap the area, just set it to 0.
 		 */
 		ctx->mmap_size = 0;
-		kill_ioctx(mm, ctx, &requests_done);
-
-		/* Wait until all IO for the context are done. */
-		wait_for_completion(&requests_done);
+		kill_ioctx(mm, ctx, NULL);
 	}

 	RCU_INIT_POINTER(mm->ioctx_table, NULL);

