I see significant latency (it can be minutes with 2000 disks and HZ=100) when exiting a QEMU process that drives lots of disk devices via aio. The process sits idle as a zombie in exit_aio(), waiting for the completions (rough numbers below the patch). It turns out that commit 6098b45b32 ("aio: block exit_aio() until all context requests are completed") introduced the delay. The patch description was:

    It seems that exit_aio() also needs to wait for all iocbs to complete
    (like io_destroy), but we missed the wait step in current implemention,
    so fix it in the same way as we did in io_destroy.

Now, io_destroy() is required to block until everything is cleaned up, according to the interface description in its man page:

DESCRIPTION
       The io_destroy() system call will attempt to cancel all outstanding
       asynchronous I/O operations against ctx_id, will block on the
       completion of all operations that could not be canceled, and will
       destroy the ctx_id.

Does process exit require the same full blocking? We might be able to clean up the process and let the aio data structures be freed lazily. Opinions or better ideas? A sketch of one alternative follows the patch.

Christian

diff --git a/fs/aio.c b/fs/aio.c
index a793f70..1e6bcdb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -820,8 +820,6 @@ void exit_aio(struct mm_struct *mm)
 
 	for (i = 0; i < table->nr; ++i) {
 		struct kioctx *ctx = table->table[i];
-		struct completion requests_done =
-			COMPLETION_INITIALIZER_ONSTACK(requests_done);
 
 		if (!ctx)
 			continue;
@@ -833,10 +831,7 @@ void exit_aio(struct mm_struct *mm)
 		 * that it needs to unmap the area, just set it to 0.
 		 */
 		ctx->mmap_size = 0;
-		kill_ioctx(mm, ctx, &requests_done);
-
-		/* Wait until all IO for the context are done. */
-		wait_for_completion(&requests_done);
+		kill_ioctx(mm, ctx, NULL);
 	}
 
 	RCU_INIT_POINTER(mm->ioctx_table, NULL);
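
Rough numbers behind the "minutes" claim (an estimate on my part, assuming each serial kill_ioctx() + wait_for_completion() round trip costs at least one RCU grace period before the context's requests drain, i.e. on the order of 100 ms at HZ=100):

    2000 contexts * ~100 ms/context = ~200 s

so minutes of zombie time follow from the serialization alone, not from any single slow context.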
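And a sketch of one possible middle ground that keeps the blocking semantics but removes the serialization: start teardown of every context first, then block once at the end, so the per-context grace periods overlap instead of accumulating. This is only a sketch; the ctx_rq_wait structure is made up here, and kill_ioctx()/free_ioctx_reqs() would need matching changes to decrement the shared counter and fire the completion when it reaches zero.

struct ctx_rq_wait {
	struct completion comp;
	atomic_t count;
};

void exit_aio(struct mm_struct *mm)
{
	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
	struct ctx_rq_wait wait;
	int i, skipped;

	if (!table)
		return;

	/* One shared waiter for all contexts instead of one per context. */
	atomic_set(&wait.count, table->nr);
	init_completion(&wait.comp);

	skipped = 0;
	for (i = 0; i < table->nr; ++i) {
		struct kioctx *ctx = table->table[i];

		if (!ctx) {
			skipped++;
			continue;
		}

		/*
		 * As today: zero mmap_size so kill_ioctx() does not try to
		 * unmap an area in what is not necessarily our ->mm.
		 */
		ctx->mmap_size = 0;
		kill_ioctx(mm, ctx, &wait);	/* starts teardown, must not block */
	}

	/* Drop the slots that held no context, then block once for the rest. */
	if (!atomic_sub_and_test(skipped, &wait.count))
		wait_for_completion(&wait.comp);

	RCU_INIT_POINTER(mm->ioctx_table, NULL);
	kfree(table);
}

That would keep the io_destroy()-like guarantee on exit (nothing gets freed under in-flight requests) while paying for roughly one grace period in total instead of one per context.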