Since we do not have additional protection at the read events side, it is
possible that the read of the page takes place after the page has been freed
and allocated to another part of the kernel. This would result in the read
returning invalid information.

For example, we have the following problem:

            thread 1                    |              thread 2
                                        |
  aio_migratepage()                     |
   |-> take ctx->completion_lock        |
   |-> migrate_page_copy(new, old)      |
   |   *NOW*, ctx->ring_pages[idx] == old
                                        |
                                        |  *NOW*, ctx->ring_pages[idx] == old
                                        |
                                        |  aio_read_events_ring()
                                        |   |-> ring = kmap_atomic(ctx->ring_pages[0])
                                        |   |-> ring->head = head;
                                        |       *HERE*, write to the old ring page
                                        |   |-> kunmap_atomic(ring);
                                        |
   |-> ctx->ring_pages[idx] = new       |
   |   *BUT NOW*, the content of        |
   |    ring_pages[idx] is old.         |
   |-> release ctx->completion_lock     |

As above, the new ring page will not be updated.

Fix this issue, as well as races in aio_setup_ring(), by taking the ring_lock
mutex and the completion_lock during page migration and where otherwise
applicable.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
Signed-off-by: Tang Chen <tangchen@xxxxxxxxxxxxxx>
Signed-off-by: Gu Zheng <guz.fnst@xxxxxxxxxxxxxx>
---
v2: Merged Tang Chen's patch to use the spin_lock to protect the ring buffer
    update. Use ring_lock rather than an additional spin_lock, as Benjamin
    LaHaise suggested.
---
 fs/aio.c | 23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6453c12..ee74704 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -298,6 +298,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	/* Extra ref cnt for rind_pages[] array */
 	get_page(new);
 
+	/* Ensure no aio read events is going when migrating page */
+	mutex_lock(&ctx->ring_lock);
+
 	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		put_page(new);
@@ -312,6 +315,8 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 
 	put_page(old);
 
+	mutex_unlock(&ctx->ring_lock);
+
 	return rc;
 }
 #endif
@@ -523,9 +528,18 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
 		rcu_read_unlock();
 		spin_unlock(&mm->ioctx_lock);
 
+		/*
+		 * Accessing ring pages must be done
+		 * holding ctx->completion_lock to
+		 * prevent aio ring page migration
+		 * procedure from migrating ring pages.
+		 */
+		spin_lock_irq(&ctx->completion_lock);
 		ring = kmap_atomic(ctx->ring_pages[0]);
 		ring->id = ctx->id;
 		kunmap_atomic(ring);
+		spin_unlock_irq(&ctx->completion_lock);
+
 		return 0;
 	}
 
@@ -624,7 +638,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	if (!ctx->cpu)
 		goto err;
 
-	if (aio_setup_ring(ctx) < 0)
+	/*
+	 * Prevent races with page migration in aio_setup_ring() by holding
+	 * the ring_lock mutex.
+	 */
+	mutex_lock(&ctx->ring_lock);
+	err = aio_setup_ring(ctx);
+	mutex_unlock(&ctx->ring_lock);
+	if (err < 0)
 		goto err;
 
 	atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
-- 
1.7.7