jbd2: fix a cause of __schedule_bug via blkdev_releasepage From: Toshiyuki Okajima <toshi.okajima@xxxxxxxxxxxxxx> A cause of this problem is calling jbd2_log_wait_commit() on jbd2_journal_try_to_free_buffers() with a read-lock via blkdev_releasepage(). This logic is for uncommitted data buffers. And a read/write-lock is required for a client usage of blkdev_releasepage. By the way, we want to release only metadata buffers on ext4_release_metadata(). Because a page which binds to blkdev is used as metadata for ext4. Therefore we don't need to wait for a commit on jbd2_journal_try_to_free_buffers() via ext4_release_matadata(). As a result, we add a jbd2_journal_try_to_free_metadata_buffers() almost same as jbd2_journal_try_to_free_buffers() except not calling jbd2_log_wait_commit(). This issue was reported by Aneesh Kumar K.V. http://marc.info/?l=linux-ext4&m=122814568309893&w=2 Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> Signed-off-by: Toshiyuki Okajima <toshi.okajima@xxxxxxxxxxxxxx> Cc: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> Cc: "Theodore Ts'o" <tytso@xxxxxxx> -- fs/jbd2/journal.c | 1 + fs/jbd2/transaction.c | 34 ++++++++++++++++++++++++++++++---- include/linux/jbd2.h | 1 + 3 files changed, 32 insertions(+), 4 deletions(-) diff -Nurp linux-2.6.28-rc6/fs/jbd2/journal.c linux-2.6.28-rc6.2/fs/jbd2/journal.c --- linux-2.6.28-rc6/fs/jbd2/journal.c 2008-11-21 08:19:22.000000000 +0900 +++ linux-2.6.28-rc6.2/fs/jbd2/journal.c 2008-12-02 09:54:42.000000000 +0900 @@ -79,6 +79,7 @@ EXPORT_SYMBOL(jbd2_journal_wipe); EXPORT_SYMBOL(jbd2_journal_blocks_per_page); EXPORT_SYMBOL(jbd2_journal_invalidatepage); EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers); +EXPORT_SYMBOL(jbd2_journal_try_to_free_metadata_buffers); EXPORT_SYMBOL(jbd2_journal_force_commit); EXPORT_SYMBOL(jbd2_journal_file_inode); EXPORT_SYMBOL(jbd2_journal_init_jbd_inode); diff -Nurp linux-2.6.28-rc6/fs/jbd2/transaction.c linux-2.6.28-rc6.2/fs/jbd2/transaction.c --- linux-2.6.28-rc6/fs/jbd2/transaction.c 2008-11-21 08:19:22.000000000 +0900 +++ linux-2.6.28-rc6.2/fs/jbd2/transaction.c 2008-12-02 10:21:53.000000000 +0900 @@ -1497,12 +1497,14 @@ static void jbd2_journal_wait_for_transa } /** - * int jbd2_journal_try_to_free_buffers() - try to free page buffers. + * int __journal_try_to_free_buffers() - try to free page buffers. * @journal: journal for operation * @page: to try and free * @gfp_mask: we use the mask to detect how hard should we try to release * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to * release the buffers. + * @is_metadata: If true, we won't wait for commit though __GFP_WAIT + * and __GFP_FS is set. * * * For all the buffers on this page, @@ -1528,14 +1530,14 @@ static void jbd2_journal_wait_for_transa * * Who else is affected by this? hmm... Really the only contender * is do_get_write_access() - it could be looking at the buffer while - * journal_try_to_free_buffer() is changing its state. But that + * __journal_try_to_free_buffer() is changing its state. But that * cannot happen because we never reallocate freed data as metadata * while the data is part of a transaction. Yes? * * Return 0 on failure, 1 on success */ -int jbd2_journal_try_to_free_buffers(journal_t *journal, - struct page *page, gfp_t gfp_mask) +static int __journal_try_to_free_buffers(journal_t *journal, + struct page *page, gfp_t gfp_mask, bool is_metadata) { struct buffer_head *head; struct buffer_head *bh; @@ -1567,6 +1569,8 @@ int jbd2_journal_try_to_free_buffers(jou } while ((bh = bh->b_this_page) != head); ret = try_to_free_buffers(page); + if (is_metadata) + return ret; /* * There are a number of places where jbd2_journal_try_to_free_buffers() @@ -1592,6 +1596,28 @@ busy: } /* + * jbd2_journal_try_to_free_buffers: + * This is a wrapper function for __journal_try_to_free_buffers to try to + * release data. + */ +int jbd2_journal_try_to_free_buffers(journal_t *journal, + struct page *page, gfp_t gfp_mask) +{ + return __journal_try_to_free_buffers(journal, page, gfp_mask, false); +} + +/* + * jbd2_journal_try_to_free_metadata_buffers: + * This is a wrapper function for __journal_try_to_free_buffers to try to + * release metadata. + */ +int jbd2_journal_try_to_free_metadata_buffers(journal_t *journal, + struct page *page, gfp_t gfp_mask) +{ + return __journal_try_to_free_buffers(journal, page, gfp_mask, true); +} + +/* * This buffer is no longer needed. If it is on an older transaction's * checkpoint list we need to record it on this transaction's forget list * to pin this buffer (and hence its checkpointing transaction) down until diff -Nurp linux-2.6.28-rc6/include/linux/jbd2.h linux-2.6.28-rc6.2/include/linux/jbd2.h --- linux-2.6.28-rc6/include/linux/jbd2.h 2008-11-21 08:19:22.000000000 +0900 +++ linux-2.6.28-rc6.2/include/linux/jbd2.h 2008-12-02 09:59:18.000000000 +0900 @@ -1052,6 +1052,7 @@ extern void journal_sync_buffer (struct extern void jbd2_journal_invalidatepage(journal_t *, struct page *, unsigned long); extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t); +extern int jbd2_journal_try_to_free_metadata_buffers(journal_t *, struct page *, gfp_t); extern int jbd2_journal_stop(handle_t *); extern int jbd2_journal_flush (journal_t *); extern void jbd2_journal_lock_updates (journal_t *); -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html