On Mon 28-09-20 16:41:03, Mauricio Faria de Oliveira wrote: > This implements journal callbacks j_submit|finish_inode_data_buffers() > with different behavior for data=journal: to write-protect pages under > commit, preventing changes to buffers writeably mapped to userspace. > > If a buffer's content changes between commit's checksum calculation > and write-out to disk, it can cause journal recovery/mount failures > upon a kernel crash or power loss. > > [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support! > [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log > [ 27.342716] JBD2: recovery failed > [ 27.343316] EXT4-fs (loop0): error loading journal > mount: /ext4: can't read superblock on /dev/loop0. > > In j_submit_inode_data_buffers() we write-protect the inode's pages > with write_cache_pages() and redirty w/ writepage callback if needed. > > In j_finish_inode_data_buffers() there is nothing do to. > > And in order to use the callbacks, inodes are added to the inode list > in transaction in __ext4_journalled_writepage() and ext4_page_mkwrite(). > > In ext4_page_mkwrite() we must make sure that the buffers are attached > to the transaction as jbddirty with write_end_fn(), as already done in > __ext4_journalled_writepage(). > > Signed-off-by: Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> > Reported-by: Dann Frazier <dann.frazier@xxxxxxxxxxxxx> > Reported-by: kernel test robot <lkp@xxxxxxxxx> # wbc.nr_to_write > Suggested-by: Jan Kara <jack@xxxxxxx> The patch looks good to me. Just one nit below. After fixing that feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> > + * However, we have to redirty a page in these cases: > + * 1) some buffer is dirty (needs checkpointing) > + * 2) some buffer is not part of the committing transaction > + * 3) some buffer already has b_next_transaction set > + */ Maybe I'd move this comment inside ext4_journalled_writepage_callback() just before the if () to make it clear what it speaks about. I'd also somewhat expand it like: /* * However, we have to redirty a page in these cases: * 1) If buffer is dirty, it means the page was dirty because it contains a * buffer that needs checkpointing. So dirty bit needs to be preserved so * that checkpointing writes the buffer properly. * 2) If buffer is not part of the committing transaction (we may have just * accidentally come across this buffer because inode range tracking is not * exact) or if the currently running transaction already contains this * buffer as well, dirty bit needs to be preserved so that the buffer gets * properly writeprotected on running transaction's commit. */ > + > +static int ext4_journalled_writepage_callback(struct page *page, > + struct writeback_control *wbc, > + void *data) > +{ > + transaction_t *transaction = (transaction_t *) data; > + struct buffer_head *bh, *head; > + struct journal_head *jh; > + > + bh = head = page_buffers(page); > + do { > + jh = bh2jh(bh); > + if (buffer_dirty(bh) || > + (jh && (jh->b_transaction != transaction || > + jh->b_next_transaction))) { Also we usually indent the condition like: if (buffer_dirty(bh) || (jh && (jh->b_transaction != transaction || jh->b_next_transaction))) { Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR