On Wed, Feb 14, 2024 at 08:01:10PM +0900, Ryusuke Konishi wrote:
> commit 38296afe3c6ee07319e01bb249aa4bb47c07b534 upstream.
>
> Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
> and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
>
> While migrate_pages_batch() locks a folio and waits for the writeback to
> complete, the log writer thread that should bring the writeback to
> completion picks up the folio being written back in
> nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
> creation and was trying to lock the folio. Thus causing a deadlock.
>
> In the first place, it is unexpected that folios/pages in the middle of
> writeback will be updated and become dirty. Nilfs2 adds a checksum to
> verify the validity of the log being written and uses it for recovery at
> mount, so data changes during writeback are suppressed. Since this is
> broken, an unclean shutdown could potentially cause recovery to fail.
>
> Investigation revealed that the root cause is that the wait for writeback
> completion in nilfs_page_mkwrite() is conditional, and if the backing
> device does not require stable writes, data may be modified without
> waiting.
>
> Fix these issues by making nilfs_page_mkwrite() wait for writeback to
> finish regardless of the stable write requirement of the backing device.
>
> Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@xxxxxxxxx
> Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it")
> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>
> Reported-by: syzbot+ee2ae68da3b22d04cd8d@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@xxxxxxxxxx
> Tested-by: Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> Please apply this patch to the stable trees indicated by the subject line
> prefix.
>
> These versions do not yet have page-to-folio conversion applied to the
> target function, so page-based "wait_on_page_writeback()" is used instead
> of "folio_wait_writeback()" in this patch. This did not apply as-is to
> v6.5 and earlier versions due to an fs-wide change. So I would like to
> post a separate patch for earlier stable trees.

All now queued up, thanks!

greg k-h