On Tue, 2025-01-28 at 20:01 +0000, David Howells wrote:
> I added some tracing to fs/ceph/addr.c and this highlights the bug causing the
> hang that I'm seeing.
>
> So what I see is ceph_writepages_start() being entered and getting a
> collection of folios from filemap_get_folios_tag():
>
> 	netfs_ceph_writepages: i=10000004f52 ix=0
> 	netfs_ceph_wp_get_folios: i=10000004f52 oix=0 ix=8000000000000 nr=6
>
> Then we get out the first dirty folio from the batch and attempt to lock it:
>
> 	netfs_folio: i=10000004f52 ix=00003-00003 ceph-wb-lock
>
> which succeeds.  We then pass through a number of lines:
>
> 	netfs_ceph_wp_track: i=10000004f52 line=1218
>
> which is the "/* shift unused page to beginning of fbatch */" comment, then:
>
> 	netfs_ceph_wp_track: i=10000004f52 line=1238
>
> which is followed by "offset = ceph_fscrypt_page_offset(pages[0]);", then:
>
> 	netfs_ceph_wp_track: i=10000004f52 line=1264
>
> which is the error handling path of:
>
> 	if (!ceph_inc_osd_stopping_blocker(fsc->mdsc)) {
> 		rc = -EIO;
> 		goto release_folios;
> 	}
>
> and then:
>
> 	netfs_ceph_wp_track: i=10000004f52 line=1389
>
> which is "release_folios:".
>
> We then reenter ceph_writepages_start(), get the same batch of dirty folios
> and try to lock them again:
>
> 	netfs_ceph_writepages: i=10000004f52 ix=0
> 	netfs_ceph_wp_get_folios: i=10000004f52 oix=0 ix=8000000000000 nr=6
> 	netfs_folio: i=10000004f52 ix=00003-00003 ceph-wb-lock
>
> and that's where we hang.
>
> I think the problem is that the error handling here:
>
> 	if (!ceph_inc_osd_stopping_blocker(fsc->mdsc)) {
> 		rc = -EIO;
> 		goto release_folios;
> 	}
>
> is insufficient.  The folios are locked and can't just be released.
>
> Why ceph_inc_osd_stopping_blocker() fails is also something that needs looking
> at.
>

Yeah, I am trying to solve this issue now. :) I am reproducing the issue for
generic/421. It's only the first issue.
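
To make the failure mode in the trace concrete, here is a small userspace
model of it. This is only a sketch and not the fs/ceph/addr.c code: toy_folio,
toy_trylock(), toy_writepages() and the TOY_* error values are all
hypothetical names made up for illustration, and a trylock stands in for the
blocking folio lock so that the second pass reports the problem instead of
hanging. The unlock_on_error flag shows one possible way the error path could
behave; it is an assumption, not the actual fix.

```c
#include <stdbool.h>

/* Hypothetical error values for this model only (not kernel errno usage). */
enum { TOY_EIO = 5, TOY_EAGAIN = 11 };

/* Toy stand-in for a folio: just a lock bit and a reference count. */
struct toy_folio {
	bool locked;
	int refs;
};

/* Non-blocking lock attempt; the real writeback path would sleep here. */
static bool toy_trylock(struct toy_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

static void toy_unlock(struct toy_folio *f)
{
	f->locked = false;
}

/*
 * One writeback pass over a single folio.
 * blocker_ok models the result of the "stopping blocker" check.
 * unlock_on_error selects whether the error path drops the lock
 * before dropping the reference.
 */
static int toy_writepages(struct toy_folio *f, bool blocker_ok,
			  bool unlock_on_error)
{
	f->refs++;			/* reference taken with the batch */

	if (!toy_trylock(f)) {
		f->refs--;
		return -TOY_EAGAIN;	/* the real code would hang here */
	}

	if (!blocker_ok) {
		if (unlock_on_error)
			toy_unlock(f);
		f->refs--;		/* "release_folios": drop the ref only */
		return -TOY_EIO;
	}

	/* Success: in this model, completion unlocks immediately. */
	toy_unlock(f);
	f->refs--;
	return 0;
}
```

Calling toy_writepages(&f, false, false) and then toy_writepages(&f, true,
false) on the same folio mirrors the trace: the first call fails and leaves
the folio locked, and the second call cannot take the lock. With
unlock_on_error set to true, the second call goes through.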
Also, this code [1] doesn't work because the page is already locked and it
will be unlocked only in writepages_finish():

	if (folio_test_writeback(folio) ||
	    folio_test_private_2(folio) /* [DEPRECATED] */) {
		if (wbc->sync_mode == WB_SYNC_NONE) {
			doutc(cl, "%p under writeback\n", folio);
			folio_unlock(folio);
			continue;
		}
		doutc(cl, "waiting on writeback %p\n", folio);
		folio_wait_writeback(folio);
		folio_wait_private_2(folio); /* [DEPRECATED] */
	}

It looks like we need to check it before the lock here [2]. And even after
solving these two issues, I can still see dirty memory pages after unmount
finishes. Something is still wrong in the ceph_writepages_start() logic. So, I
am trying to figure out what I am still missing here.

[1] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L1101
[2] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L1059