On Wed, 2025-01-29 at 13:41 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:
>
> > > Do you want me to push a branch with my tracepoints that I'm using somewhere
> > > that you can grab it?
> >
> > Sounds good! Maybe it can help me. :)
>
> Take a look at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/
>
> The "ceph-folio" branch has Willy's folio conversion patches plus a tracing
> patch plus a patch that's an unsuccessful attempt by me to fix the hang I was
> seeing.
>
> The tracepoint I'm using (netfs_folio) takes a folio pointer, so it was easier
> to do it on top of Willy's patches.
>
> The "netfs-crypto" branch are my patches to implement content crypto in
> netfslib. I've tested them to some extent with AFS, but the test code I have
> in AFS only supports crypto of files where the file is an exact multiple of
> page size as AFS doesn't support any sort of xattr and so I can't store the
> real EOF pointer so simply.
>
> The "ceph-iter" branch are my patches on top of a merge of those two
> (excluding the debugging patches) to try and convert ceph to fully using
> netfslib and to pass an iterator all the way down to the socket, aiming to
> reduce the number of data types to basically two.
>

Great! Thanks a lot.

I believe I have found all of the current issues in ceph_writepages_start().
So, I need to clean up the current messy state of the fix and of the method
itself. Let me do this cleanup, test the fix (I may still have some issues
with it), and then share the patch.

As far as I can see, there are several issues in ceph_writepages_start():
(1) There is a double-lock issue (the reason for the hang);
(2) folio_wait_writeback() is called in the wrong place;
(3) ceph_inc_osd_stopping_blocker() does not appear to guarantee that we wait
for the flush of all dirty memory pages to finish. It looks racy now, as far
as I can see, but I need to check this more carefully by testing.
(4) The folio_batch of dirty pages found by filemap_get_folios_tag() is not
processed properly. This is why some dirty pages are simply never processed
and we still have dirty pages after unmount.
(5) The whole ceph_writepages_start() method is huge and messy for my taste,
and this is the reason for all of these issues (it's hard to follow the logic
of the method through this unreasonable complexity).

Thanks,
Slava.