Hi, Ethan.

Do you have a scenario where this can be recreated? I will take a look at
this issue. Here is the tracker issue to follow this.

https://tracker.ceph.com/issues/69196

On Tue, Oct 29, 2024 at 11:14 AM tzuchieh wu <ethan198912@xxxxxxxxx> wrote:
>
> Hi,
> Recently I was running into a hung task on kernel client.
> After investigating the environment, I think there might be a deadlock
> caused by calling iput inside OSD dispatch(ceph-msgr) thread
>
> My kernel ceph client version is 5.15
> The call stack for the kernel ceph-msgr thread is:
>
> [<0>] wait_on_page_bit_common+0x106/0x300
> [<0>] truncate_inode_pages_range+0x381/0x6a0
> [<0>] ceph_evict_inode+0x4a/0x200 [ceph]
> [<0>] evict+0xc6/0x190
> [<0>] ceph_put_wrbuffer_cap_refs+0xdf/0x1d0 [ceph]
> [<0>] writepages_finish+0x2c4/0x440 [ceph]
> [<0>] handle_reply+0x5be/0x6d0 [libceph]
> [<0>] dispatch+0x49/0xa60 [libceph]
> [<0>] ceph_con_workfn+0x10fa/0x24b0 [libceph]
> [<0>] worker_run_work+0xb8/0xd0
> [<0>] process_one_work+0x1d3/0x3c0
> [<0>] worker_thread+0x4d/0x3e0
> [<0>] kthread+0x12d/0x150
> [<0>] ret_from_fork+0x1f/0x30
>
> And the following messages are from osdc in ceph debugfs
>
> 2774296 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x40002c 92 write
> ...
> 2774578 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774579 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774580 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774581 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774582 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774583 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> 2774584 osd30 4.1ea6f009 4.9 [30,14]/30 [30,14]/30 e18281
> 200006dabea.00000003 0x400014 1 read
> ...
>
> We can see that kernel client has sent both write and multiple read
> requests on object 200006dabea.00000003.
> The iput_final waits for truncate_inode_pages_range to finish which in
> turn waits for page bit.
> From the above osdc and taking a look at the code, I think
> truncate_inode_pages_range might be waiting for readahead request to
> finish.
> The client cannot handle the readahead request from osd since the
> ceph-msgr itself is blocking on handle_reply, however.
>
> The following patch solved the problem by calling iput in a different thread
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ceph/inode.c?id=3e1d0452edceebb903d23db53201013c940bf000
> but was reverted later because session mutex is no longer held when
> calling iput.
>
> In the comment of the above patch, it also points out:
>
> truncate_inode_pages_range() waits for readahead pages and
> In general, it's not good to call iput_final() inside MDS/OSD dispatch
> threads or while holding any mutex.
>
> Therefore, it looks like calling iput inside OSD dispatch thread is not safe.
> Any suggestion on this issue?
>
> thanks,
> ethan
>
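
For readers following the thread, the deferral idea mentioned in the quoted
patch comment can be modeled in plain userspace C. This is only a sketch under
stated assumptions: every name below (model_inode, model_put_inline,
model_put_deferred, model_worker_drain, model_dispatch_read_reply) is
hypothetical, and none of it is the kernel's or the referenced patch's actual
code. The model is single-threaded, so "would block" is represented by a
false return value and a later retry, whereas a real worker thread would
simply sleep until the reads complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT kernel structures or APIs. */
struct model_inode {
    int refcount;                      /* references held on the inode */
    int pending_reads;                 /* read replies dispatch has not processed yet */
    bool evicted;                      /* set once eviction has completed */
    struct model_inode *next_deferred; /* link in the deferred-put list */
};

/* Inodes whose final put was handed off instead of done in dispatch. */
static struct model_inode *deferred_head;

/*
 * Inline put, modeling the path in the reported stack trace.
 * Returns false when dropping the last reference would have to wait for
 * in-flight reads. If the caller is the dispatch context, nobody else can
 * process those read replies, so the wait would never end.
 */
static bool model_put_inline(struct model_inode *in)
{
    if (in->refcount > 1) {
        in->refcount--;
        return true;
    }
    if (in->pending_reads > 0)
        return false;                  /* the kernel thread would sleep here */
    in->refcount = 0;
    in->evicted = true;
    return true;
}

/* Deferred put: never evicts in the caller's context. */
static void model_put_deferred(struct model_inode *in)
{
    if (in->refcount > 1) {
        in->refcount--;
        return;
    }
    /* Last reference: keep it and let the worker drop it later. */
    in->next_deferred = deferred_head;
    deferred_head = in;
}

/* Dispatch context processing one read reply for this inode. */
static void model_dispatch_read_reply(struct model_inode *in)
{
    if (in->pending_reads > 0)
        in->pending_reads--;
}

/*
 * Worker context: try the final put for every queued inode.
 * Inodes that would still block are re-queued (a real worker would sleep).
 * Returns the number of inodes evicted in this pass.
 */
static int model_worker_drain(void)
{
    struct model_inode *list = deferred_head;
    int evicted = 0;

    deferred_head = NULL;
    while (list) {
        struct model_inode *in = list;

        list = in->next_deferred;
        in->next_deferred = NULL;
        if (model_put_inline(in)) {
            evicted++;
        } else {
            in->next_deferred = deferred_head;
            deferred_head = in;
        }
    }
    return evicted;
}
```

In this model, an inline final put from the dispatch context with reads still
in flight reports that it would block, while the deferred variant returns at
once and lets dispatch go on to process the read replies; the worker pass then
completes the eviction. Whether the same shape is appropriate for the current
kernel code, given the revert mentioned above, is a question for the
maintainers.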