Re: [PATCH] ceph: do not truncate pagecache if truncate size doesn't change

On 11/23/21 9:00 AM, Xiubo Li wrote:

On 11/23/21 3:10 AM, Jeff Layton wrote:
[...]
One thing I'm finding today is that this patch reliably makes
generic/445 hang at umount time with -o test_dummy_encryption
enabled...which is a bit strange as the test doesn't actually run:

     [jlayton@client1 xfstests-dev]$ sudo ./tests/generic/445
     QA output created by 445
     445 not run: xfs_io falloc  failed (old kernel/wrong fs?)
     [jlayton@client1 xfstests-dev]$ sudo umount /mnt/test

...and the umount hangs waiting for writeback to complete. When I back
this patch out, the problem goes away. Are you able to reproduce this?

There are no mds or osd calls in flight, and no caps (according to
debugfs). This is using -o test_dummy_encryption to force encryption.

I have hit the same issue without "test_dummy_encryption": the test got stuck, but I didn't see any calls to ceph. It wasn't generic/445, though; I can't remember which test it was. I thought something was wrong with my OS, so I just rebooted my VM.

# ps -aux | grep generic

root      564385  0.0  0.0  11804  4700 pts/1    S+   09:41 0:00 /bin/bash ./tests/generic/318

# cat /proc/564385/stack

[<0>] do_wait+0x2cc/0x4e0
[<0>] kernel_wait4+0xec/0x1b0
[<0>] __do_sys_wait4+0xe0/0xf0
[<0>] do_syscall_64+0x37/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

I hit this again today. I found that the MDS daemon had crashed, and the standby MDSes crashed too while replaying the journal.

I think this is why the tests got stuck. I will look into it.

-- Xiubo


I ran the ceph.exlude tests for two days and saw this only once.

I have attached the test results. Are they the same as yours? Many test cases did not run.

There are 4 failures. generic/020 is reproducible about 30% of the time; the other 3 fail every time, but none of them seem related to fscrypt.


I narrowed it down to the call to _require_seek_data_hole. That calls
the seek_sanity_test binary and after that point, umounting the fs
hangs. I've not yet been successful at reproducing this while running
the binary by hand, so there may be some other preliminary ops that are
a factor too.
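
For readers unfamiliar with that check: it probes whether lseek() with SEEK_DATA and SEEK_HOLE works on the test filesystem. The following is only a rough Python sketch of such a probe, not the actual seek_sanity_test source; the function name probe_seek_data_hole is made up for illustration.

```python
import os


def probe_seek_data_hole(path):
    """Hypothetical helper: return (data_offset, hole_offset) as seen from
    offset 0, or None if SEEK_DATA/SEEK_HOLE is unavailable or fails."""
    # Not every platform exposes these constants.
    if not hasattr(os, "SEEK_DATA") or not hasattr(os, "SEEK_HOLE"):
        return None

    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # First offset >= 0 that contains data.
            data = os.lseek(fd, 0, os.SEEK_DATA)
            # First offset >= 0 that is a hole (EOF counts as a hole).
            hole = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            # e.g. ENXIO on an empty file, or EINVAL if unsupported.
            return None
        return data, hole
    finally:
        os.close(fd)
```

Pointing this at a file on the ceph mount and then unmounting might be one way to try reproducing the hang by hand, though as noted above other preliminary operations may also be a factor.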

In any case, this looks like a regression, so I'm going to drop this
patch for now. I'll keep poking at the problem too however.



