Re: probable quota bug introduced in 6.6-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 03, 2023 at 08:11:11PM -0400, Eric Whitney wrote:
> When run on my test hardware, generic/270 triggers hung task timeouts when
> run on a 6.6-rc1 (or -rc2, -rc3, -rc4) kernel with kvm-xfstests using the
> nojournal test scenario.  The test always passes, but about 60% of the time
> the running time of the test increases by an order of magnitude or more and
> one or more of the hung task timeout warnings included below can be found in
> the log.
> 
> This does not reproduce on 6.5.  Bisection leads to this patch:
> 
> dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should
> provide")

Can you revert the culprit to see if it helps?

> 
> >From the log:
> 
> generic/270 306s ...  [20:08:45][  311.322318] run fstests generic/270 at 2023-10-03 20:08:45
> [  311.579641] EXT4-fs (vdc): mounted filesystem d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w without journal. Quota mode: writeback.
> [  311.587978] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa ro. Quota mode: writeback.
> [  311.592725] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w. Quota mode: writeback.
> [  335.491107] 270 (3092): drop_caches: 3
> [  491.167988] INFO: task quotaon:3450 blocked for more than 122 seconds.
> [  491.168334]       Not tainted 6.4.0+ #13
> [  491.168544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  491.168936] task:quotaon         state:D stack:0     pid:3450  ppid:3092   flags:0x00004000
> [  491.169363] Call Trace:
> [  491.169503]  <TASK>
> [  491.169620]  __schedule+0x394/0xd40
> [  491.169813]  schedule+0x5d/0xd0
> [  491.169981]  schedule_timeout+0x1a7/0x1c0
> [  491.170191]  ? lock_release+0x139/0x280
> [  491.170395]  ? lock_acquire+0xb9/0x180
> [  491.170605]  ? do_raw_spin_unlock+0x4b/0xa0
> [  491.170837]  __wait_for_common+0xb6/0x1e0
> [  491.171046]  ? __pfx_schedule_timeout+0x10/0x10
> [  491.171324]  __flush_work+0x2da/0x430
> [  491.171517]  ? __pfx_wq_barrier_func+0x10/0x10
> [  491.171747]  ? 0xffffffff81000000
> [  491.171932]  dquot_disable+0x3e5/0x670
> [  491.172134]  ext4_quota_off+0x50/0x1a0
> [  491.172332]  __x64_sys_quotactl+0x87/0x1c0
> [  491.172545]  do_syscall_64+0x38/0x90
> [  491.172731]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  491.172990] RIP: 0033:0x7f3c7c79eada
> [  491.173176] RSP: 002b:00007ffed2ff4478 EFLAGS: 00000246 ORIG_RAX: 00000000000000b3
> [  491.173558] RAX: ffffffffffffffda RBX: 000055886a3997d0 RCX: 00007f3c7c79eada
> [  491.173915] RDX: 0000000000000000 RSI: 000055886bf43de0 RDI: 0000000080000301
> [  491.174271] RBP: 000055886bf43de0 R08: 0000000000000001 R09: 0000000000000002
> [  491.174657] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> [  491.175014] R13: 000055886bf43ea0 R14: 0000000000000001 R15: 0000000000000000
> [  491.175373]  </TASK>
> [  491.175491] 
> [  491.175491] Showing all locks held in the system:
> [  491.176706] 1 lock held by rcu_tasks_kthre/12:
> [  491.178126]  #0: ffffffff82763970 (rcu_tasks.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
> [  491.180955] 1 lock held by rcu_tasks_rude_/13:
> [  491.182394]  #0: ffffffff827636f0 (rcu_tasks_rude.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
> [  491.194388] 1 lock held by khungtaskd/26:
> [  491.196153]  #0: ffffffff82764020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x110
> [  491.199676] 2 locks held by kworker/u4:4/59:
> [  491.200722]  #0: ffff88800385cd38 ((wq_completion)events_unbound){....}-{0:0}, at: process_one_work+0x1f6/0x550
> [  491.201600]  #1: ffffc90000513e80 ((quota_release_work).work){....}-{0:0}, at: process_one_work+0x1f6/0x550
> [  491.202746] 1 lock held by quotaon/3450:
> [  491.203184]  #0: ffff88800afd60e0 (&type->s_umount_key#33){....}-{3:3}, at: user_get_super+0xd3/0x100
> [  491.204217] 
> [  491.204373] =============================================
> 

Anyway, thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: dabc8b20756601
#regzbot title: dqput() fix causes kvm-xfstests nojournal test time longer

-- 
An old man doll... just what I always wanted! - Clara

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux