On (24/11/14 11:13), Joanne Koong wrote: [..] > +static void fuse_init_fc_timeout(struct fuse_conn *fc, struct fuse_fs_context *ctx) > +{ > + if (ctx->req_timeout) { > + if (check_mul_overflow(ctx->req_timeout * 60, HZ, &fc->timeout.req_timeout)) > + fc->timeout.req_timeout = ULONG_MAX; > + timer_setup(&fc->timeout.timer, fuse_check_timeout, 0); > + mod_timer(&fc->timeout.timer, jiffies + FUSE_TIMEOUT_TIMER_FREQ); So I think this will require much bigger changes in the code. fuse_check_timeout() is executed from IRQ context and it takes the same locks that are acquired by preemptive task contexts. So all of those now need to disable local IRQs before they acquire: fc->bg_lock, fig->lock, fc->lock. Otherwise we are in a deadlock scenario (in many places, unfortunately). A simple example: [ 91.466408] _raw_spin_lock+0x39/0x70 [ 91.466420] fuse_simple_background+0x902/0x1130 [fuse] [ 91.466453] fuse_send_init+0x337/0x680 [fuse] [ 91.466485] fuse_fill_super+0x1c8/0x200 [fuse] [ 91.466516] get_tree_nodev+0xaf/0x140 [ 91.466527] fuse_get_tree+0x27e/0x450 [fuse] [ 91.466559] vfs_get_tree+0x88/0x240 [ 91.466569] path_mount+0xf26/0x1ed0 [ 91.466579] __se_sys_mount+0x1c9/0x240 [ 91.466588] do_syscall_64+0x6f/0xa0 [ 91.466598] entry_SYSCALL_64_after_hwframe+0x73/0xdd [ 91.466666] other info that might help us debug this: [ 91.466672] Possible unsafe locking scenario: [ 91.466679] CPU0 [ 91.466684] ---- [ 91.466689] lock(&fiq->lock); [ 91.466701] <Interrupt> [ 91.466707] lock(&fiq->lock); [ 91.466718] *** DEADLOCK *** [ 91.466724] 4 locks held by runtime_probe/5043: [ 91.466732] #0: ffff888005812448 (sb_writers#3){.+.+}-{0:0}, at: filename_create+0xb2/0x320 [ 91.466762] #1: ffff8881499ea3f0 (&type->i_mutex_dir_key#5/1){+.+.}-{3:3}, at: filename_create+0x14c/0x320 [ 91.466791] #2: ffffffff9d864ce0 (rcu_read_lock){....}-{1:2}, at: security_sid_to_context_core+0xa4/0x4f0 [ 91.466817] #3: ffff88815c009ec0 ((&fc->timeout.timer)){+.-.}-{0:0}, at: run_timer_softirq+0x702/0x1700 [ 91.466841] stack backtrace: [ 91.466850] CPU: 2 PID: 5043 Comm: runtime_probe Tainted: G U 6.6.63-lockdep #1 f2b045305e587e03c4766ca818bb77742f953c87 [ 91.466864] Hardware name: HP Lantis/Lantis, BIOS Google_Lantis.13606.437.0 01/21/2022 [ 91.466872] Call Trace: [ 91.466881] <IRQ> [ 91.466889] dump_stack_lvl+0x6d/0xb0 [ 91.466904] print_usage_bug+0x8a4/0xbb0 [ 91.466917] mark_lock+0x13ca/0x1940 [ 91.466930] __lock_acquire+0xc10/0x2670 [ 91.466944] lock_acquire+0x119/0x3a0 [ 91.466955] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5] [ 91.466992] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5] [ 91.467025] _raw_spin_lock+0x39/0x70 [ 91.467036] ? fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5] [ 91.467070] fuse_check_timeout+0x32/0x630 [fuse c290dfa1c114772028056af34b9662cba2d155b5] [ 91.467104] ? run_timer_softirq+0x702/0x1700 [ 91.467114] ? run_timer_softirq+0x702/0x1700 [ 91.467123] ? __pfx_fuse_check_timeout+0x10/0x10 [fuse c290dfa1c114772028056af34b9662cba2d155b5] [ 91.467156] run_timer_softirq+0x763/0x1700 [ 91.467172] irq_exit_rcu+0x300/0x7d0 [ 91.467183] ? sysvec_apic_timer_interrupt+0x56/0x90 [ 91.467195] sysvec_apic_timer_interrupt+0x56/0x90 [ 91.467205] </IRQ> Do we want to run fuse-watchdog as a preemptive task instead of IRQ context? Simlar to the way the hung-task watchdog runs, for example. Yes, it now may starve and not get scheduled in corner cases (under some crazy pressure), but at least we don't need to patch the entire fuse code to use irq_safe/irq_restore locking variants.