On Tue, 6 Dec 2011 09:23:25 +0800 boyd yang wrote:
> fanotify: to differ file access events from different threads
>
> When fanotify is monitoring the whole mount point "/" and multiple
> threads iterate the same directory, some threads will hang.
>
> This patch lets fanotify distinguish access events coming from
> different threads, preventing it from merging access events across
> threads. It also prevents overflow events from reaching user space.
>
> Signed-off-by: Boyd Yang <boyd.yang@xxxxxxxxx>
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c linux-3.1-rc4/fs/notify/fanotify/fanotify.c
> --- linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/fanotify/fanotify.c	2011-10-14 14:17:53.055958000 +0800
> @@ -15,7 +15,8 @@
>
>  	if (old->to_tell == new->to_tell &&
>  	    old->data_type == new->data_type &&
> -	    old->tgid == new->tgid) {
> +	    old->tgid == new->tgid &&
> +	    old->pid == new->pid) {
>  		switch (old->data_type) {
>  		case (FSNOTIFY_EVENT_PATH):
>  			if ((old->path.mnt == new->path.mnt) &&
> @@ -144,11 +145,16 @@
>  		return PTR_ERR(notify_event);
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> -	if (event->mask & FAN_ALL_PERM_EVENTS) {
> -		/* if we merged we need to wait on the new event */
> -		if (notify_event)
> -			event = notify_event;
> -		ret = fanotify_get_response_from_access(group, event);
> +	/* if overflow, do not wait for response */
> +	if (event->mask & FS_Q_OVERFLOW) {
> +		pr_debug("fanotify overflow!\n");
> +	} else {
> +		if (event->mask & FAN_ALL_PERM_EVENTS) {
> +			/* if we merged we need to wait on the new event */
> +			if (notify_event)
> +				event = notify_event;
> +			ret = fanotify_get_response_from_access(group, event);
> +		}
>  	}
>  #endif
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/notification.c linux-3.1-rc4/fs/notify/notification.c
> --- linux-3.1-rc4_orig/fs/notify/notification.c	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/notification.c	2011-10-14 13:52:36.946608000 +0800
> @@ -95,6 +95,7 @@
>  	BUG_ON(!list_empty(&event->private_data_list));
>
>  	kfree(event->file_name);
> +	put_pid(event->pid);
>  	put_pid(event->tgid);
>  	kmem_cache_free(fsnotify_event_cachep, event);
>  }
> @@ -374,6 +375,7 @@
>  			return NULL;
>  		}
>  	}
> +	event->pid = get_pid(old_event->pid);
>  	event->tgid = get_pid(old_event->tgid);
>  	if (event->data_type == FSNOTIFY_EVENT_PATH)
>  		path_get(&event->path);
> @@ -417,6 +419,7 @@
>  		event->name_len = strlen(event->file_name);
>  	}
>
> +	event->pid = get_pid(task_pid(current));
>  	event->tgid = get_pid(task_tgid(current));
>  	event->sync_cookie = cookie;
>  	event->to_tell = to_tell;
> diff -r -u linux-3.1-rc4_orig/include/linux/fsnotify_backend.h linux-3.1-rc4/include/linux/fsnotify_backend.h
> --- linux-3.1-rc4_orig/include/linux/fsnotify_backend.h	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/include/linux/fsnotify_backend.h	2011-10-14 13:51:50.380168000 +0800
> @@ -238,6 +238,7 @@
>  	u32 sync_cookie;	/* used to corrolate events, namely inotify mv events */
>  	const unsigned char *file_name;
>  	size_t name_len;
> +	struct pid *pid;
>  	struct pid *tgid;
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS

This patch triggers the following on my 3.7.6 kernel:

INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=15002 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0
Modules linked in: ext2 ppdev parport_pc mac_hid psmouse serio_raw i2c_piix4 lp parport 8139too floppy 8139cp
Pid: 0, comm: swapper/0 Not tainted 3.2.35 #12 Bochs Bochs
RIP: 0010:[<ffffffff81037bdf>]  [<ffffffff81037bdf>] flat_send_IPI_all+0xaf/0xd0
RSP: 0018:ffff88003fc03d88  EFLAGS: 00010006
RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000050
RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000300
RBP: ffff88003fc03da8 R08: 000000000000000a R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000c00
R13: 0000000003000000 R14: 0000000000000001 R15: ffffffff81c32d00
FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fb5d3100000 CR3: 000000001ca06000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
Stack:
 0000000000000000 0000000000002710 ffffffff81c31c00 ffffffff81c31d00
 ffff88003fc03dc8 ffffffff81033231 000000000000000a ffff88003fc0ec40
 ffff88003fc03e18 ffffffff810defbe ffff880000000001 ffffffff81c32d00
Call Trace:
 <IRQ>
 [<ffffffff81033231>] arch_trigger_all_cpu_backtrace+0x61/0xa0
 [<ffffffff810defbe>] __rcu_pending+0x3ae/0x420
 [<ffffffff810df329>] rcu_check_callbacks+0x79/0x1e0
 [<ffffffff81078068>] update_process_times+0x48/0x90
 [<ffffffff8109b4f4>] tick_sched_timer+0x64/0xc0
 [<ffffffff8108df56>] __run_hrtimer+0x76/0x1f0
 [<ffffffff8109b490>] ? tick_nohz_handler+0x100/0x100
 [<ffffffff8108e907>] hrtimer_interrupt+0xf7/0x230
 [<ffffffff81650409>] smp_apic_timer_interrupt+0x69/0x99
 [<ffffffff8164e2de>] apic_timer_interrupt+0x6e/0x80
 <EOI>
 [<ffffffff810904a5>] ? sched_clock_local+0x25/0x90
 [<ffffffff8103cedb>] ? native_safe_halt+0xb/0x10
 [<ffffffff8101c6b3>] default_idle+0x53/0x1d0
 [<ffffffff81013236>] cpu_idle+0xd6/0x120
 [<ffffffff816172ce>] rest_init+0x72/0x74
 [<ffffffff81cfcba5>] start_kernel+0x3b0/0x3bd
 [<ffffffff81cfc347>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81cfc140>] ? early_idt_handlers+0x140/0x140
 [<ffffffff81cfc44d>] x86_64_start_kernel+0x102/0x111
[...]

It happens after my application runs for half an hour or so. However, I don't see how this patch could possibly solve the problem I've observed: due to a race, a kernel thread ends up doing wait_event() on an event which, soon after, is merged by a different thread into a new one, and that new event becomes the actual event "received" by the content-introspection application.
It's easily reproducible with a simple script:

  $ while true; do cp -f /root/eicar.com /root/watched-dir; done

all the while the fanotify application does a re-open (RD -> RDWR) and a truncate(0), on multiple threads. (I do a fanotify_init(O_RDONLY) because of a surprise ETXTBSY.)

Anyway, regardless of how I use the API, the race needs to be eliminated somehow. So my problem now is: how do I switch all wait_event() users to the new event created by fanotify_merge()?

-- 
Mihai Donțu