On Tue, 6 Dec 2011 09:23:25 +0800 boyd yang wrote:
> fanotify: to differ file access events from different threads
>
> When fanotify is monitoring the whole mount point "/" and multiple
> threads iterate the same directory, some threads will hang.
>
> This patch lets fanotify distinguish access events coming from
> different threads, preventing it from merging access events across
> threads. It also prevents overflow events from reaching user space.
>
> Signed-off-by: Boyd Yang <boyd.yang@xxxxxxxxx>
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c linux-3.1-rc4/fs/notify/fanotify/fanotify.c
> --- linux-3.1-rc4_orig/fs/notify/fanotify/fanotify.c	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/fanotify/fanotify.c	2011-10-14 14:17:53.055958000 +0800
> @@ -15,7 +15,8 @@
>
>  	if (old->to_tell == new->to_tell &&
>  	    old->data_type == new->data_type &&
> -	    old->tgid == new->tgid) {
> +	    old->tgid == new->tgid &&
> +	    old->pid == new->pid) {
>  		switch (old->data_type) {
>  		case (FSNOTIFY_EVENT_PATH):
>  			if ((old->path.mnt == new->path.mnt) &&
> @@ -144,11 +145,16 @@
>  		return PTR_ERR(notify_event);
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> -	if (event->mask & FAN_ALL_PERM_EVENTS) {
> -		/* if we merged we need to wait on the new event */
> -		if (notify_event)
> -			event = notify_event;
> -		ret = fanotify_get_response_from_access(group, event);
> +	/* if overflow, do not wait for response */
> +	if (event->mask & FS_Q_OVERFLOW) {
> +		pr_debug("fanotify overflow!\n");
> +	} else {
> +		if (event->mask & FAN_ALL_PERM_EVENTS) {
> +			/* if we merged we need to wait on the new event */
> +			if (notify_event)
> +				event = notify_event;
> +			ret = fanotify_get_response_from_access(group, event);
> +		}
>  	}
>  #endif
>
> diff -r -u linux-3.1-rc4_orig/fs/notify/notification.c linux-3.1-rc4/fs/notify/notification.c
> --- linux-3.1-rc4_orig/fs/notify/notification.c	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/fs/notify/notification.c	2011-10-14 13:52:36.946608000 +0800
> @@ -95,6 +95,7 @@
>  	BUG_ON(!list_empty(&event->private_data_list));
>
>  	kfree(event->file_name);
> +	put_pid(event->pid);
>  	put_pid(event->tgid);
>  	kmem_cache_free(fsnotify_event_cachep, event);
>  }
> @@ -374,6 +375,7 @@
>  			return NULL;
>  		}
>  	}
> +	event->pid = get_pid(old_event->pid);
>  	event->tgid = get_pid(old_event->tgid);
>  	if (event->data_type == FSNOTIFY_EVENT_PATH)
>  		path_get(&event->path);
> @@ -417,6 +419,7 @@
>  		event->name_len = strlen(event->file_name);
>  	}
>
> +	event->pid = get_pid(task_pid(current));
>  	event->tgid = get_pid(task_tgid(current));
>  	event->sync_cookie = cookie;
>  	event->to_tell = to_tell;
> diff -r -u linux-3.1-rc4_orig/include/linux/fsnotify_backend.h linux-3.1-rc4/include/linux/fsnotify_backend.h
> --- linux-3.1-rc4_orig/include/linux/fsnotify_backend.h	2011-08-29 12:16:01.000000000 +0800
> +++ linux-3.1-rc4/include/linux/fsnotify_backend.h	2011-10-14 13:51:50.380168000 +0800
> @@ -238,6 +238,7 @@
>  	u32 sync_cookie;	/* used to corrolate events, namely inotify mv events */
>  	const unsigned char *file_name;
>  	size_t name_len;
> +	struct pid *pid;
>  	struct pid *tgid;
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS

This patch triggers the following on my 3.7.6 kernel:

INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=15002 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0
Modules linked in: ext2 ppdev parport_pc mac_hid psmouse serio_raw i2c_piix4 lp parport 8139too floppy 8139cp
Pid: 0, comm: swapper/0 Not tainted 3.2.35 #12 Bochs Bochs
RIP: 0010:[<ffffffff81037bdf>]  [<ffffffff81037bdf>] flat_send_IPI_all+0xaf/0xd0
RSP: 0018:ffff88003fc03d88  EFLAGS: 00010006
RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000050
RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000300
RBP: ffff88003fc03da8 R08: 000000000000000a R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000c00
R13: 0000000003000000 R14: 0000000000000001 R15: ffffffff81c32d00
FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fb5d3100000 CR3: 000000001ca06000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
Stack:
 0000000000000000 0000000000002710 ffffffff81c31c00 ffffffff81c31d00
 ffff88003fc03dc8 ffffffff81033231 000000000000000a ffff88003fc0ec40
 ffff88003fc03e18 ffffffff810defbe ffff880000000001 ffffffff81c32d00
Call Trace:
 <IRQ>
 [<ffffffff81033231>] arch_trigger_all_cpu_backtrace+0x61/0xa0
 [<ffffffff810defbe>] __rcu_pending+0x3ae/0x420
 [<ffffffff810df329>] rcu_check_callbacks+0x79/0x1e0
 [<ffffffff81078068>] update_process_times+0x48/0x90
 [<ffffffff8109b4f4>] tick_sched_timer+0x64/0xc0
 [<ffffffff8108df56>] __run_hrtimer+0x76/0x1f0
 [<ffffffff8109b490>] ? tick_nohz_handler+0x100/0x100
 [<ffffffff8108e907>] hrtimer_interrupt+0xf7/0x230
 [<ffffffff81650409>] smp_apic_timer_interrupt+0x69/0x99
 [<ffffffff8164e2de>] apic_timer_interrupt+0x6e/0x80
 <EOI>
 [<ffffffff810904a5>] ? sched_clock_local+0x25/0x90
 [<ffffffff8103cedb>] ? native_safe_halt+0xb/0x10
 [<ffffffff8101c6b3>] default_idle+0x53/0x1d0
 [<ffffffff81013236>] cpu_idle+0xd6/0x120
 [<ffffffff816172ce>] rest_init+0x72/0x74
 [<ffffffff81cfcba5>] start_kernel+0x3b0/0x3bd
 [<ffffffff81cfc347>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81cfc140>] ? early_idt_handlers+0x140/0x140
 [<ffffffff81cfc44d>] x86_64_start_kernel+0x102/0x111
[...]

It happens after my application runs for half an hour or so. However, I don't see how this patch could possibly solve the problem I've observed: due to a race, a kernel thread ends up doing wait_event() on an event which, soon after, is merged by a different thread into a new one, and that new event becomes the actual event "received" by the content-introspection application.
It's easily reproducible with a simple script:

  $ while true; do cp -f /root/eicar.com /root/watched-dir; done

all the while the fanotify application does a re-open (RD -> RDWR) and a truncate(0), on multiple threads. (I do a fanotify_init(O_RDONLY) because of a surprise ETXTBSY.)

Anyway, regardless of how I use the API, the race needs to be eliminated somehow. So my problem now is: how do I switch all wait_event() users to the new event created by fanotify_merge()?

-- 
Mihai Donțu