Re: [PATCH v2] inotify: Extend ioctl to allow to request id of new watch descriptor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat 10-02-18 01:45:16, Kirill Tkhai wrote:
> On 09.02.2018 23:56, Andrew Morton wrote:
> > On Fri, 9 Feb 2018 18:04:54 +0300 Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> > 
> >> Watch descriptor is id of the watch created by inotify_add_watch().
> >> It is allocated in inotify_add_to_idr(), and takes the numbers
> >> starting from 1. Every new inotify watch obtains next available
> >> number (usually, old + 1), as served by idr_alloc_cyclic().
> >>
> >> CRIU (Checkpoint/Restore In Userspace) project supports inotify
> >> files, and restores watched descriptors with the same numbers,
> >> they had before dump. Since there was no kernel support, we
> >> had to use cycle to add a watch with specific descriptor id:
> >>
> >> 	while (1) {
> >> 		int wd;
> >>
> >> 		wd = inotify_add_watch(inotify_fd, path, mask);
> >> 		if (wd < 0) {
> >> 			break;
> >> 		} else if (wd == desired_wd_id) {
> >> 			ret = 0;
> >> 			break;
> >> 		}
> >>
> >> 		inotify_rm_watch(inotify_fd, wd);
> >> 	}
> >>
> >> (You may find the actual code at the below link:
> >>  https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)
> >>
> >> The cycle is suboptiomal and very expensive, but since there is no better
> >> kernel support, it was the only way to restore that. Happily, we had met
> >> mostly descriptors with small id, and this approach had worked somehow.
> >>
> >> But recent time containers with inotify with big watch descriptors
> >> begun to come, and this way stopped to work at all. When descriptor id
> >> is something about 0x34d71d6, the restoring process spins in busy loop
> >> for a long time, and the restore hungs and delay of migration from node
> >> to node could easily be watched.
> >>
> >> This patch aims to solve this problem. It introduces new ioctl
> >> INOTIFY_IOC_SETNEXTWD, which allows to request the number of next created
> >> watch descriptor from userspace. It simply calls idr_set_cursor() primitive
> >> to populate idr::idr_next, so that next idr_alloc_cyclic() allocation
> >> will return this id, if it is not occupied. This is the way which is
> >> used to restore some other resources from userspace. For example,
> >> /proc/sys/kernel/ns_last_pid works the same for task pids.
> >>
> >> The new code is under CONFIG_CHECKPOINT_RESTORE #define, so small system
> >> may exclude it.
> >>
> > 
> > Reviewed-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > 
> > With a little cleanup:
> > 
> > --- a/fs/notify/inotify/inotify_user.c~inotify-extend-ioctl-to-allow-to-request-id-of-new-watch-descriptor-fix
> > +++ a/fs/notify/inotify/inotify_user.c
> > @@ -285,7 +285,6 @@ static int inotify_release(struct inode
> >  static long inotify_ioctl(struct file *file, unsigned int cmd,
> >  			  unsigned long arg)
> >  {
> > -	struct inotify_group_private_data *data __maybe_unused;
> >  	struct fsnotify_group *group;
> >  	struct fsnotify_event *fsn_event;
> >  	void __user *p;
> > @@ -294,7 +293,6 @@ static long inotify_ioctl(struct file *f
> >  
> >  	group = file->private_data;
> >  	p = (void __user *) arg;
> > -	data = &group->inotify_data;
> >  
> >  	pr_debug("%s: group=%p cmd=%u\n", __func__, group, cmd);
> >  
> > @@ -313,6 +311,9 @@ static long inotify_ioctl(struct file *f
> >  	case INOTIFY_IOC_SETNEXTWD:
> >  		ret = -EINVAL;
> >  		if (arg >= 1 && arg <= INT_MAX) {
> > +			struct inotify_group_private_data *data;
> > +
> > +			data = &group->inotify_data;
> >  			spin_lock(&data->idr_lock);
> >  			idr_set_cursor(&data->idr, (unsigned int)arg);
> >  			spin_unlock(&data->idr_lock);
> 
> I have no objections.

Thanks guys, I have added the patch (with cleanup included) to my tree.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux