On Tue, 2017-06-06 at 14:57 -0400, Benjamin Coddington wrote: > On 6 Jun 2017, at 14:25, Jeff Layton wrote: > > > On Tue, 2017-06-06 at 14:00 -0400, Jeff Layton wrote: > > > On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote: > > > > Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must > > > > be > > > > atomic with the stateid update", NFSv4 has been inserting locks in > > > > rpciod > > > > worker context. The result is that the file_lock's fl_nspid is the > > > > kworker's pid instead of the original userspace pid. > > > > > > > > The fl_nspid is only used to represent the namespaced virtual pid > > > > number > > > > when displaying locks or returning from F_GETLK. There's no reason > > > > to set > > > > it for every inserted lock, since we can usually just look it up > > > > from > > > > fl_pid. So, instead of looking up and holding struct pid for every > > > > lock, > > > > let's just look up the virtual pid number from fl_pid when it is > > > > needed. > > > > That means we can remove fl_nspid entirely. > > > > > > > > > > With this set, I think we ought to codify that the stored pid must be > > > relative > > > > ...to the init_pid_ns. Let's make that clear in the comments for > > filesystem authors. > > OK, but I think you mean fl_pid should always be current->tgid or the > pid as > it is in init_pid_ns. We translate that pid into the virtual pid of the > process doing F_GETLK or reading /proc/locks. > > > > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > > > > --- > > > > fs/locks.c | 58 > > > > ++++++++++++++++++++++++++++++++---------------------- > > > > include/linux/fs.h | 1 - > > > > 2 files changed, 35 insertions(+), 24 deletions(-) > > > > > > > > diff --git a/fs/locks.c b/fs/locks.c > > > > index d7daa6c8932f..104398ccc9b9 100644 > > > > --- a/fs/locks.c > > > > +++ b/fs/locks.c > > > > @@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct > > > > file_lock *blocker) > > > > static void > > > > locks_insert_lock_ctx(struct file_lock *fl, struct list_head > > > > *before) > > > > { > > > > - fl->fl_nspid = get_pid(task_tgid(current)); > > > > list_add_tail(&fl->fl_list, before); > > > > locks_insert_global_locks(fl); > > > > } > > > > @@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl) > > > > { > > > > locks_delete_global_locks(fl); > > > > list_del_init(&fl->fl_list); > > > > - if (fl->fl_nspid) { > > > > - put_pid(fl->fl_nspid); > > > > - fl->fl_nspid = NULL; > > > > - } > > > > locks_wake_up_blocks(fl); > > > > } > > > > > > > > @@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct > > > > file_lock *fl) > > > > list_for_each_entry(cfl, &ctx->flc_posix, fl_list) { > > > > if (posix_locks_conflict(fl, cfl)) { > > > > locks_copy_conflock(fl, cfl); > > > > - if (cfl->fl_nspid) > > > > - fl->fl_pid = pid_vnr(cfl->fl_nspid); > > > > goto out; > > > > } > > > > } > > > > @@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct > > > > file_lock *fl) > > > > } > > > > EXPORT_SYMBOL_GPL(vfs_test_lock); > > > > > > > > +/** > > > > + * locks_translate_pid - translate a pid number into a namespace > > > > + * @nr: The pid number in the init_pid_ns > > > > + * @ns: The namespace into which the pid should be translated > > > > + * > > > > + * Used to tranlate a fl_pid into a namespace virtual pid number > > > > + */ > > > > +static pid_t locks_translate_pid(int init_nr, struct pid_namespace > > > > *ns) > > > > +{ > > > > + pid_t vnr = 0; > > > > + struct task_struct *task; > > > > + > > > > + rcu_read_lock(); > > > > + task = find_task_by_pid_ns(init_nr, &init_pid_ns); > > > > + if (task) > > > > + get_task_struct(task); > > > > + rcu_read_unlock(); > > > > > > Is that safe? What prevents get_task_struct from doing a 0->1 > > > transition > > > there after the task usage count has already gone 1->0 and is on its > > > way > > > to being freed? > > Uh, no -- seems not safe. I copied that directly from fs/proc/base.c, > and > seems a problem there too. > > Changing this to the below avoids the race with the struct task being > released: > > rcu_read_lock(); > struct pid = find_pid_ns(init_nr, &init_pid_ns) > vnr = pid_vnr(pid); > rcu_read_unlock(); > Actually now that I've looked a little more closely, I think it may be ok. call_rcu callback is what ends up putting the last reference when the task is released, so if you can find the thing via find_task_by_pid_ns, then you know that the refcount is at least 1 until the rcu_read_lock is dropped. Still, I think doing the pid assignment under rcu_read_lock and not bothering with the refcount is better in this case... Thanks, -- Jeff Layton <jlayton@xxxxxxxxxx>