Thanks for detailed review Serge. Sorry was unexpectedly off most of this week. Have some responses below will review/address other comments over the next few days. Serge E. Hallyn [serue@xxxxxxxxxx] wrote: | Quoting sukadev@xxxxxxxxxx (sukadev@xxxxxxxxxx): | > | > >From 4567a37856205a04cc0617e3fcc8ede36b25bcf5 Mon Sep 17 00:00:00 2001 | > From: Sukadev Bhattiprolu <sukadev@xxxxxxxxxx> | > Date: Tue, 9 Sep 2008 18:52:56 -0700 | > Subject: [PATCH 09/10] Enable multiple instances of devpts | > | > To support containers, allow multiple instances of devpts filesystem, such | > that indices of ptys allocated in one instance are independent of ptys | > allocated in other instances of devpts. | > | > But to preserve backward compatibility, enable this support for multiple | > instances only if: | > | > - CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and | > - '-o newinstance' mount option is specified while mounting devpts | > | > See Documentation/fs/devpts.txt (next patch in series) for details. | > | > To use multi-instance mount, a container startup script could: | > | > $ ns_exec -cm /bin/bash | > $ umount /dev/pts | > $ mount -t devpts -o newinstance lxcpts /dev/pts | > $ mount -o bind /dev/pts/ptmx /dev/ptmx | > $ sshd -p 1234 | > | > where 'ns_exec -cm /bin/bash' is calls clone() with CLONE_NEWNS flag and execs | > /bin/bash in the child process. A pty created by the sshd is not visible in | > the original mount of /dev/pts. | > | > USER-SPACE-IMPACT: | > | > See Documentation/fs/devpts.txt (included in next patch) for user-space | > impact in multi-instance and mixed-mode operation. | > TODO: | > - Update mount(8), pts(4) man pages. Highlight impact of not | > redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount. | > | > Implementation note: | > | > See comments in new get_sb_ref() function in fs/super.c on why | > get_sb_single() cannot be directly used. | > | > Changelog[v5]: | > - Move get_sb_ref() definition to earlier patch | > | > - Move usage info to Documentation/filesystems/devpts.txt (next patch) | > | > - Make ptmx node even in init_pts_ns, now that default mode is 0000 | > (defined in earlier patch, enabled here). | > | > - Cache ptmx dentry and use to update mode during remount | > (defined in earlier patch, enabled here). | > | > - Bugfix: explicitly ignore newinstance on remount (if newinstance was | > specified on remount of initial mount, it would be ignored but | > /proc/mounts would imply that the option was set) | > | > Changelog[v4]: | > | > - Update patch description to address H. Peter Anvin's comments | > | > - Consolidate multi-instance mode code under new config token, | > CONFIG_DEVPTS_MULTIPLE_INSTANCE. | > | > - Move usage-details from patch description to | > Documentation/fs/devpts.txt | > | > Changelog[v3]: | > - Rename new mount option to 'newinstance' | > | > - Create ptmx nodes only in 'newinstance' mounts | > | > - Bugfix: parse_mount_options() modifies @data but since we need to | > parse the @data twice (once in devpts_get_sb() and once during | > do_remount_sb()), parse a local copy of @data in devpts_get_sb(). | > (restructured code in devpts_get_sb() to fix this) | > | > Changelog[v2]: | > - Support both single-mount and multiple-mount semantics and | > provide '-onewmnt' option to select the semantics. | > | > Signed-off-by: Sukadev Bhattiprolu <sukadev@xxxxxxxxxx> | > --- | > fs/devpts/inode.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++++-- | > 1 files changed, 163 insertions(+), 5 deletions(-) | > | > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c | > index 6b56255..c54b010 100644 | > --- a/fs/devpts/inode.c | > +++ b/fs/devpts/inode.c | > @@ -48,10 +48,11 @@ struct pts_mount_opts { | > gid_t gid; | > umode_t mode; | > umode_t ptmxmode; | > + int newinstance; | > }; | > | > enum { | > - Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode, | > + Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode, Opt_newinstance, | > Opt_err | > }; | > | > @@ -61,6 +62,7 @@ static match_table_t tokens = { | > {Opt_mode, "mode=%o"}, | > #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES | > {Opt_ptmxmode, "ptmxmode=%o"}, | > + {Opt_newinstance, "newinstance"}, | > #endif | > {Opt_err, NULL} | > }; | > @@ -78,13 +80,15 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb) | > | > static inline struct super_block *pts_sb_from_inode(struct inode *inode) | > { | > +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES | > if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC) | > return inode->i_sb; | > - | > +#endif | > return devpts_mnt->mnt_sb; | > } | > | > -static int parse_mount_options(char *data, struct pts_mount_opts *opts) | > +static int parse_mount_options(char *data, int remount, | > + struct pts_mount_opts *opts) | > { | > char *p; | > | > @@ -95,6 +99,10 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts) | > opts->mode = DEVPTS_DEFAULT_MODE; | > opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE; | | So does this mean that if I do a remount, the mode and ptmxmode will get | reset to the defaults unless I specify them again? Yes. I think I pointed out this behavior earlier but don't think I received any comments and stuck to current behavior (which for instance resets the mode to default). Besides, this behavior may help in the transition - initial kernel mount would set ptmx-mode to 0 (default) and admin can update /etc/fstab to set ptmx mode meaningfully. | | > | > + /* ignore newinstance on remount to avoid confusing show_options */ | > + if (!remount) | > + opts->newinstance = 0; | > + | > while ((p = strsep(&data, ",")) != NULL) { | > substring_t args[MAX_OPT_ARGS]; | > int token; | > @@ -128,6 +136,10 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts) | > return -EINVAL; | > opts->ptmxmode = option & S_IALLUGO; | > break; | > + case Opt_newinstance: | > + if (!remount) | > + opts->newinstance = 1; | > + break; | > #endif | > default: | > printk(KERN_ERR "devpts: called with bogus options\n"); | > @@ -180,6 +192,8 @@ static int mknod_ptmx(struct super_block *sb) | > | > d_add(dentry, inode); | > | > + fsi->ptmx_dentry = dentry; | > + | > printk(KERN_DEBUG "Created ptmx node in devpts ino %lu\n", | > inode->i_ino); | > | > @@ -207,7 +221,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data) | > struct pts_fs_info *fsi = DEVPTS_SB(sb); | > struct pts_mount_opts *opts = &fsi->mount_opts; | > | > - err = parse_mount_options(data, opts); | > + err = parse_mount_options(data, 1, opts); | | A guess a #define rather than '1' would be better here. Ok. | | > | > /* | > * parse_mount_options() restores options to default values | > @@ -232,6 +246,8 @@ static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs) | > seq_printf(seq, ",mode=%03o", opts->mode); | > #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES | > seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode); | > + if (opts->newinstance) | > + seq_printf(seq, ",newinstance"); | | Is actually that something we want to show? It doesn't seem | informative. Without this users have no easy way of knowing whether they have a private mount specially if they mounted from command line ? Does it mislead or confuse ? | | > #endif | > | > return 0; | > @@ -298,12 +314,153 @@ fail: | > return -ENOMEM; | > } | > | > +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES | > +/* | > + * Safely parse the mount options in @data and update @opts. | > + * | > + * devpts ends up parsing options two times during mount, due to the | > + * two modes of operation it supports. The first parse occurs in | > + * devpts_get_sb() when determining the mode (single-instance or | > + * multi-instance mode). The second parse happens in devpts_remount() | > + * or new_pts_mount() depending on the mode. | > + * | > + * Parsing of options modifies the @data making subsequent parsing | > + * incorrect. So make a local copy of @data and parse it. | > + * | > + * Return: 0 On success, -errno on error | > + */ | > +static int safe_parse_mount_options(void *data, struct pts_mount_opts *opts) | > +{ | > + int rc; | > + void *datacp; | > + | > + if (!data) | > + return 0; | > + | > + /* Use kstrdup() ? */ | > + datacp = kmalloc(PAGE_SIZE, GFP_KERNEL); | > + if (!datacp) | > + return -ENOMEM; | > + | > + memcpy(datacp, data, PAGE_SIZE); | > + rc = parse_mount_options((char *)datacp, 0, opts); | > + kfree(datacp); | > + | > + return rc; | > +} | > + | > +/* | > + * Mount a new (private) instance of devpts. PTYs created in this | > + * instance are independent of the PTYs in other devpts instances. | > + */ | > +static int new_pts_mount(struct file_system_type *fs_type, int flags, | > + void *data, struct vfsmount *mnt) | > +{ | > + int err; | > + struct pts_fs_info *fsi; | > + struct pts_mount_opts *opts; | > + | > + printk(KERN_NOTICE "devpts: newinstance mount\n"); | > + | > + err = get_sb_nodev(fs_type, flags, data, devpts_fill_super, mnt); | > + if (err) | > + return err; | > + | > + fsi = DEVPTS_SB(mnt->mnt_sb); | > + opts = &fsi->mount_opts; | > + | > + err = parse_mount_options(data, 0, opts); | > + if (err) | > + goto fail; | > + | > + err = mknod_ptmx(mnt->mnt_sb); | > + if (err) | > + goto fail; | > + | > + return 0; | > + | > +fail: | > + dput(mnt->mnt_sb->s_root); | > + deactivate_super(mnt->mnt_sb); | > + return err; | > +} | > + | > +/* | > + * Check if 'newinstance' mount option was specified in @data. | > + * | > + * Return: -errno on error (eg: invalid mount options specified) | > + * : 1 if 'newinstance' mount option was specified | > + * : 0 if 'newinstance' mount option was NOT specified | > + */ | > +static int is_new_instance_mount(void *data) | > +{ | > + int rc; | > + struct pts_mount_opts opts; | > + | > + if (!data) | > + return 0; | > + | > + rc = safe_parse_mount_options(data, &opts); | > + if (!rc) | > + rc = opts.newinstance; | > + | > + return rc; | > +} | > + | > +/* | > + * Mount or remount the initial kernel mount of devpts. This type of | > + * mount maintains the legacy, single-instance semantics, while the | > + * kernel still allows multiple-instances. | > + */ | > +static int init_pts_mount(struct file_system_type *fs_type, int flags, | > + void *data, struct vfsmount *mnt) | > +{ | > + int err; | > + | > + if (!devpts_mnt) { | > + err = get_sb_single(fs_type, flags, data, devpts_fill_super, | > + mnt); | > + | > + err = mknod_ptmx(mnt->mnt_sb); | > + if (err) { | > + dput(mnt->mnt_sb->s_root); | > + deactivate_super(mnt->mnt_sb); | > + } else | > + devpts_mnt = mnt; | > + | > + return err; | | There is no locking here, so in early-userspace two competing processes | could both try to set devpts_mnt, right? Hmm. I was thinking there would be only one thread calling the vfs_kern_mount() in init_devpts_fs. | | > + } | > + | > + return get_sb_ref(devpts_mnt->mnt_sb, flags, data, mnt); | > +} | > + | > static int devpts_get_sb(struct file_system_type *fs_type, | > int flags, const char *dev_name, void *data, struct vfsmount *mnt) | > { | > + int new; | > + | > + new = is_new_instance_mount(data); | > + if (new < 0) | > + return new; | > + | > + if (new) | > + return new_pts_mount(fs_type, flags, data, mnt); | > + | > + return init_pts_mount(fs_type, flags, data, mnt); | | Wait a sec - so if a container does | | mount -t devpts -o newinstance none /dev/pts | and then later on just does | mount -t devpts none /dev/pts | | it'll get the init_pts_ns, not the one it had created? Yes. Should we treat the latter as remount of the private instance ? If so, user could add '-oremount' ? The logic seems simple: With newinstance create a private namespace. Without newinstance, bind to initial ns. | | Yup, just confirmed, using: | | ns_exec -cmiup /bin/sh | mount -t devpts -o newinstance -n none /dev/pts | mount -t devpts -n none /mnt | ls /dev/pts | ptmx | ls /mnt | 0 ptmx | That's weird. | > +} | > +#else | > +/* | > + * This supports only the legacy single-instance semantics (no | > + * multiple-instance semantics) | > + */ | > +static int devpts_get_sb(struct file_system_type *fs_type, int flags, | > + const char *dev_name, void *data, struct vfsmount *mnt) | > +{ | > return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt); | > } | > | > +#endif | > + | > static void devpts_kill_sb(struct super_block *sb) | > { | > struct pts_fs_info *fsi = DEVPTS_SB(sb); | > @@ -431,8 +588,9 @@ void devpts_pty_kill(struct tty_struct *tty) | > if (dentry && !IS_ERR(dentry)) { | > inode->i_nlink--; | > d_delete(dentry); | > - dput(dentry); | > + dput(dentry); // d_lookup in devpts_pty_new | > } | > + dput(dentry); // d_find_alias above | > | > mutex_unlock(&root->d_inode->i_mutex); | > } | > -- | > 1.5.2.5 _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers