Re: [RFC] Add option to mount only a pids subset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 23, 2017 at 05:06:28PM +0100, Djalal Harouni wrote:
> Hi Alexey,
> 
> On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov
> <gladkov.alexey@xxxxxxxxx> wrote:
> >
> >
> > Al Viro, this patch looks better ?
> >
> > == Overview ==
> >
> > Some of the container virtualization systems are mounted /proc inside
> > the container. This is done in most cases to operate with information
> > about the processes. Knowing that /proc filesystem is not fully
> > virtualized they are mounted on top of dangerous places empty files or
> > directories (for exmaple /proc/sys, /proc/kcore, /sys/firmware, etc.).
> >
> > The structure of this filesystem is dynamic and any module can create a
> > new object which will not necessarily be virtualized. There are
> > proprietary modules that aren't in the mainline whose work we can not
> > verify.
> >
> > This opens up a potential threat to the system. The developers of the
> > virtualization system can't predict all dangerous places in /proc by
> > definition.
> >
> > A more effective solution would be to mount into the container only what
> > is necessary and ignore the rest.
> >
> > Right now there is the opportunity to pass in the container any port of
> > the /proc filesystem using mount --bind expect the pids.
> >
> > This patch allows to mount only the part of /proc related to pids without
> > rest objects. Since this is an option for /proc, flags applied to /proc
> > have an effect on this subset of filesystem.
> 
> I just sent a patch that also has to deal with proc hidepid here:
> https://lkml.org/lkml/2017/3/23/505

I completely agree with you that it looks wrong when options of one
mountpoint affect the others mountpoints.

> I'm not sure if that's the right approach, it is still buggy, however
> seems that your patch also stores the mount option inside the
> pid_namespace which may get propagated to all mounts inside same pidns
> ?

I don't store 'pidonly' option in my current patch. I mean in:
https://lkml.org/lkml/2017/3/20/363

I parse options twice at first mount of procfs. It happens before
the mounting /proc in userspace.

I know it's bad, but I couldn't find place to store this option.

> I didn't have enough time but maybe if they are related we can work it
> out together ?

I don't have enough experience in kenrel hacking, but I would happily do
my best :)

I also tring to mention it in every patch, as my changes almost completely
useless without the ability to use the overlayfs.

Now if you remove the restriction:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/inode.c#n497

and mount procfs as lowerdir in overlayfs then you get NULL pointer
dereference at:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n891

I got it when I tried to do `ls -la /overlay/proc/self/`.

-- 
Rgrds, legion

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux