Fw: User-controllable memcg-unaccounted objects of time namespace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



CC reply. Sorry for occupying your time. I was unaware that I had to
use plain text e-mail to make public mailing list accept it, and I
omitted these addresses to avoid rejecting notification.

Yutian Yang,
Zhejiang University


> -----Original Messages-----
> From: "Michal Hocko" <mhocko@xxxxxxxx>
> Sent Time: 2021-05-19 17:18:45 (Wednesday)
> To: "Yutian Yang" <ytyang@xxxxxxxxxx>
> Cc:
> Subject: Re: Fw: Re: Re: User-controllable memcg-unaccounted objects of time namespace
>
> Did you plan to post this reply to the mailing list with the whole
> original CC list?
>
> On Wed 19-05-21 16:56:27, Yutian Yang wrote:
> >
> >
> >
> > -----Original Messages-----
> > From: "Yutian Yang" <ytyang@xxxxxxxxxx>
> > Sent Time: 2021-05-18 19:29:40 (Tuesday)
> > To: "Michal Hocko" <mhocko@xxxxxxxx>
> > Cc: tglx@xxxxxxxxxxxxx, "vdavydov.dev@xxxxxxxxx" <vdavydov.dev@xxxxxxxxx>, "shenwenbosmile@xxxxxxxxx" <shenwenbosmile@xxxxxxxxx>
> > Subject: Re: Re: User-controllable memcg-unaccounted objects of time namespace
> >
> > Sorry for the delayed response. I believe our patches are necessary and will answer your questions piece by piece.
> >
> > For the practicality of our concerns, we have confirmed that repeatedly creating new namespaces could lead to breaking memcg limit. Although the number of namespaces could be limited by per-user quota (e.g., max_time_namespaces), depending on per-user quota to limit memory usage is unsafe and impractical as users may have their own considerations when setting these limits. In fact, limitation on memory usage is more foundamental than limitation on various kernel objects. I believe this is also the reason why the fd tables and pipe buffers have been accounted by memcg even if they are also under per-user quota's limitation. The same reason applies to limitation of pid cgroups. Moreover, both net and uts namespaces are properly accounted while the others are not, which shows inconsistencies.
> >
> > For other unaccounted allocations (proc_alloc_inum, vvar_page and likely others), we have not reached them yet as our detecting tool reported many results which require much manual effort to go through. To me, it seems that these allocations also need patches.
> >
> > Lastly, our work is based on a detecting tool and we only report missing-charging sites that are manually confirmed to be triggerable from syscalls. The results that are obviously unexploitable like uncharged ldt_struct, which is allocated per process, are also filtered out. We would like to continuously contribute to memcg and we are planning to submit more patches in the future.
> >
> > Thanks!
> >
> > Yutian Yang,
> > Zhejiang University
> >
> >
> > > -----Original Messages-----
> > > From: "Michal Hocko" <mhocko@xxxxxxxx>
> > > Sent Time: 2021-04-16 14:29:52 (Friday)
> > > To: "Yutian Yang" <ytyang@xxxxxxxxxx>
> > > Cc: tglx@xxxxxxxxxxxxx, "shenwenbo@xxxxxxxxxx" <shenwenbo@xxxxxxxxxx>, "vdavydov.dev@xxxxxxxxx" <vdavydov.dev@xxxxxxxxx>
> > > Subject: Re: User-controllable memcg-unaccounted objects of time namespace
> > >
> > > Thank you for this and other reports which are trying to track memcg
> > > unaccounted objects. I have few remarks/questions.
> > >
> > >
> > > On Thu 15-04-21 21:29:57, Yutian Yang wrote:
> > > > Hi, our team has found bugs in time namespace module on Linux kernel v5.10.19, which leads to user-controllable memcg-unaccounted objects.
> > > > They are caused by the code snippets listed below:
> > > >
> > > > /*--------------- kernel/time/namespace.c --------------------*/
> > > > ......
> > > > 91ns = kmalloc(sizeof(*ns), GFP_KERNEL);
> > > > 92if (!ns)
> > > > 93goto fail_dec;
> > > > ......
> > > > /*----------------------------- end -------------------------------*/
> > > >
> > > >
> > > > The code at line 91 could be triggered by syscall clone if
> > > > CLONE_NEWTIME flag is set in the parameter. A user could repeatedly
> > > > make the clone syscall and trigger the bugs to occupy more and
> > > > more unaccounted memory. In fact, time namespaces objects could be
> > > > allocated by users and are also controllable by users. As a result,
> > > > they need to be accounted and we suggest the following patch:
> > >
> > > Is this a practical concern? I am not really deeply familiar with
> > > namespaces but isn't there any cap on how many of them can be created by
> > > user? If not, isn't that contained by the pid cgroup controller? If even
> > > that is not the case, care to explain why?
> > >
> > > You are referring to struct time_namespace above (that is 88B) but I can
> > > see there are other unaccounted allocations (proc_alloc_inum, vvar_page
> > > and likely others) so why the above is more important than those?
> > >
> > > Btw. a similar feedback applies to other reports similar to this one. I
> > > assume you have some sort of tool to explore those potential run aways
> > > and that is really great but it would be really helpful and highly
> > > appreciated to analyze those reports and try to provide some sort of
> > > risk assessment.
> > >
> > > Thanks!
> > > --
> > > Michal Hocko
> > > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux