Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Shakeel, thank you for reviewing my patch!

On Fri, Nov 8, 2024 at 5:43 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> > memcg in hugetlb code, and also consolidates the error path that memcg
> > can take into just one point.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
> > -     if (!memcg_charge_ret)
> > -             mem_cgroup_commit_charge(folio, memcg);
> > -     lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> > -     mem_cgroup_put(memcg);
> > +     ret = mem_cgroup_charge_hugetlb(folio, gfp);
> > +     if (ret == -ENOMEM) {
> > +             spin_unlock_irq(&hugetlb_lock);
>
> spin_unlock_irq??

Thank you for the catch. I completely missed this after I
swapped the position of mem_cgroup_charge_hugetlb
to be called without the lock. I will fix this.

> > +             free_huge_folio(folio);
>
> free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> you are only calling it on success. This may underflow the metric.

I was actually thinking about this too. I was wondering what would
make sense -- in the original draft of this patch, I had the charge
increment be called unconditionally as well. The idea was that
even though it would not make sense to have the stat incremented
when there is an error, it would eventually be corrected by
free_huge_folio's decrement. However, because there is nothing
stopping the user from checking the stat in this period, they may
temporarily see that the value is incremented in memory.stat,
even though they were not able to obtain this page.

With that said, maybe it makes sense to increment unconditionally,
if free also decrements unconditionally. This race condition is
not something that will cause a huge problem for the user,
although users relying on userspace monitors for memory.stat
to handle memory management may see some problems.

Maybe what would make the most sense is to do both
incrementing & decrementing conditionally as well.
Thank you for your feedback, I will iterate on this for the next version!

> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > +{
> > +     struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > +     int ret = 0;
> > +
> > +     if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > +             !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +             ret = -EOPNOTSUPP;
>
> why EOPNOTSUPP? You need to return 0 here. We do want
> lruvec_stat_mod_folio() to be called.

In this case, I was just preserving the original code's return statements.
That is, in mem_cgroup_hugetlb_try_charge, the intended behavior
was to return -EOPNOTSUPP if any of these conditions were met.
If I understand the code correctly, calling lruvec_stat_mod_folio()
on this failure will be a noop, since either memcg doesn't account
hugetlb folios / there is no memcg / memcg is disabled.

With all of this said, I think your feedback makes the most sense
here, given the new semantics of the function: if there is no
memcg or memcg doesn't account hugetlb, then there is no
way that the limit can be reached! I will go forward with returning 0,
and calling lruvec_stat_mod_folio (which will be a noop).

Thank you for your detailed feedback. I wish I had caught these
errors myself, thank you for your time in reviewing my patch.

I hope you have a great rest of your weekend!
Joshua





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux