Re: [PATCH v2] mm/vmscan: shrink slab in node reclaim

Yafang Shao <laoar.shao@xxxxxxxxx> · Tue, 6 Aug 2019 19:34:40 +0800

On Tue, Aug 6, 2019 at 7:09 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 06-08-19 18:59:52, Yafang Shao wrote:
> > On Tue, Aug 6, 2019 at 6:28 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Tue 06-08-19 17:54:02, Yafang Shao wrote:
> > > > On Tue, Aug 6, 2019 at 5:50 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Aug 06, 2019 at 11:25:31AM +0200, Michal Hocko wrote:
> > > > > > On Tue 06-08-19 17:15:05, Yafang Shao wrote:
> > > > > > > On Tue, Aug 6, 2019 at 5:05 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > [...]
> > > > > > > > > > As you said, the direct reclaim path sets it to 1, but
> > > > > > > > > > __node_reclaim() forgot to handle may_shrinkslab.
> > > > > > > >
> > > > > > > > OK, I am blind obviously. Sorry about that. Anyway, why cannot we simply
> > > > > > > > get back to the original behavior by setting may_shrink_slab in that
> > > > > > > > path as well?
> > > > > > >
> > > > > > > > You mean do it as commit 0ff38490c836 did before?
> > > > > > > > I haven't checked in which commit shrink_slab() was removed from
> > > > > >
> > > > > > What I've had in mind was essentially this:
> > > > > >
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 7889f583ced9..8011288a80e2 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -4088,6 +4093,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> > > > > >               .may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
> > > > > >               .may_swap = 1,
> > > > > >               .reclaim_idx = gfp_zone(gfp_mask),
> > > > > > > +             .may_shrinkslab = 1,
> > > > > >       };
> > > > > >
> > > > > >       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > > > > >
> > > > > > > The shrink_node path already shrinks slab when the flag allows it. In
> > > > > > > other words, get us back to the behavior before 1c30844d2dfe, because
> > > > > > > that commit clearly changed the long-standing node reclaim behavior
> > > > > > > just recently.
> > > > >
> > > > > I'd be fine with this change. It was not intentional to significantly
> > > > > change the behaviour of node reclaim in that patch.
> > > > >
> > > >
> > > > But if we do it like this, there will be a bug in the knob vm.min_slab_ratio.
> > > > Right?
> > >
> > > Yes, and the answer to that is a question: why do we even care? Which
> > > real-life workload suffers from the min_slab_ratio misbehavior?
> > > Also, it is much preferable to fix an obvious bug/omission, which the
> > > lack of may_shrinkslab in node reclaim seems to be, than to do a larger
> > > rewrite with harder-to-see changes.
> > >
> >
> > Fixing the bug in min_slab_ratio doesn't require much change. It
> > just introduces a new bit in scan_control, which doesn't require more
> > space, and an if-branch in shrink_node(), which doesn't take many CPU
> > cycles either. It will not take much maintenance either, as no_pagecache
> > is 0 by default, so we don't need to worry about forgetting to set it.
>
> You are still missing my point, I am afraid. I am not saying your change
> is wrong or complex. I am saying that there is an established behavior
> (even when wrong) that node-reclaim dependent loads might depend on.
> Your testing doesn't really suggest you have done much testing beyond
> the targeted one which is quite artificial to say the least.
>
> Maybe there are workloads which do depend on proper min_slab_ratio
> behavior but it would be much more preferable to hear from them rather
> than change the behavior based on the code inspection and a
> microbenchmark.
>
> Is my thinking more clear now?
>

Thanks for your clarification.
I get your point now.
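
For reference, the one-line fix discussed above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: the
struct and the model_* helpers are hypothetical stand-ins, not the kernel's
struct scan_control or shrink_node(), and the reclaim counts are made up. It
only shows that slab is skipped in the node reclaim path unless
may_shrinkslab is set in the initializer.

```c
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct scan_control. */
struct scan_control_model {
	bool may_unmap;
	bool may_swap;
	bool may_shrinkslab;		/* the flag under discussion */
	unsigned long nr_reclaimed;
};

/* Stand-ins for LRU and slab reclaim; the returned counts are invented. */
static unsigned long model_shrink_lru(void)  { return 32; }
static unsigned long model_shrink_slab(void) { return 8; }

/* Models the idea that the shrink_node path shrinks slab only if allowed. */
static void model_shrink_node(struct scan_control_model *sc)
{
	sc->nr_reclaimed += model_shrink_lru();
	if (sc->may_shrinkslab)
		sc->nr_reclaimed += model_shrink_slab();
}

/*
 * Models the node reclaim path. set_flag == false corresponds to the
 * omission (flag left at its zero default), set_flag == true to the
 * proposed one-line fix.
 */
static unsigned long model_node_reclaim(bool set_flag)
{
	struct scan_control_model sc = {
		.may_unmap = true,
		.may_swap = true,
		.may_shrinkslab = set_flag,
	};

	model_shrink_node(&sc);
	return sc.nr_reclaimed;
}
```

In this model, model_node_reclaim(false) reclaims only from the LRU
stand-in, while model_node_reclaim(true) also counts the slab stand-in.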

Thanks
Yafang