On Wed, Oct 12, 2022 at 8:16 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Wed, 12 Oct 2022 18:40:50 -0700 Jakub Kicinski wrote:
> > Did the fact that we used to force charge not potentially cause
> > reclaim, tho? Letting TCP accept the next packet even if it had
> > to drop the current one?
>
> I pushed this little nugget to one affected machine via KLP:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 03ffbb255e60..c1ca369a1b77 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7121,6 +7121,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
>                 return true;
>         }
>
> +       if (gfp_mask == GFP_NOWAIT) {
> +               try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
> +               refill_stock(memcg, nr_pages);
> +       }
>         return false;
>  }
>

AFAICT, if you force charge by passing __GFP_NOFAIL to try_charge(),
you should return true to tell the caller that the nr_pages is
actually being charged. Although I am not very sure what
refill_stock() does. Does that "uncharge" those pages?

> The problem normally reproes reliably within 10min -- 30min and counting
> and the application-level latency has not spiked.
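
To make the return-value concern concrete, here is a toy user-space
model. It is not kernel code: toy_memcg, toy_charge_skmem() and
toy_leaked_pages() are hypothetical names, and the model deliberately
leaves refill_stock() out, so it says nothing about whether that call
compensates for the forced charge. It only shows that if a forced
charge is applied but reported as a failure, a caller that uncharges
only what it was told about can leave pages charged.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_memcg {
	unsigned long usage;	/* pages currently charged */
	unsigned long limit;	/* limit, in pages */
};

/*
 * Charge nr_pages if it fits under the limit. Otherwise, if force is
 * set, charge anyway (loosely analogous to __GFP_NOFAIL) and return
 * report_forced, i.e. whether the caller is told about the charge.
 */
static bool toy_charge_skmem(struct toy_memcg *cg, unsigned long nr_pages,
			     bool force, bool report_forced)
{
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return true;
	}
	if (force) {
		cg->usage += nr_pages;	/* over-limit charge */
		return report_forced;
	}
	return false;
}

/*
 * A caller that tracks only the charges reported as successful and
 * later uncharges exactly those. Returns the pages left charged.
 */
static unsigned long toy_leaked_pages(bool report_forced)
{
	struct toy_memcg cg = { .usage = 0, .limit = 4 };
	unsigned long owned = 0;

	for (int i = 0; i < 3; i++)
		if (toy_charge_skmem(&cg, 2, true, report_forced))
			owned += 2;

	assert(owned <= cg.usage);
	cg.usage -= owned;	/* release everything the caller knows about */
	return cg.usage;	/* anything left was never uncharged */
}
```

In this model the third charge of 2 pages exceeds the 4-page limit and
is forced. With report_forced set, the caller owns all 6 pages and
nothing is left behind; with it clear, 2 pages stay charged.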