On Tue, 26 May 2020 11:33:09 -0400 Johannes Weiner wrote:
> On Wed, May 20, 2020 at 05:24:11PM -0700, Jakub Kicinski wrote:
> > Add a memory.swap.high knob, which can be used to protect the system
> > from SWAP exhaustion. The mechanism used for penalizing is similar
> > to memory.high penalty (sleep on return to user space), but with
> > a less steep slope.
>
> The last part is no longer true after incorporating Michal's feedback.
>
> > +	/*
> > +	 * Make the swap curve more gradual, swap can be considered "cheaper",
> > +	 * and is allocated in larger chunks. We want the delays to be gradual.
> > +	 */
>
> This comment is also out-of-date, as the same curve is being applied.

Indeed :S

> > +	penalty_jiffies += calculate_high_delay(memcg, nr_pages,
> > +						swap_find_max_overage(memcg));
> > +
> >  	/*
> >  	 * Clamp the max delay per usermode return so as to still keep the
> >  	 * application moving forwards and also permit diagnostics, albeit
> > @@ -2585,12 +2608,25 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	 * reclaim, the cost of mismatch is negligible.
> >  	 */
> >  	do {
> > -		if (page_counter_is_above_high(&memcg->memory)) {
> > -			/* Don't bother a random interrupted task */
> > -			if (in_interrupt()) {
> > +		bool mem_high, swap_high;
> > +
> > +		mem_high = page_counter_is_above_high(&memcg->memory);
> > +		swap_high = page_counter_is_above_high(&memcg->swap);
>
> Please open-code these checks instead - we don't really do getters and
> predicates for these, and only have the setters because they are more
> complicated operations.

I added this helper because the calculation doesn't fit into 80 chars.

In particular reclaim_high will need a temporary variable or IMHO
questionable line split.

static void reclaim_high(struct mem_cgroup *memcg,
			 unsigned int nr_pages,
			 gfp_t gfp_mask)
{
	do {
		if (!page_counter_is_above_high(&memcg->memory))
			continue;
		memcg_memory_event(memcg, MEMCG_HIGH);
		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
	} while ((memcg = parent_mem_cgroup(memcg)) &&
		 !mem_cgroup_is_root(memcg));
}

What's your preference? Mine is a helper, but I'm probably not
sensitive enough to the ontology here :)

> > +		if (mem_high || swap_high) {
> > +			/* Use one counter for number of pages allocated
> > +			 * under pressure to save struct task space and
> > +			 * avoid two separate hierarchy walks.
> > +			 /*
> >  			current->memcg_nr_pages_over_high += batch;
>
> That comment style is leaking out of the networking code ;-) Please
> use the customary style in this code base, /*\n *...
>
> As for one counter instead of two: I'm not sure that question arises
> in the reader. There have also been some questions recently what the
> counter actually means. How about the following:
>
> 	/*
> 	 * The allocating tasks in this cgroup will need to do
> 	 * reclaim or be throttled to prevent further growth
> 	 * of the memory or swap footprints.
> 	 *
> 	 * Target some best-effort fairness between the tasks,
> 	 * and distribute reclaim work and delay penalties
> 	 * based on how much each task is actually allocating.
> 	 */

sounds good!

> Otherwise, the patch looks good to me.

Thanks!
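
For reference, a rough userspace sketch of the two options discussed above
(predicate helper vs. open-coded check). Everything here is a simplified
stand-in, not the kernel code: the structs use plain fields instead of
atomics and carry no hierarchy, the `high` field placement is an assumption,
and over_high_helper() / over_high_open_coded() are hypothetical names used
only for this comparison.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the kernel's struct page_counter.
 * Assumption: plain fields, no atomics, no parent pointer.
 */
struct page_counter {
	unsigned long usage;
	unsigned long high;
};

/* Simplified stand-in for struct mem_cgroup with only the two counters. */
struct mem_cgroup {
	struct page_counter memory;
	struct page_counter swap;
};

static unsigned long page_counter_read(const struct page_counter *counter)
{
	return counter->usage;
}

/* Option 1: the predicate helper proposed in the patch. */
static bool page_counter_is_above_high(const struct page_counter *counter)
{
	return page_counter_read(counter) > counter->high;
}

/* Hypothetical wrapper: the combined check written with the helper. */
static bool over_high_helper(const struct mem_cgroup *memcg)
{
	return page_counter_is_above_high(&memcg->memory) ||
	       page_counter_is_above_high(&memcg->swap);
}

/*
 * Hypothetical wrapper: the same check open-coded, as requested in
 * review. The longer expressions are what force temporaries or line
 * splits under an 80-column limit.
 */
static bool over_high_open_coded(const struct mem_cgroup *memcg)
{
	bool mem_high, swap_high;

	mem_high = page_counter_read(&memcg->memory) > memcg->memory.high;
	swap_high = page_counter_read(&memcg->swap) > memcg->swap.high;

	return mem_high || swap_high;
}
```

Both variants compute the same result; the difference is purely one of
style and line length, which is the trade-off being asked about.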