On Fri, 6 Dec 2024, Chen Ridong wrote: > On 2024/12/6 13:33, Yu Zhao wrote: > > On Thu, Dec 5, 2024 at 6:45 PM Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote: > >> From: Chen Ridong <chenridong@xxxxxxxxxx> > >> > >> In the `mem_cgroup_update_lru_size` function, the `lru_size` should be > >> updated by adding `nr_pages` regardless of whether `nr_pages` is greater > >> than 0 or less than 0. To simplify this function, add a check for > >> `nr_pages` == 0. When `nr_pages` is not equal to 0, perform the same > >> actions. > >> > >> Signed-off-by: Chen Ridong <chenridong@xxxxxxxxxx> > > > > NAK. > > > > The commit that added that clearly explains why it was done that way. Thanks Yu: I did spot this going by, but was indeed hoping that someone else would NAK it, with more politeness than I could muster at the time! > > Thank you for your reply. > > I have read the commit message for ca707239e8a7 ("mm: update_lru_size > warn and reset bad lru_size") before sending my patch. However, I did > not quite understand why we need to deal with the difference between > 'nr_pages > 0' and 'nr_pages < 0'. > > > The 'lru_zone_size' can only be modified in the > 'mem_cgroup_update_lru_size' function. Only subtracting 'nr_pages' or > adding 'nr_pages' in a way that causes an overflow can make the size < 0. > > For case 1, subtracting 'nr_pages', we should issue a warning if the > size goes below 0. For case 2, when adding 'nr_pages' results in an > overflow, there will be no warning and the size won't be reset to 0 the > first time it occurs . It might be that a warning will be issued the > next time 'mem_cgroup_update_lru_size' is called to modify the > 'lru_zone_size'. However, as the commit message said, "the first > occurrence is the most informative," and it seems we have missed that > first occurrence. > > As the commit message said: "and then the vast unsigned long size draws > page reclaim into a loop of repeatedly", I think that a warning should > be issued and 'lru_zone_size' should be reset whenever 'size < 0' occurs > for the first time, whether from subtracting or adding 'nr_pages'. One thing, not obvious, but important to understand, is that WARN_ONCE() only issues a warning the first time its condition is found true, but it returns the true or false of the condition every time. So lru_size is forced to 0 each time an inconsistency is detected. (But IIRC VM_WARN_ON_ONCE() does not behave in that useful way; or maybe I've got that macro name wrong - we have so many slightly differing.) Perhaps understanding that will help you to make better sense of the order of events in this function. Another thing to understand: it's called before adding folio to list, but after removing folio from list: when it can usefully compare whether the emptiness of the list correctly matches lru_size 0. It cannot do so when adding if you "simplify" it in the way that you did. You might be focusing too much on the "size < 0" aspect of it, or you might be worrying more than I did about size growing larger and larger until it wraps to negative (not likely on 64-bit, unless corrupted). I hope these remarks help, but you need to think through it again for yourself. Hugh > > I am be grateful if you can explain more details, it has confused me for > a while. Thank you very much. > > Best regards, > Ridong