On Wed, Aug 05, 2020 at 08:01:33PM -0700, Hugh Dickins wrote: > On Mon, 3 Aug 2020, Roman Gushchin wrote: > > On Fri, Jul 31, 2020 at 07:17:05PM -0700, Hugh Dickins wrote: > > > On Fri, 31 Jul 2020, Roman Gushchin wrote: > > > > On Thu, Jul 30, 2020 at 09:06:55PM -0700, Hugh Dickins wrote: > > > > > > > > > > Though another alternative did occur to me overnight: we could > > > > > scrap the logged warning, and show "nr_whatever -53" as output > > > > > from /proc/sys/vm/stat_refresh: that too would be acceptable > > > > > to me, and you redirect to /dev/null. > > > > > > > > It sounds like a good idea to me. Do you want me to prepare a patch? > > > > > > Yes, if you like that one best, please do prepare a patch - thanks! > > > > Hi Hugh, > > > > I mastered a patch (attached below), but honestly I can't say I like it. > > The resulting interface is confusing: we don't generally use sysctls to > > print debug data and/or warnings. > > Since you confessed to not liking it yourself, I paid it very little > attention. Yes, when I made that suggestion, I wasn't really thinking > of how stat_refresh is a /proc/sys/vm sysctl thing; and I'm not at all > sure how issuing output from a /proc file intended for input works out > (perhaps there are plenty of good examples, and you followed one, but > it smells fishy to me now). > > > > > I thought about treating a write to this sysctls as setting the threshold, > > so that "echo 0 > /proc/sys/vm/stat_refresh" would warn on all negative > > entries, and "cat /proc/sys/vm/stat_refresh" would use the default threshold > > as in my patch. But this breaks to some extent the current ABI, as passing > > an incorrect value will result in -EINVAL instead of passing (as now). > > I expect we could handle that well enough, by more lenient validation > of the input; though my comment above on output versus input sheds doubt. > > > > > Overall I still think we shouldn't warn on any values inside the possible > > range, as it's not an indication of any kind of error. The only reason > > why we see some values going negative and some not, is that some of them > > are updated more frequently than others, and some are bouncing around > > zero, while other can't reach zero too easily (like the number of free pages). > > We continue to disagree on that (and it amuses me that you who are so > sure they can be ignored, cannot ignore them; whereas I who am so curious > to investigate them, have not actually found the time to do so in years). > It was looking as if nothing could satisfy us both, but... > > > > > Actually, if someone wants to ensure that numbers are accurate, > > we have to temporarily set the threshold to 0, then flush the percpu data > > and only then check atomics. In the current design flushing percpu data > > matters for only slowly updated counters, as all others will run away while > > we're waiting for the flush. So if we're targeting some slowly updating > > counters, maybe we should warn only on them being negative, Idk. > > I was going to look into that angle, though it would probably add a little > unjustifiable overhead to fast paths, and be rejected on that basis. > > But in going to do so, came up against an earlier comment of yours, of > which I had misunderstood the significance. I had said and you replied: > > > > nr_zone_write_pending: yes, I've looked at our machines, and see that > > > showing up for us too (-49 was the worst I saw). Not at all common, > > > but seen. And not followed by increasingly worse numbers, so a state > > > that corrects itself. nr_dirty too (fewer instances, bigger numbers); > > > but never nr_writeback, which you'd expect to go along with those. > > > > NR_DIRTY and NR_WRITEBACK are node counters, so we don't check them? > > Wow. Now I see what you were pointing out: when v4.8's 75ef71840539 > ("mm, vmstat: add infrastructure for per-node vmstats") went in, it > missed updating vmstat_refresh() to check all the NR_VM_NODE_STAT items. > > And I've never noticed, and have interpreted its silence on those items > as meaning they're all good (and the nr_dirty ones I mentioned above, > must have been from residual old kernels, hence the fewer instances). > I see the particularly tricky NR_ISOLATED ones are in that category. > Maybe they are all good, but I have been mistaken. > > I shall certainly want to reintroduce those stats to checking for > negatives, even if it's in a patch that never earns your approval, > and just ends up kept internal for debugging. But equally certainly, > I must not suddenly reintroduce that checking without gaining some > experience of it (and perhaps getting as irritated as you by more > transient negatives). > > I said earlier that I'd prefer you to rip out all that checking for > negatives, rather than retaining it with the uselessly over-generous > 125 * nr_cpus leeway. Please, Roman, would you send Andrew a patch > doing that, to replace the patch in this thread? Or if you prefer, > I can do so. Hi Andrew, it seems that Hugh and me haven't reached a consensus here. Can, you, please, not merge this patch into 5.9, so we would have more time to find a solution, acceptable for all? Thank you! Roman