On Wed, Jul 17, 2024 at 2:26 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Tue 16-07-24 18:06:17, David Finkel wrote: > > On Tue, Jul 16, 2024 at 4:00 PM Tejun Heo <tj@xxxxxxxxxx> wrote: > > > > > > Hello, > > > > > > On Tue, Jul 16, 2024 at 08:00:52PM +0200, Michal Hocko wrote: > > > ... > > > > > If we want to allow peak measurement of time periods, I wonder whether we > > > > > could do something similar to pressure triggers - ie. let users register > > > > > watchers so that each user can define their own watch periods. This is more > > > > > involved but more useful and less error-inducing than adding reset to a > > > > > single counter. > > > > > > > > I would rather not get back to that unless we have many more users that > > > > really need that. Absolute value of the memory consumption is a long > > > > living concept that doesn't make much sense most of the time. People > > > > just tend to still use it because it is much simpler to compare two different > > > > values rather than something as dynamic as PSI similar metrics. > > > > > > The initial justification for adding memory.peak was that it's mostly to > > > monitor short lived cgroups. Adding reset would make it used more widely, > > > which isn't necessarily a bad thing and people most likely will find ways to > > > use it creatively. I'm mostly worried that that's going to create a mess > > > down the road. Yeah, so, it's not widely useful now but adding reset makes > > > it more useful and in a way which can potentially paint us into a corner. > > > > This is a pretty low-overhead feature as-is, but we can reduce it further by > > changing it so instead of resetting the watermark on any non-empty string, > > we reset only on one particular string. > > > > I'm thinking of something like "global_reset\n", so if we do something like the > > PSI interface later, users can write "fd_local_reset\n", and get that > > nicer behavior. > > > > This also has the benefit of allowing "echo global_reset > > > /sys/fs/cgroup/.../memory.peak" to do the right thing. > > (better names welcome) > > This would be a different behavior than in v1 and therefore confusing > for those who rely on this in v1 already. So I wouldn't overengineer it > and keep the semantic as simple as possible. If we decide to add PSI > triggers they are completely independent on peak value because that is > reclaim based interface which by definition makes peak value very > dubious. That's fair. My only thought is that "write any non-empty string", is a very wide interface to support, and limits other possible behaviors later. FWIW, I have patches with and without this narrowed interface ready to re-mail pending the outcome of this discussion. (both include additional use-case info in the changelog) Keeping the "all non-empty writes" behavior: https://github.com/dfinkel/linux/commit/5edb21882a88693024a95bbd76b4a8d4561348da With the narrowed interface: https://github.com/dfinkel/linux/commit/f341c8c0cf5fbdcf7af30e20b334e532df74906d > -- > Michal Hocko > SUSE Labs Thanks, -- David Finkel Senior Principal Software Engineer, Core Services