On Wed, Aug 31, 2022 at 8:28 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Wed, Aug 31, 2022 at 3:47 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Wed 31-08-22 11:19:48, Mel Gorman wrote:
> > > On Wed, Aug 31, 2022 at 04:42:30AM -0400, Kent Overstreet wrote:
> > > > On Wed, Aug 31, 2022 at 09:38:27AM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Aug 30, 2022 at 02:48:49PM -0700, Suren Baghdasaryan wrote:
> > > > > > ===========================
> > > > > > Code tagging framework
> > > > > > ===========================
> > > > > > A code tag is a structure identifying a specific location in the source code
> > > > > > which is generated at compile time and can be embedded in an application-
> > > > > > specific structure. Several applications of code tagging are included in
> > > > > > this RFC, such as memory allocation tracking, dynamic fault injection,
> > > > > > latency tracking and improved error code reporting.
> > > > > > Basically, it takes the old trick of "define a special elf section for
> > > > > > objects of a given type so that we can iterate over them at runtime" and
> > > > > > creates a proper library for it.
> > > > >
> > > > > I might be super dense this morning, but what!? I've skimmed through the
> > > > > set and I don't think I get it.
> > > > >
> > > > > What does this provide that ftrace/kprobes don't already allow?
> > > >
> > > > You're kidding, right?
> > >
> > > It's a valid question. From the description, the main addition that would
> > > be hard to do with ftrace or probes is catching where an error code is
> > > returned. A secondary addition would be catching all historical state and
> > > not just state since the tracing started.
> > >
> > > It's also unclear *who* would enable this.
> > > It looks like it would mostly have value during the development stage
> > > of an embedded platform to track kernel memory usage on a
> > > per-application basis in an environment where it may be difficult to
> > > set up tracing and tracking. Would it ever be enabled in production?
> > > Would a distribution ever enable this? If it's enabled, any overhead
> > > cannot be disabled/enabled at run or boot time, so anyone enabling this
> > > would carry the cost without ever necessarily consuming the data.
>
> Thank you for the question.
>
> For memory tracking, my intent is to have a mechanism that can be enabled
> during field testing (pre-production testing on a large population of
> internal users).
>
> The issue we often face is memory leaks that happen in the field but are
> very hard to reproduce locally. We get a bug report from the user which
> indicates the leak but often does not have enough information to track
> it down. Note that quite often these leaks/issues happen in the drivers,
> so even simply finding out where they came from is a big help.
>
> The way I envision this mechanism being used is to enable basic memory
> tracking in the field tests and have a user space process collect the
> allocation statistics periodically (say, once an hour). Once it detects
> some counter growing without bound or atypically (the definition of this
> is left to user space), it can enable context capturing only for that
> specific location, still keeping the overhead to a minimum while getting
> more information about potential issues. The collected stats and contexts
> are then attached to the bug report, and we get more visibility into the
> issue when we receive it.
>
> The goal is to provide a mechanism with low enough overhead that it can
> be enabled all the time during these field tests without affecting the
> device's performance profiles.
> Tracing is very cheap when it's disabled, but having it enabled all the
> time would introduce higher overhead than the counter manipulations.
> My apologies, I should have clarified all this in the cover letter from
> the beginning.
>
> As for other applications, maybe I'm not such an advanced user of
> tracing, but I think only the latency tracking application could be done
> with tracing, assuming we have all the right tracepoints. I don't see how
> we would use tracing for fault injection and descriptive error codes.
> Again, I might be mistaken.

Sorry about the formatting of my reply. Forgot to reconfigure the editor
on the new machine.

>
> Thanks,
> Suren.
>
> > >
> > > It might be an ease-of-use thing. Gathering the information from traces
> > > is tricky and would need combining multiple different elements, and that
> > > is development effort but not impossible.
> > >
> > > Whatever the case, asking for an explanation as to why equivalent
> > > functionality cannot be created from ftrace/kprobe/eBPF/whatever is
> > > reasonable.
> >
> > Fully agreed, and this is especially true for a change this size:
> > 77 files changed, 3406 insertions(+), 703 deletions(-)
> >
> > --
> > Michal Hocko
> > SUSE Labs
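For readers unfamiliar with the "special elf section" trick the cover letter refers to, below is a minimal user-space sketch in C of per-call-site allocation counters, assuming GCC or Clang on an ELF platform linked with GNU ld or LLD. It is not the RFC's code: struct code_tag, tracked_malloc, the alloc_tags section, dump_tags, find_tag, run_demo and the two demo call sites are hypothetical names chosen for illustration. Each expansion of the wrapper macro emits one static tag (file, function, line) into a dedicated section at compile time, and the linker-defined __start_alloc_tags/__stop_alloc_tags symbols let the program walk every tag at runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag type: one instance per call site, created at compile time. */
struct code_tag {
    const char *file;
    const char *func;
    int line;
    unsigned long calls;  /* allocations requested from this site */
    unsigned long bytes;  /* total bytes requested from this site */
};

/*
 * Put every tag into one ELF section, "alloc_tags". Because the name is a
 * valid C identifier, GNU ld and LLD define __start_alloc_tags and
 * __stop_alloc_tags around it. The explicit alignment keeps the compiler
 * from over-aligning individual tags, which would leave gaps and break the
 * pointer walk below.
 */
#define ALLOC_TAG_ATTRS \
    __attribute__((section("alloc_tags"), used, aligned(__alignof__(struct code_tag))))

extern struct code_tag __start_alloc_tags[];
extern struct code_tag __stop_alloc_tags[];

/* Hypothetical malloc wrapper (GNU statement expression): each expansion
 * gets its own static tag, bumps its counters, then calls malloc(). */
#define tracked_malloc(size) ({                                     \
    static struct code_tag _tag ALLOC_TAG_ATTRS = {                 \
        .file = __FILE__, .func = __func__, .line = __LINE__ };     \
    size_t _sz = (size);                                            \
    __atomic_fetch_add(&_tag.calls, 1, __ATOMIC_RELAXED);           \
    __atomic_fetch_add(&_tag.bytes, _sz, __ATOMIC_RELAXED);         \
    malloc(_sz);                                                    \
})

/* Walk the section: the "iterate over them at runtime" step. */
void dump_tags(FILE *out)
{
    for (const struct code_tag *t = __start_alloc_tags; t < __stop_alloc_tags; t++)
        fprintf(out, "%s:%d %s(): %lu calls, %lu bytes\n",
                t->file, t->line, t->func, t->calls, t->bytes);
}

/* Return the first tag recorded in function func, or NULL if none. */
const struct code_tag *find_tag(const char *func)
{
    for (const struct code_tag *t = __start_alloc_tags; t < __stop_alloc_tags; t++)
        if (strcmp(t->func, func) == 0)
            return t;
    return NULL;
}

/* Memory from the "leaky" site is kept here and never released. */
void *retained_allocs[100];

/* Simulated leak: every allocation from this site is retained forever. */
static void *leaky_path(void) { return tracked_malloc(64); }

/* Well-behaved site: every allocation is freed right away. */
static void *normal_path(void) { return tracked_malloc(16); }

void run_demo(void)
{
    for (int i = 0; i < 100; i++)
        retained_allocs[i] = leaky_path();
    for (int i = 0; i < 10; i++) {
        void *p = normal_path();
        assert(p != NULL);
        free(p);
    }
    dump_tags(stdout);
}
```

Calling run_demo() once prints one line per call site; the leaky_path() site reports 100 calls and 6400 bytes, the normal_path() site 10 calls and 160 bytes. This sketch only counts requests and never decrements on free, so on its own it cannot tell a leak from a busy site; a real allocation tracker would also need a free-side hook so that a steadily growing counter, the signal the user space collector described above would watch for, reflects memory that is never released.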