On 14.01.22 17:39, Minchan Kim wrote: > On Fri, Jan 14, 2022 at 02:31:43PM +0100, David Hildenbrand wrote: >> On 12.01.22 21:41, Minchan Kim wrote: >>> On Wed, Jan 12, 2022 at 06:42:21PM +0100, David Hildenbrand wrote: >>>>>> >>>>>> What about something like: >>>>>> >>>>>> "mm: selective tracing of page reference holders on unref" >>>>>> >>>>>> PAGE_EXT_PIN_OWNER -> PAGE_EXT_TRACE_UNREF >>>>>> >>>>>> $whatever feature/user can then set the bit, for example, when migration >>>>>> fails. >>>>> >>>>> I couldn't imagine put_page tracking is generally useful except >>>>> migration failure. Do you have reasonable usecase in your mind >>>>> to make the feature general to be used? >>>> >>>> HWpoison etc. purposes maybe? Trace who still held a reference a page >>>> that was poisoned and couldn't be removed? Or in general, tracking >>> >>> I am not familiar with hwpoison so here dumb question goes: >>> Is that different one with __soft_offline_page? >>> It uses migrate_pages so current interface supports it with filter. >> >> __soft_offline_page() won't kill the target and try to migrate because >> the pages are about to be damaged and we can still access them. >> >> ordinary memory errors mean we kill the target because we cannot access >> the page anymore without triggering MCEs (or worse IIUC) again. >> >> So in my thinking, after memory_failure(), it could eventually be >> helpful to figure out who still has a (temporary) reference to such a >> broken page, even after killing the process. But that's just one idea I >> quickly came up with. > > > Thanks for the clarification. Is the trace best fit in the case? > Presumably you know the broken page, can't you find who owns the page > using /proc/pid/pagemap? > Furthermore, page_get/put operations commonly could happen in > different contexts regardless of page's owner so the trace from > different context is still helpful? > > If it's helpful, could you tell what you want to make the interface to > set the bit of broken page? For example, as-is case for page migration, > report_page_pinners is called to mark failed page at unmap_and_move. > Do you want to add something similar(maybe, set_page_ref_track under > rename) in memory-failure.c? > > It would be very helpful to design the feature's interface(surely, > naming as well) and write description to convince others "yeah, > sounds like so useful for the case and that's best fit than other way". I currently don't have time to explore this further, it was just a random thought how else this might be useful. >> >>> >>> echo "memory_failure" > $trace_dir/events/page_pin_owner/report_page_pinners/filter >>> >>>> references to something that should have a refcount of 0 because it >>>> should have been freed, but for some reason there are still references >>>> around? >>> >>> Sounds like you are talking about memory leak? What's the purpose >>> with trace, not using other existing tool to find memory leak? >>> >> >> IIRC, kmemleak can find objects that are no longer referenced, and we >> cannot track buddy allocations, but only kmalloc and friends. > > PageOwner is your good buddy. Page owner tracks who owns a page but not who else holds a reference (some buffer, gup, whatsoever), right? > >> >>>> >>>>> Otherwise, I'd like to have feature naming more higher level >>>>> to represent page migration failure and then tracking unref of >>>>> the page. In the sense, PagePinOwner John suggested was good >>>>> candidate(Even, my original naming PagePinner was worse) since >>>> >>>> Personally, I dislike both variants. >>>> >>>>> I was trouble to abstract the feature with short word. >>>>> If we approach "what feature is doing" rather than "what's >>>>> the feature's goal"(I feel the your suggestion would be close >>>>> to what feature is doing), I'd like to express "unreference on >>>>> migraiton failed page" so PAGE_EXT_UNMIGRATED_UNREF >>>>> (However, I prefer the feature naming more "what we want to achieve") >>>>> >>>> E.g., PAGE_EXT_TRACE_UNREF will trace unref to the page once the bit is >>>> set. The functionality itself is completely independent of migration >>>> failures. That's just the code that sets it to enable the underlying >>>> tracing for that specific page. >>> >>> I agree that make something general is great but I also want to avoid >>> create something too big from the beginning with just imagination. >>> So, I'd like to hear more concrete and appealing usecases and then >>> we could think over this trace approach is really the best one to >>> achieve the goal. Once it's agreed, the naming you suggested would >>> make sense. >> >> At least for me it's a lot cleaner if a feature clearly expresses what >> it actually does. Staring at PAGE_EXT_PIN_OWNER I initially had no clue. >> I was assuming we would actually track (not trace!) all active FOLL_PIN >> (not unref callers!). Maybe that makes it clearer why I'd prefer a >> clearer name. > > I totally agree PagePinOwner is not 100% straightforward. I'm open for > other better name. Currently we are discussing how we could generalize > and whether it's useful or not. Depending on the discussion, the design/ > interface as well as naming could be changed. No problem. PagePinOwner is just highly misleading. Because that's not what the feature does. Having that said, i hope we'll get other opinions as well. >> >>>> >>>> Makes sense, I was expecting the output to be large, but possible it's >>>> going to be way too large. >>>> >>>> Would it also make sense to track for a flagged page new taken >>>> references, such that you can differentiate between new (e.g., >>>> temporary) ones and previous ones? Feels like a reasonable addition. >>> >>> I actually tried it and it showed 2x times bigger output. >> >> Is 2x that bad? Or would it be worth making it configurable? > > For my goal, 2x was bad because I need to minimize the trace buffer. > Furthermore, the new get operation was not helpful but just noisy. > If some usecase is not enough with only put callsite, we add get > easily. Implementation is not hard. The matter is how it's useful > in real practice since we would expose the interface to the user, > I guess. Right. but it's a debugging feature after all, so I asume we care little about KABI. We can always add it on top, I agree. > >> >>> For me to debug CMA alloation failure, the new get_page callstack >>> after migration failure were waste since they repeated from lru >>> adding, isolate from the LRU something. Rather than get callsite, >>> I needed only put call sites since I could deduce where the pair-get >>> came from. >> >> Could maybe some filters help that exclude such LRU activity? > > For my case, the filter wouldn't be helpful because I shouldn't > miss the LRU activity since sometime they makes the allocation > failure. Thus, I could deduce the lru's get site from put callback. > get callsite is not helpful for my case but just wasted memory > and perf. > Makes sense. -- Thanks, David / dhildenb