On Thu, Jan 06, 2022 at 02:27:48PM -0800, John Hubbard wrote:
> On 12/28/21 09:59, Minchan Kim wrote:
> > A Contiguous Memory Allocator(CMA) allocation can fail if any page
> > within the requested range has an elevated refcount(a pinned page).
> >
> > Debugging such failures is difficult, because the struct pages only
> > show a combined refcount, and do not show the callstacks or
> > backtraces of the code that acquired each refcount. So the source
> > of the page pins remains a mystery, at the time of CMA failure.
> >
> > In order to solve this without adding too much overhead, just do
> > nothing most of the time, which is pretty low overhead. However,
> > once a CMA failure occurs, then mark the page (this requires a
> > pointer's worth of space in struct page, but it uses page extensions
> > to get that), and start tracing the subsequent put_page() calls.
> > As the program finishes up, each page pin will be undone, and
> > traced with a backtrace. The programmer reads the trace output and
> > sees the list of all page pinning code paths.
> >
> > This will consume an additional 8 bytes per 4KB page, or an
> > additional 0.2% of RAM. In addition to the storage space, it will
> > have some performance cost, due to increasing the size of struct
> > page so that it is greater than the cacheline size (or multiples
> > thereof) of popular (x86, ...) CPUs.
> >
> > The idea can apply every user of migrate_pages as well as CMA to
> > know the reason why the page migration failed. To support it,
> > the implementation takes "enum migrate_reason" string as filter
> > of the tracepoint(see below).
> >
>
> Hi Minchan,
>
> If this is ready to propose, then maybe it's time to remove the "RFC"
> qualification from the subject line, and re-post for final review.
>
> And also when you do that, could you please specify which tree or commit
> this applies to? I wasn't able to figure that out this time.

Sorry for that. It was based on next-20211224.

>
> > Usage)
>
> This extensive "usage" section is probably helpful, but the commit
> log is certainly not the place for the "how to" documentation. Let's
> find an .rst file to stash it in, I think.

I wanted to get some review of the implementation/interface/usage
before respinning without the RFC tag. Otherwise, the documentation
would need heavy updates on every revision. Based on your comments,
I think you mostly agree with it as-is. So, yeah, let me cook up the
doc and repost it with the RFC tag removed.

Thanks.
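
As a side note on the "8 bytes per 4KB page, or an additional 0.2% of RAM"
figure in the quoted commit log, here is a quick back-of-the-envelope check.
It is only a sketch: the 4096-byte page size and the single 8-byte pointer
per page come from the commit log, while the function names and the 8 GiB
example machine are made up for illustration.

```python
# Rough overhead estimate for the per-page tracking pointer.
# Assumptions: 4 KiB pages and 8 extra bytes per page (from the commit log);
# the helper names and the 8 GiB example below are hypothetical.
PAGE_SIZE_BYTES = 4096
EXTRA_BYTES_PER_PAGE = 8


def overhead_fraction(extra_bytes=EXTRA_BYTES_PER_PAGE,
                      page_size=PAGE_SIZE_BYTES):
    """Fraction of RAM spent on the extra per-page storage."""
    return extra_bytes / page_size


def overhead_bytes(total_ram_bytes,
                   extra_bytes=EXTRA_BYTES_PER_PAGE,
                   page_size=PAGE_SIZE_BYTES):
    """Total extra bytes needed to cover every whole page of RAM."""
    pages = total_ram_bytes // page_size
    return pages * extra_bytes


if __name__ == "__main__":
    # 8 / 4096 is just under 0.2%, matching the figure in the commit log.
    print(f"overhead: {overhead_fraction():.3%}")
    ram = 8 * 1024**3  # hypothetical 8 GiB machine
    print(f"extra storage: {overhead_bytes(ram) // 1024**2} MiB")
```

So the quoted 0.2% is a rounded-up 8/4096, and on the hypothetical 8 GiB
machine the extra storage comes to 16 MiB.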