On Tue, Oct 22, 2019 at 06:57:45PM +0200, Michal Hocko wrote: > [Cc Mel] > > On Tue 22-10-19 12:21:56, Waiman Long wrote: > > The pagetypeinfo_showfree_print() function prints out the number of > > free blocks for each of the page orders and migrate types. The current > > code just iterates the each of the free lists to get counts. There are > > bug reports about hard lockup panics when reading the /proc/pagetyeinfo > > file just because it look too long to iterate all the free lists within > > a zone while holing the zone lock with irq disabled. > > > > Given the fact that /proc/pagetypeinfo is readable by all, the possiblity > > of crashing a system by the simple act of reading /proc/pagetypeinfo > > by any user is a security problem that needs to be addressed. > > Should we make the file 0400? It is a useful thing when debugging but > not something regular users would really need for life. > I think this would be useful in general. The information is not that useful outside of debugging. Even then it's only useful when trying to get a handle on why a path like compaction is taking too long. > > There is a free_area structure associated with each page order. There > > is also a nr_free count within the free_area for all the different > > migration types combined. Tracking the number of free list entries > > for each migration type will probably add some overhead to the fast > > paths like moving pages from one migration type to another which may > > not be desirable. > > Have you tried to measure that overhead? > I would prefer this option not be taken. It would increase the cost of watermark calculations which is a relatively fast path. > > we can actually skip iterating the list of one of the migration types > > and used nr_free to compute the missing count. Since MIGRATE_MOVABLE > > is usually the largest one on large memory systems, this is the one > > to be skipped. Since the printing order is migration-type => order, we > > will have to store the counts in an internal 2D array before printing > > them out. > > > > Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the > > zone lock for too long blocking out other zone lock waiters from being > > run. This can be problematic for systems with large amount of memory. > > So a check is added to temporarily release the lock and reschedule if > > more than 64k of list entries have been iterated for each order. With > > a MAX_ORDER of 11, the worst case will be iterating about 700k of list > > entries before releasing the lock. > > But you are still iterating through the whole free_list at once so if it > gets really large then this is still possible. I think it would be > preferable to use per migratetype nr_free if it doesn't cause any > regressions. > I think it will. The patch as it is contains the overhead within the reader of the pagetypeinfo proc file which is a non-critical path. The page allocator paths on the other hand is very important. -- Mel Gorman SUSE Labs