On Thu, Oct 30, 2014 at 09:35:44AM +0100, Andi Kleen wrote:
> We already have too many VM tunables. Better would be to switch
> automatically somehow.
>
> I guess you could use some kind of work stealing scheduler, but these
> are fairly complicated. Maybe some simpler heuristics can be found.

That would be a better option in general, but (admittedly not having thought about it much) I can't think of a good way to determine when to make that switch. The main problem is that we're not really seeing a negative performance impact from khugepaged, but rather some undesired behavior, and that behavior is always present.

Perhaps we could make the decision based on the number of remote allocations made by khugepaged? If we see a lot of allocations on distant nodes, then maybe we tell khugepaged to stop running scans for a particular process/mm and let the job handle things itself, either using the task_work-style scan that I've proposed, or by just banning khugepaged, period. Again, I don't think this is a very good way to make the decision, but it's something to think about.

> BTW my thinking has been usually to actually use more khugepageds to
> scan large address spaces faster.

I hadn't thought of that, but I suppose it is an option as well. Unless I've completely missed something in the code, I don't think there's a way to do this now, right? Either way, I suppose it wouldn't be too hard to do, but this still leaves the window wide open for allocations to be made far away from where the process really needs them. Maybe if we had a way to spin up a new khugepaged on the fly, so that users could pin it where they want it, that would work? Just brainstorming here...

- Alex