On Mon, 11 May 2015, Vlastimil Babka wrote:

> Since we track THP availability for khugepaged THP collapses, we can use it
> also for page fault THP allocations. If khugepaged with its sync compaction
> is not able to allocate a hugepage, then it's unlikely that the less involved
> attempt on page fault would succeed, and the cost could be higher than THP
> benefits. Also clear the THP availability flag if we do attempt and fail to
> allocate during page fault, and set the flag if we are freeing a large enough
> page from any context. The latter doesn't include merges, as that's a fast
> path and unlikely to make much difference.
>

That depends on how long {scan,alloc}_sleep_millisecs are. If khugepaged
fails to allocate a hugepage on all nodes, it sleeps for
alloc_sleep_millisecs (default 60s). Even if memory is freed immediately
afterwards, thp page faults won't attempt a hugepage again for 60s.

That's scary to me in this case: thp_avail_nodes is clear, a large process
terminates, and then it immediately starts back up. None of its memory is
faulted as thp and, depending on how large it is, khugepaged may fail to
allocate hugepages when it wakes back up, so it never scans (that failure
being the only reason thp_avail_nodes was clear before the process
terminated in the first place).

I'm not sure that approach can work unless the inference of whether a
hugepage can be allocated at a given time is a very good indicator of
whether a hugepage can be allocated alloc_sleep_millisecs later, and I'm
afraid that's not the case.

I'm very happy that you're looking at thp fault latency and the role that
khugepaged can play in accepting responsibility for defragmentation,
though. It's an area that has caused me some trouble lately and one I'd
like to improve. We see an immediate benefit when experimenting with
synchronous memory compaction of all memory every 15s. That's done using a
cronjob rather than khugepaged, but the idea is the same.
What would your thoughts be about doing something radical like:

 - having khugepaged do synchronous memory compaction of all memory at
   regular intervals,
 - tracking how many pageblocks are free for thp memory to be allocated,
 - terminating collapsing if free pageblocks are below a threshold,
 - triggering a khugepaged wakeup at page fault when that number of
   pageblocks falls below a threshold,
 - determining when to do the next full sync memory compaction based on
   how many pageblocks were defragmented on the last wakeup, and
 - avoiding memory compaction for all thp page faults.

(I'd ignore what is actually the responsibility of khugepaged and what is
done in task work at this time.)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .