On 01/13/2017 08:14 AM, js1304@xxxxxxxxx wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>
> We have a statistic about memory fragmentation but it would be fluctuated
> a lot within very short term so it's hard to accurately measure
> system's fragmentation state while workload is actively running. Without
> stable statistic, it's not possible to determine if the system is
> fragmented or not.
>
> Meanwhile, recently, there were a lot of reports about fragmentation
> problem and we tried some changes. However, since there is no way
> to measure fragmentation ratio stably, we cannot make sure how these
> changes help the fragmentation.
>
> There are some methods to measure fragmentation but I think that they
> have some problems.
>
> 1. buddyinfo: it fluctuated a lot within very short term
> 2. tracepoint: it shows how steal happens between buddylists of different
>    migratetype. It means fragmentation indirectly but would not be accurate.
> 3. pageowner: it shows the number of mixed pageblocks but it is not
>    suitable for production system since it requires some additional memory.
>
> Therefore, this patch try to calculate exponential moving average to
> unusable free index. Since it is a moving average, it is quite stable
> even if fragmentation state of memory fluctuate a lot.

I suspect that the fluctuation of the underlying unusable free index
isn't so much because the number of high-order free blocks would
fluctuate, but because of allocation vs reclaim changing the total
number of free blocks, which is used in the equation. Reclaim uses LRU
which I expect to have low correlation with pfn, so the freed pages tend
towards order-0. And the allocation side tries not to split large pages
so it also consumes mostly order-0. So I would expect just plain
free_blocks_order from contig_page_info to be a good metric without need
for averaging, at least for costly orders and when we have enough free
memory - if we are below e.g.
the high (order-0) watermark, then we should let kswapd do its job
first anyway before considering proactive compaction.

> I made this patch 3 month ago and implementation detail looks not
> good to me now. Maybe, it's better to rule out update code in allocation
> path and make it timer based. Anyway, this patch is just for RFC.

Yeah, any hooks in allocation/free hotpaths are going to meet strong
resistance :)
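
To illustrate the argument about the index versus the raw block count, here is a minimal userspace sketch in Python. It is not code from the patch. The function names, the sample free-list counts and the smoothing factor `alpha` are all assumptions made for illustration. The index follows the usual definition (free pages not usable for the requested order, divided by all free pages), and `suitable_blocks()` stands in for the plain count of free blocks usable at a given order.

```python
def unusable_free_index(free_counts, order):
    """Fraction of free memory that cannot satisfy an allocation of `order`.

    free_counts[i] is the number of free blocks of order i, like one row
    of /proc/buddyinfo. No free memory is treated as fully unusable.
    """
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    if free_pages == 0:
        return 1.0
    usable_pages = suitable_blocks(free_counts, order) << order
    return (free_pages - usable_pages) / free_pages


def suitable_blocks(free_counts, order):
    """Free blocks able to satisfy `order`, counted in units of `order`."""
    return sum(n << (o - order) for o, n in enumerate(free_counts) if o >= order)


def ema(values, alpha):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    avg = values[0]
    out = [avg]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out


if __name__ == "__main__":
    # Hypothetical snapshots: only the order-0 count changes between them,
    # as if reclaim and allocation were churning single pages.
    high_orders = [200, 100, 50, 20, 10, 5, 2, 1, 0, 0]  # orders 1..10
    snapshots = [[n0] + high_orders for n0 in (1000, 20000, 500, 15000)]

    order = 4  # an order above the costly threshold
    indexes = [unusable_free_index(s, order) for s in snapshots]
    counts = [suitable_blocks(s, order) for s in snapshots]

    print("unusable free index:", [round(i, 3) for i in indexes])
    print("smoothed (alpha=0.2):", [round(i, 3) for i in ema(indexes, 0.2)])
    print("suitable blocks:     ", counts)
```

In this sketch the index moves between snapshots even though nothing happens to the high-order free lists, while the suitable block count stays the same, which is the behaviour the paragraph above expects from the raw count. It says nothing about how real workloads behave.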