2014-08-07 22:04 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>: > On 08/07/2014 02:26 PM, Joonsoo Kim wrote: >> >> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>: >>> >>> Ah, right. I thought that everything going to pcp lists would be through >>> >>> freeing which would already observe the isolate migratetype and skip >>> pcplist. I forgot about the direct filling of pcplists from buddy list. >>> You're right that we don't want extra hooks there. >>> >>> Still, couldn't this be solved in a simpler way via another pcplist drain >>> after the pages are moved from normal to isolate buddy list? Should be >>> even >>> faster because instead of disable - drain - enable (5 all-cpu kicks, >>> since >>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While >>> it's true that pageset_update is single-zone operation, I guess we would >>> easily benefit from having a single-zone drain operation as well. >> >> >> I hope so, but, it's not possible. Consider following situation. >> >> Page A: on pcplist of CPU2 and it is on isolate pageblock. >> >> CPU 1 CPU 2 >> drain pcplist >> wait IPI finished move A to normal buddy list >> finish IPI >> A is moved to pcplist by allocation request >> >> move doesn't catch A, >> because it is on pcplist. >> >> drain pcplist >> wait IPI finished move A to normal buddy list >> finish IPI >> A is moved to pcplist by allocation request >> >> repeat!! >> >> It could happen infinitely, though, low possibility. > > > Hm I see. Not a correctness issue, but still a failure to isolate. Probably > not impossible with enough CPU's and considering the fact that after > pcplists are drained, the next allocation request will try to refill them. > And during the drain, the pages are added to the beginning of the free_list > AFAICS, so they will be in the first refill batch. I think that it is correctness issue. When page A is moved to normal buddy list, merge could happen and freepage counting would be incorrect. > OK, another attempt for alternative solution proposal :) It's not that I > would think disabling pcp would be so bad, just want to be sure there is no > better alternative. Yeah, welcome any comment. :) > What if the drain operation had a flag telling it to recheck pageblock > migratetype and don't assume it's on the correct pcplist. Then the problem > would go away I think? Would it be possible to do without affecting the > normal drain-pcplist-when-full path? So that the cost is only applied to > isolation, but lower cost than pcplist disabling. > > Actually I look that free_pcppages_bulk() doesn't consider migratetype of > the pcplist, but uses get_freepage_migratetype(page). So the pcplist drain > could first scan the pcplists and rewrite the freepage_migratetype according > to pageblock_migratetype. Then the free_pcppages_bulk() operation would be > unchanged for normal operation. > > Or is this too clumsy? We could be also smart and have an alternative to > free_pcppages_bulk() which would omit the round-robin stuff (not needed for > this kind of drain), and have a pfn range to limit its operation to pages > that we are isolating. > Hm I guess with this approach some pages might still escape us if they were > moving between normal buddy list and pcplist through rmqueue_bulk() and > free_pcppages_bulk() (and not through our drain) at the wrong moments, but I > guess that would require a really specific workload (alternating between > burst of allocations and deallocations) and consistently unlucky timing. > Yes, it has similar problem as I mentioned above. Page A: on pcplist of CPU2 and it is on isolate pageblock. CPU 1 CPU 2 A is on normal buddy list drain pcplist wait IPI finished finish IPI A is moved to pcplist by allocation request move doesn't catch A, because it is on pcplist. move A to normal buddy list by free request drain pcplist wait IPI finished finish IPI A is moved to pcplist by allocation request move doesn't catch A, because it is on pcplist. move A to normal buddy list by free request repeat!! Although it is really corner case, I would like to choose error-free approach something like pcplist disable. :) Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>