On Mon 17-12-18 15:08:19, Wei Yang wrote: > On Mon, Dec 17, 2018 at 01:25:23PM +0100, Michal Hocko wrote: > >On Fri 14-12-18 10:39:12, Wei Yang wrote: > >> Below is a brief call flow for __offline_pages() and > >> alloc_contig_range(): > >> > >> __offline_pages()/alloc_contig_range() > >> start_isolate_page_range() > >> set_migratetype_isolate() > >> drain_all_pages() > >> drain_all_pages() > >> > >> Since set_migratetype_isolate() is only used in > >> start_isolate_page_range(), which is just used in __offline_pages() and > >> alloc_contig_range(). And both of them call drain_all_pages() if every > >> check looks good. This means it is not necessary call drain_all_pages() > >> in each iteration of set_migratetype_isolate(). > >> > >> By doing so, the logic seems a little bit clearer. > >> set_migratetype_isolate() handles pages in Buddy, while > >> drain_all_pages() takes care of pages in pcp. > > > >I have to confess I am not sure about the purpose of the draining here. > >I suspect it is to make sure that pages in the pcp lists really get > >isolated and if that is the case then it makes sense. > > > >In any case I strongly suggest not touching this code without a very > >good explanation on why this is not needed. Callers do XYZ is not a > >proper explanation because assumes that all callers will know that this > >has to be done. So either we really need to drain and then it is better > >to make it here or we don't but that requires some explanation. > > > > Yep, let me try to explain what is trying to do. > > Based on my understanding, online_pages do two things > > * adjust zone/pgdat status > * put pages into Buddy > > Generally, offline_pages do the reverse > > * take pages out of Buddy > * adjust zone/pgdat status > > While it is not that easy to take pages out of Buddy, since pages are > > * pcp list > * slub > * other usage > > This means before taking a page out of Buddy, we need to return it first > to Buddy. > > Current implementation is interesting by introducing migrate type. By > setting migrate type to MIGRATE_ISOLATE, this range of pages will never > be allocated from Buddy. And every page returned in this range will > never be touched by Buddy. > > Function start_isolate_page_range() just do this. > > Then let's focus on the pcp list. This is a little bit different > than other allocated pages. These are actually "partially" allocated > pages. They are not counted in Buddy Free pages, either no real use. So > we have two choice to get back those pages: > > * wait until it is allocated to a real user and wait for return > * or drain them directly > > Current implementation take 2nd approach. > > Then we can see there are also two way to drain them: > > * drain them range by range > * drain them in a whole range > > Both looks good, but not necessary to do them both. Because after we set > a pageblock migrate type to MIGRATE_ISOLATE, pages in this range will > never be allocated nor be put on pcp list. So after we drain one > particular range, it is not necessary to drain this range again. OK, this is an important point and actually the argument that i am wrong. I have missed that free_unref_page_commit skips pcp lists for MIGRATE_ISOLATE (resp. all migrate types above MIGRATE_PCPTYPES). Then you are right that we are OK to drain the zone only once _after_ we have isolated the full range. So please send a new patch with this clarification in the changelog and I will ack it. > The reason why I choose to drain them in a whole range is current > drain_all_pages() just carry zone information. For example, a zone may > have 1G while a pageblock is 128M. The pageblock is 1/8 of this zone. > This means in case there are 8 pages on pcp list, only 1 page drained by > drain_all_pages belongs to this pageblock. But we drain other 7 healthy > pages. > > CPU1 pcp list CPU2 pcp list > > +---------------+ +---------------+ > |A1 B3 C8 F6 | |E1 G3 D8 B6 | > +---------------+ +---------------+ > > > A B C D E F G > +---------+---------+---------+---------+---------+---------+---------+ > |012345678| | | | | | | > +---------+---------+---------+---------+---------+---------+---------+ > |<-pgblk->| > |<- Zone ->| > > > This is a chart for illustration. In case we want to isolate pgblk D, > while zone pcp list has 8 pages and only one belongs to this pgblk D. > This means the drain on pgblk base has much side effect. And with one > drain on each pgblk, this may increase the contention on this zone. > > Well, another approach is to enable drain_all_pages() with exact range > information. But neither approach needs to do them both. Is this actually worth the additional complexity? Have you seen an actual workload that would benefit from that? -- Michal Hocko SUSE Labs