在 2020/8/16 下午11:56, Alexander Duyck 写道: > On Sun, Aug 16, 2020 at 7:11 AM Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote: >> >> >> >> 在 2020/8/16 下午12:17, Matthew Wilcox 写道: >>> On Sun, Aug 16, 2020 at 11:47:57AM +0800, Alex Shi wrote: >>>> Current pageblock_flags is only 4 bits, so it has to share a char size >>>> in cmpxchg when get set, the false sharing cause perf drop. >>>> >>>> If we incrase the bits up to 8, false sharing would gone in cmpxchg. and >>>> the only cost is half char per pageblock, which is half char per 128MB >>>> on x86, 4 chars in 1 GB. >>> >>> I don't believe this patch has that effect, mostly because it still does >>> cmpxchg() on words instead of bytes. >> >> Hi Matthew, >> >> Thank a lot for comments! >> >> Sorry, I must overlook sth, would you like point out why the cmpxchg is still >> on words after patch 1 applied? >> > > I would take it one step further. You still have false sharing as the > pageblocks bits still occupy the same cacheline so you are going to > see them cache bouncing regardless. Right, there 2 level false sharing here, cacheline and cmpxchg comparsion range. this patch could fix the cmpxchg level with a very cheap price. the cacheline size is too huge to resovle here. > > What it seems like you are attempting to address is the fact that > multiple threads could all be attempting to update the same long > value. As I pointed out for the migrate type it seems to be protected > by the zone lock, but for compaction the skip bit doesn't have the > same protection as there are some threads using the zone lock and > others using the LRU lock. I'm still not sure it makes much of a > difference though. It looks with this patch, lock are not needed anymore on the flags. > >>> >>> But which functions would benefit? It seems to me this cmpxchg() is >>> only called from the set_pageblock_migratetype() morass of functions, >>> none of which are called in hot paths as far as I can make out. >>> >>> So are you just reasoning by analogy with the previous patch where you >>> have measured a performance improvement, or did you send the wrong patch, >>> or did I overlook a hot path that calls one of the pageblock migration >>> functions? >>> >> >> Uh, I am reading compaction.c and found the following commit introduced >> test_and_set_skip under a lock. It looks like the pagelock_flags setting >> has false sharing in cmpxchg. but I have no valid data on this yet. >> >> Thanks >> Alex >> >> e380bebe4771548 mm, compaction: keep migration source private to a single compaction instance >> >> if (!locked) { >> locked = compact_trylock_irqsave(zone_lru_lock(zone), >> &flags, cc); >> - if (!locked) >> + >> + /* Allow future scanning if the lock is contended */ >> + if (!locked) { >> + clear_pageblock_skip(page); >> break; >> + } >> + >> + /* Try get exclusive access under lock */ >> + if (!skip_updated) { >> + skip_updated = true; >> + if (test_and_set_skip(cc, page, low_pfn)) >> + goto isolate_abort; >> + } >> > > I'm not sure that is a good grounds for doubling the size of the > pageblock flags. If you look further down in the code there are bits > that are setting these bits without taking the lock. The assumption > here is that by taking the lock the test_and_set_skip will be > performed atomically since another thread cannot perform that while > the zone lock is held. If you look in the function itself it only does > anything if the skip bits are checked and if the page is the first > page in the pageblock. > > I think you might be confusing some of my earlier comments. I still > believe the 3% regression you reported with my patch is not directly > related to the test_and_set_skip as the test you ran seems unlikely to > trigger compaction. However with that said one of the advantages of > using the locked section to perform these types of tests is that it > reduces the number of times the test is run since it will only be on > the first unlocked page in any batch of pages and the first page in > the pageblock is always going to be handled without the lock held > since it is the first page processed. > > Until we can get a test up such as thpscale that does a good job of > stressing the compaction code I don't think we can rely on just > observations to say if this is an improvement or not. I still struggle on thpscale meaningful running. But if the patch is clearly right in theory. Do we have to hang on a benchmark result? Thanks Alex