On Wed, Dec 11, 2019 at 05:41:16PM -0800, Linus Torvalds wrote:
> I too can see xas_create using 30% CPU time, but that's when I do a
> perf record on just kswapd - and when I actually look at it on a
> system level, it looks nowhere near that bad.
>
> So I think people should look at this. Part of it might be for Willy:
> does that xas_create() need to be that expensive? I hate how "perf"
> callchains work, but it does look like it is probably
> page_cache_delete -> xas_store -> xas_create that is the bulk of the
> cost there.
>
> Replacing the real page with the shadow entry shouldn't be that big of
> a deal, I would really hope.
>
> Willy, that used to be a __radix_tree_lookup -> __radix_tree_replace
> thing, is there perhaps some simple optmization that could be done on
> the XArray case here?

It _should_ be the same order of complexity.  Since there's already
a page in the page cache, xas_create() just walks its way down to the
right node calling xas_descend() and then xas_store() does the equivalent
of __radix_tree_replace().  I don't see a bug that would make it more
expensive than the old code ... a 10GB file is going to have four levels
of radix tree node, so it shouldn't even spend that long looking for the
right node.
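
To put a number on the "four levels" figure, here is a rough
back-of-the-envelope sketch. It assumes 4KiB pages and 64 slots per node
(an XA_CHUNK_SHIFT of 6, which is the usual configuration but not the only
possible one). index_bits() and tree_levels() are names made up for this
illustration, not kernel functions, and the model only counts how many
6-bit chunks the largest page index needs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch, not read from any kernel config:
 * 4 KiB pages and 64 slots (6 index bits) per tree node. */
#define ASSUMED_PAGE_SHIFT  12
#define ASSUMED_CHUNK_SHIFT 6

/* Number of bits needed to represent max_index (0 when max_index is 0). */
static unsigned int index_bits(uint64_t max_index)
{
	unsigned int bits = 0;

	while (max_index) {
		bits++;
		max_index >>= 1;
	}
	return bits;
}

/*
 * Levels of nodes needed to cover every page index of a file of
 * file_bytes bytes: each level consumes chunk_shift bits of the index,
 * so round the bit count up to a whole number of chunks.
 * A result of 0 means the only index is 0 and no interior node is needed.
 */
static unsigned int tree_levels(uint64_t file_bytes,
				unsigned int page_shift,
				unsigned int chunk_shift)
{
	uint64_t max_index;

	assert(file_bytes > 0 && chunk_shift > 0);

	max_index = (file_bytes - 1) >> page_shift;
	return (index_bits(max_index) + chunk_shift - 1) / chunk_shift;
}
```

Under those assumptions, tree_levels(10ULL << 30, ASSUMED_PAGE_SHIFT,
ASSUMED_CHUNK_SHIFT) gives 4: the largest page index of a 10GiB file is
2621439, which needs 22 bits, and 22 bits at 6 bits per level rounds up
to four levels. So the walk down to the right node is at most four
xas_descend() steps.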