On Thu 08-08-19 11:26:38, Wei Yang wrote: > On Wed, Aug 07, 2019 at 09:51:01AM +0200, Michal Hocko wrote: > >On Wed 07-08-19 08:31:09, Wei Yang wrote: > >> On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: > >> >On 8/6/19 10:11 AM, Wei Yang wrote: > >> >> When addr is out of the range of the whole rb_tree, pprev will points to > >> >> the biggest node. find_vma_prev gets is by going through the right most > >> > > >> >s/biggest/last/ ? or right-most? > >> > > >> >> node of the tree. > >> >> > >> >> Since only the last node is the one it is looking for, it is not > >> >> necessary to assign pprev to those middle stage nodes. By assigning > >> >> pprev to the last node directly, it tries to improve the function > >> >> locality a little. > >> > > >> >In the end, it will always write to the cacheline of pprev. The caller has most > >> >likely have it on stack, so it's already hot, and there's no other CPU stealing > >> >it. So I don't understand where the improved locality comes from. The compiler > >> >can also optimize the patched code so the assembly is identical to the previous > >> >code, or vice versa. Did you check for differences? > >> > >> Vlastimil > >> > >> Thanks for your comment. > >> > >> I believe you get a point. I may not use the word locality. This patch tries > >> to reduce some unnecessary assignment of pprev. > >> > >> Original code would assign the value on each node during iteration, this is > >> what I want to reduce. > > > >Is there any measurable difference (on micro benchmarks or regular > >workloads)? > > I wrote a test case to compare these two methods, but not find visible > difference in run time. What is the point in changing this code if it doesn't lead to any measurable improvement? -- Michal Hocko SUSE Labs