On Fri, Mar 13, 2020 at 06:54:00AM -0700, Paul E. McKenney wrote:
> I would guess that sorting them before the grace period might improve
> cache locality and thus performance.  So it does seem like an excellent
> thing to try, at the very least as an experiment.

That doesn't seem at all obvious.  Processing them in separate batches
would improve I-cache locality, but you could sort them after the grace
period just as well as before.  Especially if you have arrays of 500
pointers to work with.

Indeed, one thing that seems worth trying is sorting by address, which
would improve D-cache locality, since you have a significant chance for
consecutive frees to be in the same slab or otherwise reference the same
overhead data structures.

Sorting by (address - VMALLOC_START) automatically groups the vmalloc'ed
pointers together at the front, too.  Since there's no vfree_bulk, you
can iterate over them, vfree()ing each one, until you run out, then
kfree_bulk the rest.

(This idea came from a memory that bulk file operations can be made
faster by sorting by inode number.)

P.S. If you want to fit one extra pointer in the array, an array index
identifying the first unused slot is distinguishable from a pointer, so
if the last slot is a pointer, the page is full.  If it's an index, the
page is not full.
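
To make the sort-by-(address - VMALLOC_START) idea concrete, here is a
rough userspace sketch, not kernel code.  DEMO_VMALLOC_START,
DEMO_VMALLOC_END, sort_key(), cmp_by_key() and sort_and_split() are all
made-up names for illustration, and the range values are arbitrary; the
real VMALLOC_START/VMALLOC_END are architecture-specific.  It assumes
the subtraction is done in an unsigned type, so addresses below the
base wrap around to huge keys and land at the back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's vmalloc range (assumption:
 * the real values are per-architecture and not these numbers). */
#define DEMO_VMALLOC_START ((uintptr_t)0x4000u)
#define DEMO_VMALLOC_END   ((uintptr_t)0x8000u)

/* Sort key: unsigned (address - base).  Addresses inside the vmalloc
 * range get small keys; addresses below the base wrap to very large
 * keys, so everything outside the range sorts after the range. */
static uintptr_t sort_key(const void *p)
{
	return (uintptr_t)p - DEMO_VMALLOC_START;
}

/* qsort() comparator over an array of void * elements. */
static int cmp_by_key(const void *a, const void *b)
{
	uintptr_t ka = sort_key(*(void *const *)a);
	uintptr_t kb = sort_key(*(void *const *)b);

	return (ka > kb) - (ka < kb);
}

/* Sort ptrs[0..n) by key and return how many leading entries fall in
 * the (hypothetical) vmalloc range.  Nothing is dereferenced or freed
 * here; the caller decides what to do with each group. */
static size_t sort_and_split(void **ptrs, size_t n)
{
	const uintptr_t range = DEMO_VMALLOC_END - DEMO_VMALLOC_START;
	size_t nr_vmalloc = 0;

	assert(ptrs != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(ptrs, n, sizeof(*ptrs), cmp_by_key);

	while (nr_vmalloc < n && sort_key(ptrs[nr_vmalloc]) < range)
		nr_vmalloc++;

	return nr_vmalloc;
}
```

In the kernel, the caller would presumably vfree() each of the first
nr_vmalloc entries one at a time, then hand the remainder to
kfree_bulk(n - nr_vmalloc, ptrs + nr_vmalloc).  Within each group the
pointers end up in ascending address order, which is where the hoped-for
D-cache benefit would come from; whether it is measurable is exactly the
experiment.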