On Fri, Mar 13, 2020 at 04:52:19PM +0000, George Spelvin wrote:
> On Fri, Mar 13, 2020 at 06:54:00AM -0700, Paul E. McKenney wrote:
> > I would guess that sorting them before the grace period might improve
> > cache locality and thus performance.  So it does seem like an excellent
> > thing to try, at the very least as an experiment.
>
> That doesn't seem at all obvious.  Processing them in separate batches
> would improve I-cache locality, but you could sort them after the
> grace period just as well as before.  Especially if you have arrays of
> 500 pointers to work with.
>
> Indeed, one thing that seems worth trying is sorting by address, which
> would improve D-cache locality, since you have a significant chance for
> consecutive frees to be in the same slab or otherwise reference the same
> overhead data structures.
>
> Sorting by (address - VMALLOC_START) automatically groups the vmalloc'ed
> pointers together at the front, too.  Since there's no vfree_bulk, you can
> iterate over them until you run out, then kfree_bulk the rest.
>
> (This idea came from a memory that bulk file operations can be
> made faster by sorting by inode number.)
>
> P.S. if you want to fit one extra pointer in the array, an array index
> identifying the first unused slot is distinguishable from a pointer,
> so if the last slot is a pointer, the page is full.  If it's an index,
> the page is not full.

Another approach would be to terminate with a NULL pointer, or with the
end of the array, as the case may be.

							Thanx, Paul
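
For illustration only, here is a minimal userspace sketch of the
"sort by (address - VMALLOC_START)" idea from the quoted text. It is not
code from this thread or from the kernel. The names DEMO_VMALLOC_START,
DEMO_VMALLOC_END, sort_key(), cmp_by_offset() and sort_and_split() are
hypothetical, the range values are made up, and addresses are held as
uintptr_t so that the subtraction is well-defined unsigned arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's vmalloc range (made-up values). */
#define DEMO_VMALLOC_START ((uintptr_t)0x8000)
#define DEMO_VMALLOC_END   ((uintptr_t)0x9000)

/*
 * Sort key: offset from the start of the (demo) vmalloc range.
 * Unsigned subtraction wraps, so addresses below the range get very
 * large keys and addresses inside the range get the smallest keys.
 */
static uintptr_t sort_key(uintptr_t addr)
{
	return addr - DEMO_VMALLOC_START;
}

/* qsort() comparator ordering addresses by their wrapped offset. */
static int cmp_by_offset(const void *a, const void *b)
{
	uintptr_t ka = sort_key(*(const uintptr_t *)a);
	uintptr_t kb = sort_key(*(const uintptr_t *)b);

	return (ka > kb) - (ka < kb);
}

/*
 * Sort addrs[] in place and return how many leading entries fall
 * inside the demo vmalloc range.  Everything after that index lies
 * outside the range, still in ascending key order.
 */
static size_t sort_and_split(uintptr_t *addrs, size_t n)
{
	const uintptr_t span = DEMO_VMALLOC_END - DEMO_VMALLOC_START;
	size_t i = 0;

	qsort(addrs, n, sizeof(*addrs), cmp_by_offset);
	while (i < n && sort_key(addrs[i]) < span)
		i++;
	return i;
}
```

In a kernel setting, and assuming the same split, the first group of
entries would be passed to vfree() one at a time and the remainder
handed to kfree_bulk() in a single call, as the quoted text suggests.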