On Wed, Feb 09, 2022 at 09:45:44PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 03:09:17PM +1100, Dave Chinner wrote:
> > On Mon, Feb 07, 2022 at 08:36:21AM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 07, 2022 at 08:30:03AM -0500, Brian Foster wrote:
> > > Another approach is to use SLAB_TYPESAFE_BY_RCU.  This allows immediate
> > > reuse of freed memory, but also requires pointer traversals to the memory
> > > to do a revalidation operation.  (Sorry, no free lunch here!)
> > 
> > Can't do that with inodes - newly allocated/reused inodes have to go
> > through inode_init_always() which is the very function that causes
> > the problems we have now with path-walk tripping over inodes in an
> > intermediate re-initialised state because we recycled it inside a
> > RCU grace period.
> 
> So not just no free lunch, but this is also not a lunch that is consistent
> with the code's dietary restrictions.
> 
> From what you said earlier in this thread, I am guessing that you have
> some other fix in mind.
> 

Yeah.. I've got an experiment running that essentially tracks pending
inode grace period cookies and attempts to avoid them at allocation
time. It's crude atm, but the initial numbers I see aren't that far off
from the results produced by your expedited grace period mechanism. I
see numbers mostly in the 40-50k cycles per second ballpark. This is
somewhat expected because the current baseline behavior relies on unsafe
reuse of inodes before a grace period has elapsed. We have to rely on
more physical allocations to get around this, so the small batch
alloc/free patterns simply won't be able to spin as fast.

The difference I do see with this sort of explicit gp tracking is that
the results remain much closer to the baseline kernel when background
activity is ramped up. However, one of the things I'd like to experiment
with is whether the combination of this approach and expedited grace
periods provides any sort of opportunity for further optimization.
For example, if we can identify that a grace period has elapsed between
the time of ->destroy_inode() and when the queue processing ultimately
marks the inode reclaimable, that might allow for some optimized
allocation behavior. I see this occur occasionally with normal grace
periods, but not quite frequently enough to make a difference. What I
observe right now is that the same test above runs much closer to the
baseline numbers when using the ikeep mount option, so I may need to
look into ways to mitigate the chunk allocation overhead..

Brian

> 			Thanx, Paul
> 