On Thu, Jul 09, 2020 at 06:36:21PM +0800, Gao Xiang wrote: > Hi Dave, > > On Thu, Jul 09, 2020 at 12:32:46PM +1000, Dave Chinner wrote: > > ... > > > > > if (trylock AGI) > > > > return; > > > > > > (adding some notes here, this patch doesn't use try lock here > > > finally but unlock perag and take AGI and relock and recheck tail_empty.... > > > since the tail non-empty is rare...) > > > > *nod* > > > > My point was largely that this sort of thing is really obvious and > > easy to optimise once the locking is cleanly separated. Adding a > > trylock rather than drop/relock is another patch for the series :P > > I thought trylock way yesterday as well... > Apart from that we don't have the AGI trylock method yet, it seems > not work as well... > > Thinking about one thread is holding a AGI lock and wants to lock perag. > And another thread holds a perag, but the trylock will fail and the second > wait-lock will cause deadlock... like below: > > Thread 1 Thread 2 > lock AGI > lock perag > lock perag > trylock AGI > lock AGI <- dead lock here You forgot to unlock the perag before locking the AGI, so of course it will deadlock. Yes, I know my psuedocode example didn't explicitly unlock the perag because I simply forgot to write it in as I was going. It's pretty common that quickly written psuedo code to explain a concept is not perfect, so you still need to think about whether it is correct, complete, etc. i.e. if you do: Thread 1 Thread 2 lock AGI lock perag lock perag trylock AGI unlock perag lock AGI <blocks> There is no deadlock in this algorithm. > And trylock perag instead seems not work well as well, since it fails > more frequently so we need lock AGI and then lock perag as a fallback. That's kind of what I'd expect - if there is contention on the AGI, the trylock will fail and we know we are about to sleep. The point is that "trylock fails" now gives us a point at which we can measure contention that puts us into the slow path. And because we are now in the slow path, the overhead of backing out just doesn't matter because we are going to sleep anyway. We use this pattern all over XFS to attempt to get locks in reverse order with minimal overhead in fast paths... > So currently, I think it's better to use unlock, do something, regrab, > recheck method for now... I think that's a mistake - it's smells of trying to optimise the implementation around immediately observed behaviour instead of designing a clean, well structured locking scheme that will not need to be re-optimised over and over again as the algorithm it protects changes over time. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx