Hi,

On Fri, 2021-08-20 at 15:17 +0200, Andreas Gruenbacher wrote:
> On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse
> <swhiteho@xxxxxxxxxx> wrote:
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> > > From: Bob Peterson <rpeterso@xxxxxxxxxx>
> > >
> > > This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
> > > that will allow glocks to be demoted automatically on locking
> > > conflicts. When a locking request comes in that isn't compatible
> > > with the locking state of a holder and that holder has the
> > > HIF_MAY_DEMOTE flag set, the holder will be demoted automatically
> > > before the incoming locking request is granted.
> >
> > I'm not sure I understand what is going on here. When there are
> > locking conflicts we generate callbacks and those result in glock
> > demotion. There is no need for a flag to indicate that, I think,
> > since it is the default behaviour anyway. Or perhaps the explanation
> > is just a bit confusing...
>
> When a glock has active holders (with the HIF_HOLDER flag set), the
> glock won't be demoted to a state incompatible with any of those
> holders.

Ok, that is a much clearer explanation of what the patch does. Active
holders have always prevented demotions previously.

> > > Processes that allow a glock holder to be taken away indicate this
> > > by calling gfs2_holder_allow_demote(). When they need the glock
> > > again, they call gfs2_holder_disallow_demote() and then they check
> > > if the holder is still queued: if it is, they're still holding the
> > > glock; if it isn't, they need to re-acquire the glock.
> > >
> > > This allows processes to hang on to locks that could become part
> > > of a cyclic locking dependency. The locks will be given up when a
> > > (rare) conflicting locking request occurs, and don't need to be
> > > given up prematurely.
> >
> > This seems backwards to me. We already have the glock layer cache
> > the locks until they are required by another node. We also have the
> > min hold time to make sure that we don't bounce locks too much. So
> > what is the problem that you are trying to solve here, I wonder?
>
> This solves the problem of faulting in pages during read and write
> operations: on the one hand, we want to hold the inode glock across
> those operations. On the other hand, those operations may fault in
> pages, which may require taking the same or other inode glocks,
> directly or indirectly, which can deadlock.
>
> So before we fault in pages, we indicate with
> gfs2_holder_allow_demote(gh) that we can cope if the glock is taken
> away from us. After faulting in the pages, we indicate with
> gfs2_holder_disallow_demote(gh) that we now actually need the glock
> again. At that point, we either still have the glock (i.e., the
> holder is still queued and it has the HIF_HOLDER flag set), or we
> don't.
>
> The different kinds of read and write operations differ in how they
> handle the latter case:
>
>  * When a buffered read or write loses the inode glock, it returns a
>    short result. This prevents torn writes and reading things that
>    have never existed on disk in that form.
>
>  * When a direct read or write loses the inode glock, it re-acquires
>    it before resuming the operation. Direct I/O is not expected to
>    return partial results and doesn't provide any kind of
>    synchronization among processes.
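Just so I'm sure I've understood the intended usage pattern, I think a
read or write path would look roughly like the sketch below. This is
only a sketch of my understanding, not the actual patch: I'm assuming a
gfs2_holder_queued()-style helper for the "is the holder still queued"
check described above (the explanation doesn't name the helper), and
I've left out the actual copying and most of the error handling:

        struct gfs2_inode *ip = GFS2_I(inode);
        struct gfs2_holder gh;
        int error;

        gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
        error = gfs2_glock_nq(&gh);
        if (error)
                goto out_uninit;

        /*
         * From here on, a conflicting locking request may take the
         * glock away from us.
         */
        gfs2_holder_allow_demote(&gh);

        /*
         * ... fault in the user pages; this may take the same or
         * other inode glocks, directly or indirectly ...
         */

        /* We actually need the glock again from here on. */
        gfs2_holder_disallow_demote(&gh);

        if (!gfs2_holder_queued(&gh)) {
                /*
                 * The holder was demoted because of a conflicting
                 * request. A buffered read or write returns a short
                 * result at this point; a direct read or write calls
                 * gfs2_glock_nq(&gh) again and resumes.
                 */
        } else {
                /* Still holding the glock: do the copy, then drop it. */
                gfs2_glock_dq(&gh);
        }
out_uninit:
        gfs2_holder_uninit(&gh);

Is that about right?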
> We could solve this kind of problem in other ways, for example, by
> keeping a glock generation number, dropping the glock before faulting
> in pages, re-acquiring it afterwards, and checking if the generation
> number has changed. This would still be an additional piece of glock
> infrastructure, but more heavyweight than the HIF_MAY_DEMOTE flag,
> which uses the existing glock holder infrastructure.

This is working towards the "why", but it could probably be summarised
a bit more. We always used to manage to avoid holding fs locks when
copying to/from userspace to avoid these complications. If that is no
longer possible, then it would be good to document what the new
expectations are somewhere suitable in Documentation/filesystems/...
just so we make sure it is clear what the new system is, and everyone
will be on the same page,

Steve.