On Thu, Apr 11, 2019 at 8:16 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, Apr 10, 2019 at 3:45 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Wed, Apr 10, 2019 at 5:54 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 10, 2019 at 2:11 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Apr 10, 2019 at 4:21 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Apr 10, 2019 at 7:21 AM Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > holding caps for request may cause deadlock. For example
> > > > > > > > >
> > > > > > > > > - client hold Fx caps and send unlink request
> > > > > > > > > - mds process request from other client, it change filelock's state to
> > > > > > > > >   EXCL_FOO and revoke Fx caps
> > > > > > > > > - mds receives the unlink request, it can't process it because it
> > > > > > > > >   can't acquire wrlock on filelock
> > > > > > > > >
> > > > > > > > > filelock state stays in EXCL_FOO because client does not release Fx caps.
> > > > > > > > >
> > > > > > > >
> > > > > > > > The client doing the unlink may have received a revoke for Fx on the
> > > > > > > > dir at that point, but it won't have returned it yet. Shouldn't it
> > > > > > > > still be considered to hold Fx on the dir until that happens?
> > > > > > > >
> > > > > > >
> > > > > > > Client should release the Fx. But there is a problem, mds process
> > > > > > > other request first after it get the release of Fx
> > > > > > >
> > > > > >
> > > > > > As I envisioned it, the client would hold a reference to Fx while the
> > > > > > unlink is in flight, so it would not return Fx until after the unlink
> > > > > > has gotten an unsafe reply.
> > > > >
> > > > > This was my understanding as well.
> > > > > It seems to me that the correct
> > > > > thing to do is to move forward with the understanding that the client
> > > > > has a write lock on the filelock state for the directory inode (for Fx
> > > > > cap) and a write lock on the linklock for the file inode (for the Lx
> > > > > cap). Obtaining those locks should require cap revocation which would
> > > > > cause the client to flush its buffered async unlinks. Importantly --
> > > > > and what actually needs to change (?): the MDS should skip acquiring
> > > > > those locks because the client already has the appropriate caps.
> > > > >
> > > > > Does that work Zheng?
> > > > >
> > > >
> > > > I'm not sure it will. IIUC...
> > > >
> > > > I think part of what Zheng is pointing out is that when we assume that
> > > > the client already holds certain locks, then we are effectively
> > > > changing the order in which they can be acquired. That can leave us
> > > > subject to ABBA style deadlocks (though with all of the complexity
> > > > that class Locker provides).
> > > >
> > > > That in and of itself wouldn't be a problem if the MDS code didn't
> > > > wait synchronously on cap revokes in some cases (which Zheng pointed
> > > > out). Fixing that latter bit seems like it might be a big win for
> > > > parallelism, in addition to making async calls more possible.
> > >
> > > Well it's not literally synchronous; it's just that the MDS holds on
> > > to the locks it's already taken. That's why you can see failure cases
> > > where the MDS is still running and resolving requests but there's a
> > > particular client which is stuck with one operation that never moves
> > > forward.
> > >
> >
> > Got it, thanks.
> >
> > Could we resolve this by just unwinding the held locks in that case
> > before requeueing the request? Then just reacquire them when we
> > reattempt it. We might need a livelock avoidance mechanism but that
> > doesn't sound too conceptually difficult.
>
> I'm hesitant to speak too authoritatively after this long and missing
> that it would apparently be trivially subject to ABBA, but the things
> that concern me about it are:
> 1) the freezing process for snapshots and exports rely on waiting
> until all acquired locks have been dropped. We could try and reference
> count "requests-that-will-ask-for-these-again" but it gets difficult
> 2) Livelock avoidance mechanisms are always difficult?
> 3) The locking and caps infrastructures are supposed to let us handle
> this. I grant you they're annoyingly difficult and undocumented; that
> needs to be fixed (and much of it CAN be!). I don't think designing
> new systems to avoid interacting with them is the right approach to
> take.
>
> I think I mentioned before that we really need to draw out the caps
> needed and how they interact. Why does it need more than the FxLx as
> we've discussed, Zheng? Is it unlikely the client can hold the
> necessary pieces upfront?

For the unlink case, the MDS also needs to wrlock the nestlock and
rdlock the snaplock. These locks are not covered by the cap mechanism.

> -Greg
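
For readers following the thread, the unwind-and-requeue idea Jeff raises
above can be sketched in a few lines of Python. This is a toy model, not
Ceph code: ToyLock, try_acquire_all and run are hypothetical names, and the
"client-caps" holder merely stands in for Fx caps that a client has not yet
returned. It does not address the concerns Greg lists (freezing for
snapshots and exports, livelock); max_rounds is only a crude bound on
retries.

```python
from collections import deque


class ToyLock:
    """Hypothetical stand-in for an MDS lock (filelock, nestlock, ...)."""

    def __init__(self, name):
        self.name = name
        self.holder = None

    def try_acquire(self, who):
        # Non-blocking: succeed only if nobody holds the lock.
        if self.holder is None:
            self.holder = who
            return True
        return False

    def release(self, who):
        if self.holder == who:
            self.holder = None


def try_acquire_all(who, locks):
    """Take every lock or none; on failure, unwind what was already taken."""
    taken = []
    for lock in locks:
        if lock.try_acquire(who):
            taken.append(lock)
        else:
            for held in reversed(taken):
                held.release(who)
            return False
    return True


def run(requests, on_round=None, max_rounds=100):
    """Process (name, locks) requests; requeue any that cannot lock yet."""
    queue = deque(requests)
    completed = []
    for round_no in range(max_rounds):
        if not queue:
            break
        if on_round:
            on_round(round_no)
        who, locks = queue.popleft()
        if try_acquire_all(who, locks):
            completed.append(who)
            for lock in reversed(locks):
                lock.release(who)
        else:
            queue.append((who, locks))  # retry later, holding nothing
    return completed


filelock = ToyLock("filelock")
nestlock = ToyLock("nestlock")
filelock.try_acquire("client-caps")  # assumed: Fx not yet returned


def client_returns_caps(round_no):
    # Assumed timing: the client gives the caps back before round 2.
    if round_no == 2:
        filelock.release("client-caps")


order = run(
    [("unlink", [nestlock, filelock]), ("other", [nestlock])],
    on_round=client_returns_caps,
)
# order == ["other", "unlink"]
```

Because the stalled unlink drops the nestlock while it waits, the other
request is not stuck behind it. In the behaviour Greg describes, the MDS
would instead keep holding the locks it had already taken.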