On Mon, Feb 12, 2018 at 6:06 PM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> I've been following this discussion casually and am a bit confused.
>> The Client will happily send off an explicit getattr request if it
>> doesn't have enough capabilities to answer it locally.
>>
>> Is the problem here that the MDS is not answering all pending
>> CEPH_MDS_OP_GETATTR requests in one go? (Which I suppose it doesn't
>> really have a way of doing, if they haven't all been processed into
>> interior pending locks, but I think they should have gotten there if
>> caps are being recalled?)
>> Or are the clients for some reason requesting capabilities instead of
>> the single getattr message?
>> -Greg
>
> Hi, Greg.
>
> In our case, the mds does answer each CEPH_MDS_OP_GETATTR request
> separately, even when their caps are recalled. According to our mds
> log, when the caps are recalled, all of these CEPH_MDS_OP_GETATTR
> requests are added to the waiter queue, and when the filelock goes
> into a stable state, they would be dispatched to be reprocessed one by
> one. However, as there are writing clients that want "Fw", the very
> first CEPH_MDS_OP_GETATTR request of those in the waiter queue would,
> again, turn the filelock into LOCK_SYNC_MIX state, which would block
> all remaining CEPH_MDS_OP_GETATTR requests from being processed.

Hmm, that makes some sense but is sad. I think to resolve this we’d
need the MDS to recognize “repetitions” of the same op type that can be
serviced in a single lock operation for essentially no extra (locking)
cost, but I’m not sure how we’d integrate that with the capability
locking going on.

I guess that when we do locking operations, if they are a single
request and don't involve giving the client caps, maybe we could stick
any requests that require the same set of locks on a shared data
structure? And then run through them all when we get granted locks and
reply to those requests?
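A rough sketch of that batching idea, as a toy Python model and not MDS code. `LockBatcher`, `enqueue`, and `on_locks_granted` are hypothetical names made up for illustration; nothing here corresponds to an existing structure in the Ceph tree, and it ignores lock state transitions entirely. It only shows requests being grouped by the set of locks they need and all being answered from one grant.

```python
from collections import defaultdict


class LockBatcher:
    """Toy model: coalesce cap-free requests that wait on the same lock set."""

    def __init__(self):
        # frozenset of lock names -> request ids waiting on exactly that set
        self.waiters = defaultdict(list)
        # Counts how many grants were needed to drain the queued requests
        self.lock_acquisitions = 0

    def enqueue(self, request_id, locks, needs_caps=False):
        """Queue a request; return True if it joined a shared batch."""
        if needs_caps:
            # Requests that hand out caps stay on the normal per-request path
            return False
        self.waiters[frozenset(locks)].append(request_id)
        return True

    def on_locks_granted(self, locks):
        """Answer every request queued on this lock set from a single grant."""
        batch = self.waiters.pop(frozenset(locks), [])
        if batch:
            self.lock_acquisitions += 1
        # The only per-request work is queueing up the reply
        return [(request_id, "reply") for request_id in batch]


# Three getattr-style requests that all need the same lock
batcher = LockBatcher()
for req in ("getattr-1", "getattr-2", "getattr-3"):
    batcher.enqueue(req, {"filelock"})

replies = batcher.on_locks_granted({"filelock"})
```

In this model the three queued requests are answered after one grant, where the behaviour described in the quoted log would need a separate lock cycle for each.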
That may not involve any real fairness tradeoffs, as long as we're
careful to only do stuff that doesn't require extra effort (beyond
queueing up the message send). But I haven't looked at the data
structures enough lately to have any idea if something like this is
really feasible.
-Greg