On 22 March 2018 at 17:40, John Spray <jspray@xxxxxxxxxx> wrote:
> The client only gets osdmap updates when it tries to communicate with
> an OSD, and the OSD tells it that its current map epoch is too old.
>
> In the case that the client isn't doing any data operations (i.e. no
> osd ops), then the client doesn't find out that its blacklisted. But
> that's okay, because the client's awareness of its own
> blacklisted-ness should only be needed in the case that there is some
> dirty data that needs to be thrown away in the special if(blacklisted)
> paths.
>
> So if it's not hanging on any OSD operations (those operations would
> have resulted in an updated osdmap), the question is what is it
> hanging on? Is it trying to open a new session with the MDS?

It looks like the client still "thinks" that it has a session open (the
condition at [1] was false when I checked it myself), and it then waits
for a reply at [2]. This is exactly where it hangs. I have written a fix
and raised a PR for it [3]. Basically, it replaces
caller_cond.Wait(client_lock) with
caller_cond.WaitInterval(client_lock, utime_t(10, 0)).

By the way, would it be correct for the client to realize that it is
blacklisted here [4]? When I checked, it did not; the (copy of the)
blacklist didn't contain the client's address.

Also, I think it would be better if an evicted/blacklisted client
received some sort of reply from the MDS when it makes a request (as at
[2] in this bug's case), telling it that it can't access CephFS anymore.
I don't know whether this would be appropriate, but it would keep the
client from waiting indefinitely. It might be risky, though, since the
client could then flood the MDS with requests. In that case, maybe the
MDS should send such a reply to a given client only once.
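
To illustrate the difference between the unbounded wait and the bounded
one, here is a minimal sketch in standard C++. It does not use Ceph's
own Cond/Mutex classes or the code from the PR; PendingRequest,
WaitResult and wait_for_reply are hypothetical names made up for this
example, and the timeout value is up to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state a caller blocks on.
// This is NOT Ceph's MetaRequest; it only models "has a reply arrived?".
struct PendingRequest {
    std::condition_variable caller_cond;
    bool got_reply = false;  // set (under the lock) by whoever delivers the reply
};

enum class WaitResult { Reply, TimedOut };

// Wait for a reply, but give up after `timeout` instead of blocking forever.
// The caller must already hold `lock`, mirroring how the client lock is held
// around the wait in the situation described above.
WaitResult wait_for_reply(std::unique_lock<std::mutex>& lock,
                          PendingRequest& req,
                          std::chrono::milliseconds timeout) {
    assert(lock.owns_lock());
    // wait_for with a predicate returns the predicate's value on exit:
    // true if a reply arrived, false if the timeout expired first.
    const bool replied = req.caller_cond.wait_for(
        lock, timeout, [&req] { return req.got_reply; });
    return replied ? WaitResult::Reply : WaitResult::TimedOut;
}
```

On WaitResult::TimedOut the caller gets control back and can re-check
its own state (for example, whether the session is still valid) before
deciding to wait again, which an unbounded wait never allows.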

[1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1689
[2] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1719
[3] https://github.com/ceph/ceph/pull/21065
[4] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1658
[5] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2410