Re: bug #10915 client: hangs on umount if it had an MDS session evicted

On Wed, Mar 21, 2018 at 7:09 PM, Rishabh Dave <ridave@xxxxxxxxxx> wrote:
> Hi,
>
> I am trying to fix the bug - http://tracker.ceph.com/issues/10915.
> Patrick helped me to get started with it. I was able to reproduce it
> locally on vstart cluster and I am currently trying to fix it by
> getting the client unmounted on eviction. Once I could do this I would
> (as Patrick suggested) add a new option like
> "client_unmount_on_blacklist" and modify my code accordingly.
>
> The information about the blacklist seems to be available to the
> client code [1] but, as far as I can see, that line (i.e. the if-block
> containing it) never gets executed. The MDS blacklists the client and
> evicts the session, but the client fails to notice. I suppose this
> incongruence causes it to hang.
>
> The reason why the client fails to notice is that it never actually
> looks at the blacklist after the session is evicted --
> handle_osd_map() never gets called after MDSRank::evict_session() is
> called. I did write a patch that would make the client check its
> address in the blacklist by calling a (new) function in
> ms_handle_reset() but it did not help. It looks like not only does the
> client not check the blacklist but, even if it did, it would find an
> outdated version.
>
> To verify this, I wrote some debug code to iterate and display the
> blacklist towards the end of and after MDSRank::evict_session(). The
> blacklist turned out to be empty in both locations. Shouldn't the
> blacklist be updated at least in or right after
> MDSRank::evict_session() gets executed? I think before fixing the client,
> I need to have some sort of fix somewhere here [2].

The client only gets osdmap updates when it tries to communicate with
an OSD, and the OSD tells it that its current map epoch is too old.

If the client isn't doing any data operations (i.e. no OSD ops), it
doesn't find out that it's blacklisted. But that's okay, because the
client only needs to know it is blacklisted when there is dirty data
that has to be thrown away in the special if(blacklisted) paths.
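
To make that concrete, here is a rough toy model of the behaviour
described above. It is not Ceph code: ToyOsdMap, ToyCluster, ToyClient
and all of their members are hypothetical names invented for
illustration, and the real client learns about a newer epoch through
messages from the OSD rather than by reading cluster state directly.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for an OSD map: an epoch plus a blacklist.
struct ToyOsdMap {
    unsigned epoch = 0;
    std::set<std::string> blacklist;
};

// Hypothetical cluster state holding the authoritative map.
struct ToyCluster {
    ToyOsdMap current;

    // Blacklisting a client address produces a new map epoch.
    void blacklist_client(const std::string& addr) {
        current.blacklist.insert(addr);
        ++current.epoch;
    }
};

// Hypothetical client holding a cached (possibly stale) copy of the map.
struct ToyClient {
    std::string addr;
    ToyOsdMap map;

    // The client can only answer this from its cached map.
    bool knows_blacklisted() const { return map.blacklist.count(addr) > 0; }

    // In this model an OSD op is the only point where the cached map is
    // refreshed, and only if the cached epoch turns out to be too old.
    void do_osd_op(const ToyCluster& cluster) {
        assert(cluster.current.epoch >= map.epoch);  // epochs never go backwards
        if (map.epoch < cluster.current.epoch) {
            map = cluster.current;
        }
    }
};
```

In this sketch, a client that is blacklisted but never calls do_osd_op()
keeps reporting knows_blacklisted() == false, which is the situation
described above for a client with no data operations in flight.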

So if it's not hanging on any OSD operations (those operations would
have resulted in an updated osdmap), the question is: what is it
hanging on? Is it trying to open a new session with the MDS?

John


> And how can I get a stacktrace for commands like "bin/ceph tell mds.a
> client evict id=xxxx"?
>
> Also I have attached the patch containing modifications I have used so far.
>
> Thanks,
> Rishabh
>
> [1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2420
> [2] https://github.com/ceph/ceph/blob/master/src/mds/MDSRank.cc#L2737