Hi,

I am trying to fix the bug at http://tracker.ceph.com/issues/10915. Patrick helped me get started with it. I was able to reproduce it locally on a vstart cluster, and I am currently trying to fix it by getting the client unmounted on eviction. Once I can do this, I will (as Patrick suggested) add a new option like "client_unmount_on_blacklist" and modify my code accordingly.

The information about the blacklist seems to be available to the client code [1] but, as far as I can see, that line (i.e. the if-block containing it) never gets executed. The MDS blacklists the client and evicts the session, but the client fails to notice that. I suppose this mismatch is what causes it to hang. The reason the client fails to notice is that it never actually looks at the blacklist after the session is evicted: handle_osd_map() never gets called after MDSRank::evict_client() is called.

I did write a patch that makes the client check for its address in the blacklist by calling a (new) function in ms_handle_reset(), but it did not help. It looks like not only does the client not check the blacklist, but even if it did, it would find an outdated version. To verify this, I wrote some debug code to iterate over and display the blacklist towards the end of, and after, MDSRank::evict_client(). The blacklist turned out to be empty in both locations.

Shouldn't the blacklist be updated at least in, or right after, MDSRank::evict_client()? I think that before fixing the client, I need some sort of fix somewhere here [2].

Also, how can I get a stack trace for commands like "bin/ceph tell mds.a client evict id=xxxx"?

I have attached the patch containing the modifications I have used so far.

Thanks,
Rishabh

[1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2420
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSRank.cc#L2737
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6c464d5a36..4e1f0b442c 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2489,7 +2489,6 @@ void Client::handle_osd_map(MOSDMap *m)
 // ------------------------
 // incoming messages
 
-
 bool Client::ms_dispatch(Message *m)
 {
   Mutex::Locker l(client_lock);
@@ -13489,9 +13488,22 @@ void Client::ms_handle_connect(Connection *con)
   ldout(cct, 10) << __func__ << " on " << con->get_peer_addr() << dendl;
 }
 
+void Client::unmount_if_blacklisted()
+{
+  std::set<entity_addr_t> new_blacklists;
+  objecter->consume_blacklist_events(&new_blacklists);
+
+  const auto myaddr = messenger->get_myaddr();
+  if (new_blacklists.count(myaddr)) {
+    cout << "UNMOUNTING!!!" << std::endl;
+    this->unmount();
+  }
+}
+
 bool Client::ms_handle_reset(Connection *con)
 {
   ldout(cct, 0) << __func__ << " on " << con->get_peer_addr() << dendl;
+  this->unmount_if_blacklisted();
   return false;
 }
 
diff --git a/src/client/Client.h b/src/client/Client.h
index ae5b188538..fd7b1f50da 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -558,6 +558,7 @@ protected:
 
   // friends
   friend class SyntheticClient;
+  void unmount_if_blacklisted();
   bool ms_dispatch(Message *m) override;
 
   void ms_handle_connect(Connection *con) override;
diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index d36d680d57..3ec16791e1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2717,6 +2717,9 @@ bool MDSRank::evict_client(int64_t session_id,
     bool wait, bool blacklist, std::stringstream& err_ss,
     Context *on_killed)
 {
+  FILE *fq = fopen("time", "a+");
+  fprintf(fq, "mds: MDSRank::evict_client()\n");
+
   assert(mds_lock.is_locked_by_me());
 
   // Mutually exclusive args
@@ -2823,6 +2826,17 @@ bool MDSRank::evict_client(int64_t session_id,
     }
   }
 
+  fprintf(fq, "will print the blacklist -\n");
+  std::set<entity_addr_t> blacklist2;
+  objecter->consume_blacklist_events(&blacklist2);
+  int j = 0;
+  for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+       i != blacklist2.end(); ++i, ++j) {
+    stringstream ss; ss << *i;
+    fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+  }
+  fprintf(fq, "blacklist ends\n");
+  fclose(fq);
   return true;
 }
 
@@ -2900,6 +2914,20 @@ bool MDSRankDispatcher::handle_command(
     evict_clients(filter, m);
 
     *need_reply = false;
+    FILE *fq = fopen("time", "a+");
+    fprintf(fq, "mds: MDSRank::ms_dispatch\n");
+    fprintf(fq, "mds: will print the blacklist -\n");
+    std::set<entity_addr_t> blacklist2;
+    objecter->consume_blacklist_events(&blacklist2);
+    int j = 0;
+    for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+         i != blacklist2.end(); ++i, ++j) {
+      stringstream ss; ss << *i;
+      fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+    }
+    fprintf(fq, "mds: blacklist ends\n");
+    fclose(fq);
+
     return true;
   } else if (prefix == "damage ls") {
     Formatter *f = new JSONFormatter(true);
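
For discussion, here is a minimal standalone sketch of the behaviour the client-side part of the patch is aiming for, with the planned "client_unmount_on_blacklist" option folded in. It is not Ceph code. BlacklistEvents, ToyClient and Addr are hypothetical stand-ins, and the sketch assumes that consume_blacklist_events() acts as a drain: it hands out only the entries recorded from maps processed since the previous call, and only if recording was enabled beforehand. Under that assumption, a call made before any new map has been processed returns an empty set. Whether that is what actually happens in the MDS is not established here.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for entity_addr_t; a plain string keeps this standalone.
using Addr = std::string;

// Toy event queue with drain semantics. ASSUMPTION: this models how the patch
// expects consume_blacklist_events() to behave; it is not the Objecter code.
class BlacklistEvents {
public:
  // Start recording; maps seen before this call leave no trace.
  void enable() { enabled_ = true; }

  // Models the arrival of a new map carrying blacklisted addresses.
  void on_new_map(const std::set<Addr>& blacklisted) {
    if (!enabled_) {
      return;
    }
    pending_.insert(blacklisted.begin(), blacklisted.end());
  }

  // Move every pending entry into *out and leave the queue empty.
  void consume(std::set<Addr>* out) {
    out->clear();
    out->swap(pending_);
  }

private:
  bool enabled_ = false;
  std::set<Addr> pending_;
};

// Toy client: unmounts when its own address shows up among freshly drained
// entries and the (hypothetical) option is switched on.
struct ToyClient {
  Addr my_addr;
  bool unmount_on_blacklist;  // models the planned client_unmount_on_blacklist
  bool mounted;

  // Returns true if this call unmounted the client.
  bool unmount_if_blacklisted(BlacklistEvents& events) {
    std::set<Addr> fresh;
    events.consume(&fresh);
    if (unmount_on_blacklist && mounted && fresh.count(my_addr) > 0) {
      mounted = false;
      return true;
    }
    return false;
  }
};
```

In this model, a ToyClient that calls unmount_if_blacklisted() before on_new_map() has delivered its address stays mounted, and the same call made afterwards unmounts it. The check therefore only helps if it runs after the updated map has been processed.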