bug #10915 client: hangs on umount if it had an MDS session evicted

Hi,

I am trying to fix bug #10915 (http://tracker.ceph.com/issues/10915).
Patrick helped me get started with it. I was able to reproduce it
locally on a vstart cluster, and I am currently trying to fix it by
getting the client to unmount on eviction. Once that works, I will
(as Patrick suggested) add a new option such as
"client_unmount_on_blacklist" and modify my code accordingly.

The information about the blacklist seems to be available to the
client code [1] but, as far as I can see, that line (i.e. the if-block
containing it) never gets executed. The MDS blacklists the client and
evicts the session, but the client fails to notice. I suppose this
mismatch is what causes it to hang.

The reason the client fails to notice is that it never actually
looks at the blacklist after the session is evicted:
handle_osd_map() never gets called after MDSRank::evict_session() is
called. I wrote a patch that makes the client check for its own
address in the blacklist by calling a new function,
unmount_if_blacklisted(), from ms_handle_reset(), but it did not
help. It looks like not only does the client fail to check the
blacklist, but even if it did, it would find an outdated version.
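
The check the patch attempts can be sketched in isolation as follows. This is only a minimal standalone model, not Ceph code: EventSource, on_new_map(), consume() and should_unmount() are hypothetical names, addresses are plain strings, and the model assumes that blacklist events are recorded only when a new map is processed.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the blacklist event source (not the real Objecter).
struct EventSource {
  std::set<std::string> pending;  // addresses blacklisted since the last drain

  // Assumption of this model: events are recorded only when a new map arrives.
  void on_new_map(const std::set<std::string>& newly_blacklisted) {
    pending.insert(newly_blacklisted.begin(), newly_blacklisted.end());
  }

  // Move all pending events into *out and leave the source empty.
  void consume(std::set<std::string>* out) {
    assert(out != nullptr);
    out->clear();
    out->swap(pending);
  }
};

// Returns true when my_addr is among the events drained from src.
bool should_unmount(EventSource& src, const std::string& my_addr) {
  std::set<std::string> events;
  src.consume(&events);
  return events.count(my_addr) > 0;
}
```

In this model the check can only succeed after on_new_map() has run, and a second call right after a successful one returns false because the events have already been drained. If the real event source behaves similarly, a check made from a connection-reset handler would find nothing until a newer map has been handled.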

To verify this, I wrote some debug code to iterate over and print the
blacklist near the end of MDSRank::evict_session() and again right
after it returns. The blacklist turned out to be empty in both
locations. Shouldn't the blacklist be updated in, or at least right
after, MDSRank::evict_session()? I think that before fixing the
client, I need some sort of fix somewhere here [2].
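
One possible explanation for the empty output, sketched as a toy model: if the authoritative blacklist is kept in one place and every other party only holds a copy that is refreshed when a new map is delivered, then a copy inspected immediately after the eviction request will not yet contain the new entry. Authority and LocalView below are hypothetical names, and whether this is what actually happens in the MDS is an assumption, not something I have confirmed.

```cpp
#include <set>
#include <string>

// Hypothetical model, not Ceph code: holder of the authoritative blacklist.
struct Authority {
  unsigned epoch = 0;
  std::set<std::string> blacklist;

  void add(const std::string& addr) {
    blacklist.insert(addr);
    ++epoch;  // every change produces a new map epoch
  }
};

// Hypothetical local copy that is refreshed only on delivery of a new map.
struct LocalView {
  unsigned epoch = 0;
  std::set<std::string> blacklist;

  // Copy the authoritative state; stands in for receiving a new map.
  void deliver(const Authority& a) {
    epoch = a.epoch;
    blacklist = a.blacklist;
  }

  bool sees(const std::string& addr) const {
    return blacklist.count(addr) > 0;
  }
};
```

In this model, calling add() on the authority and then sees() on the view returns false until deliver() has been called, which would match a blacklist that looks empty both inside and right after the eviction call.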

Also, how can I get a stack trace for a command like "bin/ceph tell mds.a
client evict id=xxxx"?

I have attached the patch containing the modifications I have made so far.

Thanks,
Rishabh

[1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2420
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSRank.cc#L2737

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6c464d5a36..4e1f0b442c 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2489,7 +2489,6 @@ void Client::handle_osd_map(MOSDMap *m)
 // ------------------------
 // incoming messages
 
-
 bool Client::ms_dispatch(Message *m)
 {
   Mutex::Locker l(client_lock);
@@ -13489,9 +13488,22 @@ void Client::ms_handle_connect(Connection *con)
   ldout(cct, 10) << __func__ << " on " << con->get_peer_addr() << dendl;
 }
 
+void Client::unmount_if_blacklisted()
+{
+  std::set<entity_addr_t> new_blacklists;
+  objecter->consume_blacklist_events(&new_blacklists);
+
+  const auto myaddr = messenger->get_myaddr();
+  if (new_blacklists.count(myaddr)) {
+    cout << "UNMOUNTING!!!" << std::endl;
+    this->unmount();
+  }
+}
+
 bool Client::ms_handle_reset(Connection *con)
 {
   ldout(cct, 0) << __func__ << " on " << con->get_peer_addr() << dendl;
+  this->unmount_if_blacklisted();
   return false;
 }
 
diff --git a/src/client/Client.h b/src/client/Client.h
index ae5b188538..fd7b1f50da 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -558,6 +558,7 @@ protected:
 
   // friends
   friend class SyntheticClient;
+  void unmount_if_blacklisted();
   bool ms_dispatch(Message *m) override;
 
   void ms_handle_connect(Connection *con) override;
diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index d36d680d57..3ec16791e1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2717,6 +2717,9 @@ bool MDSRank::evict_client(int64_t session_id,
     bool wait, bool blacklist, std::stringstream& err_ss,
     Context *on_killed)
 {
+  FILE *fq = fopen("time", "a+");
+  fprintf(fq, "mds: MDSRank::evict_client()\n");
+
   assert(mds_lock.is_locked_by_me());
 
   // Mutually exclusive args
@@ -2823,6 +2826,17 @@ bool MDSRank::evict_client(int64_t session_id,
     }
   }
 
+  fprintf(fq, "will print the blacklist -\n");
+  std::set<entity_addr_t> blacklist2;
+  objecter->consume_blacklist_events(&blacklist2);
+  int j = 0;
+  for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+        i != blacklist2.end(); ++i, ++j) {
+    stringstream ss; ss <<  *i;
+    fprintf(fq, "blacklist[%d] = %s\n", j, ss.str().c_str());
+  }
+  fprintf(fq, "blacklist ends\n");
+  fclose(fq);
   return true;
 }
 
@@ -2900,6 +2914,20 @@ bool MDSRankDispatcher::handle_command(
     evict_clients(filter, m);
 
     *need_reply = false;
+    FILE *fq = fopen("time", "a+");
+    fprintf(fq, "mds: MDSRankDispatcher::handle_command()\n");
+    fprintf(fq, "mds: will print the blacklist -\n");
+    std::set<entity_addr_t> blacklist2;
+    objecter->consume_blacklist_events(&blacklist2);
+    int j = 0;
+    for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+         i != blacklist2.end(); ++i, ++j) {
+      stringstream ss; ss << *i;
+      fprintf(fq, "blacklist[%d] = %s\n", j, ss.str().c_str());
+    }
+    fprintf(fq, "mds: blacklist ends\n");
+    fclose(fq);
+
     return true;
   } else if (prefix == "damage ls") {
     Formatter *f = new JSONFormatter(true);
