Re: [PATCH 0/3] mdsmap: fix mds choosing

Xiubo Li <xiubli@xxxxxxxxxx> · Thu, 21 Nov 2019 19:28:02 +0800

On 2019/11/21 10:42, Yan, Zheng wrote:
> On 11/20/19 9:50 PM, Jeff Layton wrote:
>> On Wed, 2019-11-20 at 03:28 -0500, xiubli@xxxxxxxxxx wrote:
>>> From: Xiubo Li <xiubli@xxxxxxxxxx>
>>>
>>> Xiubo Li (3):
>>>    mdsmap: add more debug info when decoding
>>>    mdsmap: fix mdsmap cluster available check based on laggy number
>>>    mdsmap: only choose one MDS who is in up:active state without laggy
>>>
>>>   fs/ceph/mds_client.c |  6 ++++--
>>>   fs/ceph/mdsmap.c     | 27 ++++++++++++++++++---------
>>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>>
>> These all look good to me. I'll plan to merge them for v5.5, unless
>> anyone else sees issues with them.
>>
>> Thanks!
>>
> The main problem with this series is that we need to distinguish between
> an MDS crash and a transient laggy MDS.

How about we first try to find an MDS that is up:active & !laggy, and
if we can't find one, fall back to one that is up:active & laggy?

For the auth MDS case, we will ignore the laggy state.
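
To make the idea concrete, here is a rough userspace sketch of the
non-auth selection only. It is not the actual fs/ceph/mdsmap.c code:
struct mds_info, enum mds_state and choose_mds() are hypothetical
names made up for illustration, and for simplicity it picks the first
match, which is an assumption of this sketch and not necessarily how
the real code would pick among candidates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's mdsmap types. */
enum mds_state { MDS_STATE_STANDBY, MDS_STATE_ACTIVE };

struct mds_info {
	enum mds_state state;
	bool laggy;
};

/*
 * Return the rank of an up:active MDS, preferring one that is not laggy.
 * Falls back to the first up:active but laggy MDS, and returns -1 if no
 * MDS is up:active at all.
 */
static int choose_mds(const struct mds_info *info, size_t n)
{
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (info[i].state != MDS_STATE_ACTIVE)
			continue;
		if (!info[i].laggy)
			return (int)i;		/* up:active & !laggy */
		if (fallback < 0)
			fallback = (int)i;	/* first up:active & laggy */
	}
	return fallback;
}
```

With this shape, a transiently laggy MDS is only used when there is
nothing better, and a map with no up:active MDS still reports failure.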

BRs