Re: Client still connect failed leader after that mon down

On 17/12/15 21:27, Sage Weil wrote:
On Thu, 17 Dec 2015, Jaze Lee wrote:
Hello cephers:
     In our test, there are three monitors. We find that a client
running a ceph command is slow when the leader mon is down. Even after
a long time, the first ceph command a client runs is still slow.
From strace, we find that the client first tries to connect to the
leader, then after 3s it connects to the second mon.
After some searching we find that the quorum has not changed; the
leader is still the down monitor.
Is that normal?  Or is there something I missed?
It's normal.  Even when the quorum does change, the client doesn't
know that.  It should be contacting a random mon on startup, though, so I
would expect the 3s delay 1/3 of the time.
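
To make the 1/3 figure concrete, here is a minimal sketch (an illustration, not Ceph code). It assumes the client picks its first mon uniformly at random from the monmap, that exactly one of the three mons is down, and that a failed first attempt costs the 3s seen in the strace. The function names are hypothetical.

```python
from fractions import Fraction


def first_pick_down_probability(num_mons: int, num_down: int) -> Fraction:
    """Chance that a uniformly random first pick lands on a down mon."""
    if num_mons <= 0 or not 0 <= num_down <= num_mons:
        raise ValueError("need num_mons > 0 and 0 <= num_down <= num_mons")
    return Fraction(num_down, num_mons)


def expected_first_attempt_delay(num_mons: int, num_down: int, timeout_s: int) -> Fraction:
    """Average delay caused by the first pick alone (later retries ignored)."""
    return first_pick_down_probability(num_mons, num_down) * timeout_s


# Three mons, one down, 3s timeout (the value observed in the strace).
p = first_pick_down_probability(3, 1)
delay = expected_first_attempt_delay(3, 1, 3)
print(f"P(first pick is down) = {p}, expected extra delay = {delay}s")
```

Under these assumptions a client hits the delay on about one start in three, which averages out to roughly one second per start.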
That's because the client randomly picks a mon from the monmap. But what we observed is that when a mon is down, no change is made to the monmap (neither the epoch nor the members). Is that the culprit for this phenomenon?

Thanks,
Jevon
A long-standing low-priority feature request is to have the client contact
2 mons in parallel so that it can still connect quickly if one is down.
It requires some non-trivial work in mon/MonClient.{cc,h} though and I
don't think anyone has looked at it seriously.
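
The idea of contacting 2 mons in parallel can be sketched roughly as below. This is only a simulation under stated assumptions, not how MonClient works or would be changed: `fake_connect` is a hypothetical stand-in for a connection attempt, a down mon is modelled as an attempt that fails only after the timeout, and the mon names and latencies are made up.

```python
import asyncio
import random


async def fake_connect(name: str, up: bool, latency_s: float, timeout_s: float) -> str:
    """Hypothetical connection attempt; not a Ceph API."""
    if not up:
        # A down mon gives no answer, so the attempt fails only after the timeout.
        await asyncio.sleep(timeout_s)
        raise ConnectionError(f"mon {name} timed out")
    await asyncio.sleep(latency_s)
    return name


async def connect_parallel(mons, fanout=2, timeout_s=3.0, rng=random):
    """Try `fanout` randomly chosen mons at once; return the first that answers."""
    picks = rng.sample(sorted(mons), k=min(fanout, len(mons)))
    pending = {
        asyncio.create_task(fake_connect(name, mons[name], 0.01, timeout_s))
        for name in picks
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
        raise ConnectionError("none of the picked mons answered")
    finally:
        # Drop any attempt that is still waiting once there is a winner.
        for task in pending:
            task.cancel()


# Mon "a" plays the role of the down leader; "b" and "c" are up.
mons = {"a": False, "b": True, "c": True}
winner = asyncio.run(connect_parallel(mons, timeout_s=0.2, rng=random.Random(0)))
print("connected to mon", winner)
```

With three mons and one down, any two picks include at least one live mon, so in this model the client never waits for the timeout.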

sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
