Re: msgr bug in master caused by recent protocol refactor (?)

On Wed, Oct 17, 2018 at 5:01 AM Ricardo Dias <rdias@xxxxxxxx> wrote:
> From the above history, the strange thing I see is that the
> EventManager didn't call handle_write on the messenger connection
> from 19:57:45 until the connection was stopped by the MonClient.
> This is why the keepalive and mdsbeacon messages were not
> sent to the monitor.
> But I don't quite understand why this happens. Maybe the EventManager is
> too busy handling other events?

It's possible. It may also be relevant that the MDS is running in
valgrind. I'm going to see if we have another instance of this in
testing without valgrind.

> Also, let's imagine that the EventManager is thrashing and really takes
> that long to issue a handle_write on the connection. Shouldn't the MDS
> be aware that the MonClient might restart the connection to the monitor
> due to not receiving keepalive acks, and take care of that situation?

The MonClient keepalives predate the MDS's MonClient restarts. We just
haven't gotten around to getting rid of that:
http://tracker.ceph.com/issues/36493

> In the meantime, I'm going to look at the code of the EventManager
> (which wasn't changed by the messenger refactorings) to understand why
> the above situation happened.

Thanks Ricardo for the analysis and for looking into this further.

-- 
Patrick Donnelly


