Re: msgr bug in master caused by recent protocol refactor (?)

On 16/10/2018 02:58, Patrick Donnelly wrote:
> In CephFS testing, we've observed transient failures caused by what
> appears to be messages being dropped [1,2]. These appear to have been
> caused by the recent refactor PR [3,4] but I have no evidence other
> than the problems appearing during testing with [4] after [4] was
> merged.
> 
> I'm running tests [5] to see if I can get more debugging (debug ms =
> 20) but I wanted to canvass for ideas/advice before I get much deeper.
> Has anyone else seen transient failures with messages getting dropped?

If you can reproduce these issues with "debug ms = 20", I'm fairly
confident that we will be able to find the root cause.
In the meantime, I'll look through the message dispatch code to see if
I can spot anything strange.

> 
> [1] http://tracker.ceph.com/issues/36389
> [2] http://tracker.ceph.com/issues/36349
> [3] https://github.com/ceph/ceph/pull/23415
> [4] https://github.com/ceph/ceph/pull/24305
> [5] http://pulpito.ceph.com/?branch=wip-pdonnell-testing-20181011.152759
> 

-- 
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)
