Re: msgr bug in master caused by recent protocol refactor (?)

On Mon, Oct 15, 2018 at 6:58 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> In CephFS testing, we've observed transient failures caused by what
> appears to be messages being dropped [1,2]. These appear to have been
> caused by the recent refactor PR [3,4] but I have no evidence other
> than the problems appearing during testing with [4] after [4] was
> merged.
>
> I'm running tests [5] to see if I can get more debugging (debug ms =
> 20) but I wanted to canvass for ideas/advice before I get much deeper.
> Has anyone else seen transient failures with messages getting dropped?

I will note that these tickets are both from after patch 1 but before
patch 2. I admit I'm not sure how the known issue with stack frames
filling up might have led to dropped messages without more obvious
failures, but maybe wait and see if they reproduce before digging into
it too hard? :)
-Greg

>
> [1] http://tracker.ceph.com/issues/36389
> [2] http://tracker.ceph.com/issues/36349
> [3] https://github.com/ceph/ceph/pull/23415
> [4] https://github.com/ceph/ceph/pull/24305
> [5] http://pulpito.ceph.com/?branch=wip-pdonnell-testing-20181011.152759
>
> --
> Patrick Donnelly


