Re: Assert in Pipe.cc in Hammer 0.94.7

On Mon, Jan 16, 2017 at 5:23 AM, Padmanabh Ratnakar
<padmanabh.ratnakar@xxxxxxxxxxxx> wrote:
> Hi,
> We are seeing the following assert in Pipe.cc when we hit a network
> glitch in our setup.
>
> msg/simple/Pipe.cc: In function 'int Pipe::connect()' thread 7f0124800700 time 2016-12-28 20:43:00.057696
> msg/simple/Pipe.cc: 1156: FAILED assert(m)
>  ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb1fab]
>  2: (Pipe::connect()+0x380a) [0xc8985a]
>  3: (Pipe::writer()+0x4ca) [0xc8acca]
>  4: (Pipe::Writer::entry()+0xd) [0xc95b1d]
>  5: (()+0x8182) [0x7f01e1b8f182]
>  6: (clone()+0x6d) [0x7f01e00fa47d]
>
> (gdb) bt
> #0  0x00007f01e1b9720b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
> #1  0x0000000000ab70dd in reraise_fatal (signum=6) at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00007f01e0036cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #5  0x00007f01e003a0d8 in __GI_abort () at abort.c:89
> #6  0x00007f01e0941535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #7  0x00007f01e093f6d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #8  0x00007f01e093f703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9  0x00007f01e093f922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x0000000000bb2198 in ceph::__ceph_assert_fail (assertion=assertion@entry=0xd8bc4f "m", file=file@entry=0xd928b8 "msg/simple/Pipe.cc", line=line@entry=1156, func=func@entry=0xd94210 <Pipe::connect()::__PRETTY_FUNCTION__> "int Pipe::connect()") at common/assert.cc:77
> #11 0x0000000000c8985a in Pipe::connect (this=this@entry=0x32030000) at msg/simple/Pipe.cc:1156
> #12 0x0000000000c8acca in Pipe::writer (this=0x32030000) at msg/simple/Pipe.cc:1703
> #13 0x0000000000c95b1d in Pipe::Writer::entry (this=<optimized out>) at msg/simple/Pipe.h:62
> #14 0x00007f01e1b8f182 in start_thread (arg=0x7f0124800700) at pthread_create.c:312
> #15 0x00007f01e00fa47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Is this a known issue? I searched for it and could not find anyone hitting this.

I don't think so.

>
> Looking at the 0.94.7 code, it looks like pipe_lock is released at
> line 886, at the beginning of the connect() routine. It is taken again
> later, but the state member variable is then updated without checking
> the current state in the code below.

Where exactly?

> If the pipe is moved to STATE_CLOSED in the interval when the lock was
> released, there is a chance that it can get overwritten when
> CEPH_MSGR_TAG_WAIT (STATE_WAIT) comes as a reply, or directly to
> STATE_OPEN at line 1172.

I'm not following your referents here. The out_seq can get
overwritten? Something else?

> I feel this may cause the assert seen above, but only if many other
> things also happen.

We had some vaguely similar issues in the time after CEPH_MSGR_TAG_SEQ
was introduced, but I think it's been a while. You might have spotted
another rare one.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


