Re: Luminous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

Hi Jayaram,

Thanks for creating a tracker entry! Any chance you could add a note about how you are generating the 200MB/s client workload? I've not seen this problem in the lab, but any details you could give that would help us reproduce the problem would be much appreciated!
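
For example, is it something synthetic like a sustained rados bench write, e.g.

rados bench -p <your-pool> 600 write -t 16 --no-cleanup

(just a guess at an equivalent command), or a real application workload?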

Mark

On 06/08/2017 06:08 AM, nokia ceph wrote:
Hello Mark,

Raised a tracker for the issue -- http://tracker.ceph.com/issues/20222

Jake, can you share the restart_OSD_and_log-this.sh script?

Thanks
Jayaram

On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:

    Hi Mark & List,

    Unfortunately, even when using yesterday's master version of ceph,
    I'm still seeing OSDs go down with the same error as before:

    OSD log shows lots of entries like this:

    (osd38)
    2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
    'tp_osd_tp thread tp_osd_tp' had timed out after 60

    (osd3)
    2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
    'tp_osd_tp thread tp_osd_tp' had timed out after 60
    2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
    no reply from 10.1.0.86:6811 osd.2 since
    back 2017-06-07 17:00:19.640002
    front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)


    [root@ceph4 ceph]# ceph -v
    ceph version 12.0.2-2399-ge38ca14
    (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)


    I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
    workaround...
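
    For reference, the script just checks every second for dead OSDs,
    restarts them and logs the event.  A stripped-down sketch of the idea,
    assuming systemd-managed OSDs (the real script differs):

    #!/bin/bash
    # Watchdog: restart any failed ceph-osd unit and log the event so the
    # restarts can be correlated with the OSD logs later.
    LOG=/var/log/restart_osd.log        # example log location
    while true; do
        for unit in $(systemctl list-units 'ceph-osd@*' --state=failed \
                          --no-legend | awk '{print $1}'); do
            echo "$(date -Is) restarting ${unit}" >> "${LOG}"
            systemctl restart "${unit}"
        done
        sleep 1
    done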

    thanks again for your help,

    Jake

    On 06/06/17 15:52, Jake Grimmett wrote:
    > Hi Mark,
    >
    > OK, I'll upgrade to the current master and retest...
    >
    > best,
    >
    > Jake
    >
    > On 06/06/17 15:46, Mark Nelson wrote:
    >> Hi Jake,
    >>
    >> I just happened to notice this was on 12.0.3.  Would it be possible
    >> to test this out with current master and see if it still is a problem?
    >>
    >> Mark
    >>
    >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
    >>> Hi Jake,
    >>>
    >>> Thanks much.  I'm guessing at this point this is probably a bug.
    >>> Would you (or nokiauser) mind creating a bug in the tracker with a
    >>> short description of what's going on and the collectl sample showing
    >>> this is not IOs backing up on the disk?
    >>>
    >>> If you want to try it, we have a gdb based wallclock profiler that
    >>> might be interesting to run while it's in the process of timing out.
    >>> It tries to grab 2000 samples from the osd process which typically
    >>> takes about 10 minutes or so.  You'll need to either change the
    >>> number of samples to be lower in the python code (maybe like 50-100),
    >>> or change the timeout to be something longer.
    >>>
    >>> You can find the code here:
    >>>
    >>> https://github.com/markhpc/gdbprof
    >>>
    >>> and invoke it like:
    >>>
    >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
    >>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
    >>>
    >>> where 27962 in this case is the PID of the ceph-osd process.  You'll
    >>> need gdb with the python bindings and the ceph debug symbols for it
    >>> to work.
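    >>>
    >>> (As an aside: the exact commands depend on your setup, but on a
    >>> systemd-managed cluster something like
    >>>
    >>> systemctl show ceph-osd@3 -p MainPID
    >>>
    >>> or a plain 'pidof ceph-osd' should give you the PID(s) to attach to.)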
    >>>
    >>> This might tell us over time if the tp_osd_tp processes are just
    >>> sitting on pg::locks.
    >>>
    >>> Mark
    >>>
    >>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
    >>>> Hi Mark,
    >>>>
    >>>> Thanks again for looking into this problem.
    >>>>
    >>>> I ran the cluster overnight, with a script checking for dead OSDs
    >>>> every second, and restarting them.
    >>>>
    >>>> 40 OSD failures occurred in 12 hours; some OSDs failed multiple
    >>>> times (there are 50 OSDs in the EC tier).
    >>>>
    >>>> Unfortunately, the output of collectl doesn't appear to show any
    >>>> increase in disk queue depth and service times before the OSDs die.
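    >>>>
    >>>> (The queue depth and service times come from collectl's detailed
    >>>> disk mode, i.e. something along the lines of "collectl -sD -oT -i1".)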
    >>>>
    >>>> I've put a couple of examples of collectl output for the disks
    >>>> associated with the OSDs here:
    >>>>
    >>>> https://hastebin.com/icuvotemot.scala
    >>>>
    >>>> please let me know if you need more info...
    >>>>
    >>>> best regards,
    >>>>
    >>>> Jake
    >>>>
    >>>>
    >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


