----- Message from Gregory Farnum <greg at inktank.com> ---------
   Date: Tue, 1 Apr 2014 09:03:17 -0700
   From: Gregory Farnum <greg at inktank.com>
Subject: Re: ceph 0.78 mon and mds crashing (bus error)
     To: "Yan, Zheng" <ukernel at gmail.com>
     Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, ceph-users <ceph-users at lists.ceph.com>

> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <ukernel at gmail.com> wrote:
>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
>> <Kenneth.Waegeman at ugent.be> wrote:
>>> After some more searching, I've found that the source of the problem is
>>> with the mds and not the mon. The mds crashes, generates a core dump that
>>> eats the local space, and in turn the monitor (because of leveldb) crashes.
>>>
>>> The error in the mds log of one host:
>>>
>>> 2014-04-01 15:46:34.414615 7f870e319700 0 -- 10.141.8.180:6836/13152 >>
>>> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1
>>> c=0x147ac780).connect got RESETSESSION but no longer connecting
>>> 2014-04-01 15:46:34.438792 7f871194f700 0 -- 10.141.8.180:6836/13152 >>
>>> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
>>> c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>> 2014-04-01 15:46:34.439028 7f870e319700 0 -- 10.141.8.180:6836/13152 >>
>>> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1
>>> c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function 'void
>>> CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char,
>>> std::char_traits<char>, std::allocator<char> >, ceph::buffer::list,
>>> std::less<std::basic_string<char, std::char_traits<char>,
>>> std::allocator<char> > >, std::allocator<std::pair<const
>>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
>>> ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700
>>> time 2014-04-01 15:46:34.448320
>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>
>>
>> Could you use gdb to check what the value of the variable 'r' is?
>
> If you look at the crash dump log you can see the return value in the
> osd_op_reply message:
>
>     -1> 2014-04-01 15:46:34.440860 7f871b7ae700 1 --
> 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ====
> osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header
> 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after
> transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0)
> 0x2cff7aa80 con 0x37ea3c0
>
> That is -108, which is ESHUTDOWN, but we also use it (via the 108
> constant, I think because ESHUTDOWN varies across platforms) as
> EBLACKLISTED. So it looks like this is itself a symptom of another
> problem that is causing the MDS to be timed out by the monitor. If a
> core dump is "eating the local space", maybe the MDS is stuck in an
> infinite allocation loop of some kind? How big are your disks,
> Kenneth? Do you have any information on how much CPU/memory the MDS
> was using before this?
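For reference, the assert at mds/CDir.cc:1474 only tolerates 0, -ENOENT (-2)
and -ENODATA (-61), so the -108 reply above is enough to bring the MDS down.
Below is a minimal standalone sketch of that check; the EBLACKLISTED name is
taken from Greg's description of the 108 constant, and the rest of the code is
illustrative only, not the actual Ceph source:

    // Illustrative sketch, not the real mds/CDir.cc code.
    // Per the thread, Ceph pins the blacklist error to the constant 108
    // (EBLACKLISTED) because the value of ESHUTDOWN differs across platforms.
    #include <cassert>
    #include <cerrno>

    static const int EBLACKLISTED = 108;

    // Mirrors the failing check: the omap-fetch callback only expects
    // success, "object missing" (-ENOENT) or "no omap data" (-ENODATA).
    static void omap_fetched(int r) {
        assert(r == 0 || r == -ENOENT || r == -ENODATA);   // CDir.cc:1474
    }

    int main() {
        omap_fetched(0);              // ok
        omap_fetched(-ENOENT);        // ok (-2)
        omap_fetched(-EBLACKLISTED);  // -108: the assert fires, the MDS aborts
        return 0;
    }

In other words, the blacklist error is not one of the return codes this fetch
path expects, which is why it surfaces as an assert and a core dump rather
than a clean shutdown.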
I monitored the mds process after restart:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
19215 root      20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
19215 root      20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
19215 root      20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
19215 root      20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
19215 root      20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
19215 root      20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
19215 root      20   0 25.8g  25g 2944 S  33.7 81.2   5:13.74 ceph-mds
19215 root      20   0 26.0g  25g 2916 S  24.6 81.7   5:19.07 ceph-mds
19215 root      20   0 26.1g  25g 2916 S  13.0 82.1   5:22.16 ceph-mds
19215 root      20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds

Then it crashes. I changed the core dump location to a path outside the root
fs; the core dump is indeed about 26G.

My disks:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       9.9G  2.9G  6.5G  31% /
tmpfs            16G     0   16G   0% /dev/shm
/dev/sda1       248M   53M  183M  23% /boot
/dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
/dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
/dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
/dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
/dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
/dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
/dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
/dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
/dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
/dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
/dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
/dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
/dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
/dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

----- End message from Gregory Farnum <greg at inktank.com> -----

--
Kind regards,
Kenneth Waegeman