ceph 0.78 mon and mds crashing (bus error)

Hmm. I guess I'd look at:
1) How big the filesystem is and what shape it's in. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
2) Use tcmalloc's heap analyzer to see where all the memory is being
allocated (a command sketch follows below).
3) Look through the logs for when the beacon fails (the first occurrence of
"mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see
if there's anything tell-tale going on at that time.
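
For 2), a minimal sketch of driving the built-in tcmalloc heap profiler
through the ceph CLI (this assumes ceph-mds is linked against tcmalloc;
the daemon id "0" and the paths are only examples):

    # start sampling allocations in the running mds
    ceph tell mds.0 heap start_profiler
    # print current heap usage to the log
    ceph tell mds.0 heap stats
    # write a profile file (typically mds.<id>.profile.NNNN.heap in the
    # daemon's log directory) for offline analysis
    ceph tell mds.0 heap dump
    # stop sampling once the growth has been captured
    ceph tell mds.0 heap stop_profiler
    # summarize the largest allocation sites (the tool may be installed
    # as google-pprof on some distros)
    pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.0.profile.0001.heap
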
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman
<Kenneth.Waegeman at ugent.be> wrote:
>
> ----- Message from Gregory Farnum <greg at inktank.com> ---------
>    Date: Tue, 1 Apr 2014 09:03:17 -0700
>    From: Gregory Farnum <greg at inktank.com>
>
> Subject: Re: ceph 0.78 mon and mds crashing (bus error)
>      To: "Yan, Zheng" <ukernel at gmail.com>
>      Cc: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>, ceph-users
> <ceph-users at lists.ceph.com>
>
>
>
>> On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <ukernel at gmail.com> wrote:
>>>
>>> On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
>>> <Kenneth.Waegeman at ugent.be> wrote:
>>>>
>>>> After some more searching, I've found that the source of the problem is
>>>> with the mds and not the mon. The mds crashes, generates a core dump that
>>>> eats up the local disk space, and in turn the monitor (because of leveldb)
>>>> crashes.
>>>>
>>>> The error in the mds log of one host:
>>>>
>>>> 2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
>>>> 10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1
>>>> c=0x147ac780).connect got RESETSESSION but no longer connecting
>>>> 2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >>
>>>> 10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
>>>> c=0x1fd44360).connect got RESETSESSION but no longer connecting
>>>> 2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
>>>> 10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1
>>>> c=0x1fd43de0).connect got RESETSESSION but no longer connecting
>>>> 2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function
>>>> 'void
>>>> CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char,
>>>> std::char_traits<char>, std::allocator<char> >, ceph::buffer::list,
>>>> std::less<std::basic_string<char, std::char_traits<char>,
>>>> std::allocator<char> > >, std::allocator<std::pair<const
>>>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
>>>> ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700
>>>> time
>>>> 2014-04-01 15:46:34.448320
>>>> mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)
>>>>
>>>
>>> Could you use gdb to check what the value of the variable 'r' is?
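
(For reference, a minimal gdb session against the core file might look
like the following; the binary/core paths and the frame number are
illustrative and assume matching ceph debug symbols are installed:)

    gdb /usr/bin/ceph-mds /path/to/core.ceph-mds
    (gdb) bt            # locate the CDir::_omap_fetched frame
    (gdb) frame 2       # select it (the number varies per backtrace)
    (gdb) print r       # the return code that tripped the assert
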
>>
>>
>> If you look at the crash dump log you can see the return value in the
>> osd_op_reply message:
>> -1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 --
>> 10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ====
>> osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header
>> 0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after
>> transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0)
>> 0x2cff7aa80 con 0x37ea3c0
>>
>> It's -108, which is ESHUTDOWN, but we also use that value (via the
>> literal constant 108, I think because ESHUTDOWN varies across
>> platforms) as EBLACKLISTED.
>> So it looks like this is itself actually a symptom of another problem
>> that is causing the MDS to get timed out by the monitor. If a core
>> dump is "eating the local space", maybe the MDS is stuck in an
>> infinite allocation loop of some kind? How big are your disks,
>> Kenneth? Do you have any information on how much CPU/memory the MDS
>> was using before this?
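
(As a quick sanity check of that mapping, a tiny C++ sketch: on Linux
ESHUTDOWN is 108, and its strerror text is exactly the string shown in
the osd_op_reply line above.)

    // errno108.cc -- confirm what -108 decodes to on this host
    #include <cerrno>
    #include <cstring>
    #include <iostream>

    int main() {
      // On Linux this prints:
      //   108: Cannot send after transport endpoint shutdown
      // i.e. the same text as in the osd_op_reply above; Ceph reuses
      // the value 108 as EBLACKLISTED.
      std::cout << ESHUTDOWN << ": " << std::strerror(ESHUTDOWN) << "\n";
      return 0;
    }
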
>
>
> I monitored the mds process after restart:
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 19215 root      20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
> 19215 root      20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
> 19215 root      20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
> 19215 root      20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
> 19215 root      20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
> 19215 root      20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
> 19215 root      20   0 25.8g  25g 2944 S 33.7 81.2   5:13.74 ceph-mds
> 19215 root      20   0 26.0g  25g 2916 S 24.6 81.7   5:19.07 ceph-mds
> 19215 root      20   0 26.1g  25g 2916 S 13.0 82.1   5:22.16 ceph-mds
> 19215 root      20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds
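
(A memory sample like the one above can be reproduced with plain top in
batch mode; the 30-second interval is arbitrary:)

    top -b -d 30 -p $(pidof ceph-mds) | grep ceph-mds
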
>
> Then it crashes. I changed the core dump location to a filesystem outside
> the root fs; the core dump is indeed about 26G.
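
(For anyone hitting the same thing: one common way to redirect core
dumps away from a small root fs is the kernel.core_pattern sysctl; the
target directory below is only an example, borrowed from one of the
larger mounts in the df listing that follows:)

    # write core files to a larger filesystem instead of the 10G root fs
    sysctl -w kernel.core_pattern=/var/lib/ceph/log/sda4/core.%e.%p
    # and allow the daemon to dump a core of unlimited size
    ulimit -c unlimited
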
>
> My disks:
>
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda2       9.9G  2.9G  6.5G  31% /
> tmpfs            16G     0   16G   0% /dev/shm
> /dev/sda1       248M   53M  183M  23% /boot
> /dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
> /dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
> /dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
> /dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
> /dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
> /dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
> /dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
> /dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
> /dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
> /dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
> /dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
> /dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
> /dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
> /dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn
>
>
>
>
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
> ----- End message from Gregory Farnum <greg at inktank.com> -----
>
>
>
> --
>
> Kind regards,
> Kenneth Waegeman
>

