Re: ceph 9.2.0 mds cluster went down and now constantly crashes with Floating point exception

On Fri, Feb 5, 2016 at 3:48 PM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
> Hi,
>
> In my attempt to retry, I ran 'ceph mds newfs', because removing the fs was
> not working (the MDSs could not be stopped).
> With the new fs, I could start syncing again. After 10-15 min it all crashed
> again. The log now shows a different stack trace.
>
>
>      -9> 2016-02-05 15:26:29.015197 7f177de2f700 10 mds.0.locker got rdlock
> on (ipolicy sync r=1) [inode 1000000138e [...2,head] /backups/vulpixhome/g
> ent/ auth v1328 ap=1+1 f(v0 m2016-02-05 15:11:49.275650 4=0+4) n(v53
> rc2016-02-05 15:12:41.886233 b276810712 5559=3918+1641) (isnap sync r=1)
> (inest l
> ock dirty) (ipolicy sync r=1) (iversion lock)
> caps={67042=pAsLsXsFs/-@0,67045=pAsLsXs/-@0,67578=pAsLsXsFs/-@0,67587=pAsLsXsFs/-@0}
> | dirtyscattered=1
> lock=2 dirfrag=1 caps=1 dirty=1 authpin=1 0x7f178c539660]
>     -8> 2016-02-05 15:26:29.015219 7f177de2f700  7 mds.0.locker rdlock_start
> on (isnap sync) on [inode 1000000138f [...2,head] /backups/vulpixhome/ge
> nt/vsc408/ auth v1509 ap=1+1 f(v0 m2016-02-05 15:09:13.624725 10=0+10) n(v51
> rc2016-02-05 15:12:41.538927 b168677648 3672=2522+1150)/n(v50 rc2016-02-0
> 5 15:12:41.538927 b168321213 3672=2522+1150) (inest lock dirty) (iversion
> lock) caps={67045=pAsLsXsFs/-@0,67587=pAsLsXsFs/-@0} | dirtyscattered=1 dirf
> rag=1 caps=1 dirtyrstat=1 dirty=1 authpin=1 0x7f178c539bf8]
>     -7> 2016-02-05 15:26:29.015237 7f177de2f700 10 mds.0.locker got rdlock
> on (isnap sync r=1) [inode 1000000138f [...2,head] /backups/vulpixhome/gen
> t/vsc408/ auth v1509 ap=1+1 f(v0 m2016-02-05 15:09:13.624725 10=0+10) n(v51
> rc2016-02-05 15:12:41.538927 b168677648 3672=2522+1150)/n(v50 rc2016-02-05
>  15:12:41.538927 b168321213 3672=2522+1150) (isnap sync r=1) (inest lock
> dirty) (iversion lock) caps={67045=pAsLsXsFs/-@0,67587=pAsLsXsFs/-@0} |
> dirty
> scattered=1 lock=1 dirfrag=1 caps=1 dirtyrstat=1 dirty=1 authpin=1
> 0x7f178c539bf8]
>     -6> 2016-02-05 15:26:29.015256 7f177de2f700  7 mds.0.locker rdlock_start
> on (ipolicy sync) on [inode 1000000138f [...2,head] /backups/vulpixhome/
> gent/vsc408/ auth v1509 ap=1+1 f(v0 m2016-02-05 15:09:13.624725 10=0+10)
> n(v51 rc2016-02-05 15:12:41.538927 b168677648 3672=2522+1150)/n(v50
> rc2016-02
> -05 15:12:41.538927 b168321213 3672=2522+1150) (isnap sync r=1) (inest lock
> dirty) (iversion lock) caps={67045=pAsLsXsFs/-@0,67587=pAsLsXsFs/-@0} | di
> rtyscattered=1 lock=1 dirfrag=1 caps=1 dirtyrstat=1 dirty=1 authpin=1
> 0x7f178c539bf8]
>     -5> 2016-02-05 15:26:29.015274 7f177de2f700 10 mds.0.locker got rdlock
> on (ipolicy sync r=1) [inode 1000000138f [...2,head] /backups/vulpixhome/g
> ent/vsc408/ auth v1509 ap=1+1 f(v0 m2016-02-05 15:09:13.624725 10=0+10)
> n(v51 rc2016-02-05 15:12:41.538927 b168677648 3672=2522+1150)/n(v50
> rc2016-02-
> 05 15:12:41.538927 b168321213 3672=2522+1150) (isnap sync r=1) (inest lock
> dirty) (ipolicy sync r=1) (iversion lock) caps={67045=pAsLsXsFs/-@0,67587=p
> AsLsXsFs/-@0} | dirtyscattered=1 lock=2 dirfrag=1 caps=1 dirtyrstat=1
> dirty=1 authpin=1 0x7f178c539bf8]
>     -4> 2016-02-05 15:26:29.015301 7f177de2f700  5 -- op tracker -- seq: 2,
> time: 2016-02-05 15:26:29.015301, event: acquired locks, op:
> client_request(client.67587:2334 create #10000000d3a/.input.log.us42rN
> 2016-02-05 15:12:42.328932 RETRY=7 REPLAY)
>     -3> 2016-02-05 15:26:29.015321 7f177de2f700 20 mds.0.sessionmap
> mark_projected s=0x7f178c4f3b80 name=client.67587 pv=13127 -> 13128
>     -2> 2016-02-05 15:26:29.015326 7f177de2f700 10 mds.0.server
> prepare_new_inode used_prealloc 1000000380c
> ([1000000380d~a7,10000004a51~1f5], 668 left)
>     -1> 2016-02-05 15:26:29.015330 7f177de2f700 10 mds.0.server dir mode
> 040775 new mode 0100600
>      0> 2016-02-05 15:26:29.018013 7f177de2f700 -1 mds/MDCache.cc: In
> function 'void MDCache::add_inode(CInode*)' thread 7f177de2f700 time
> 2016-02-05 15:26:29.015333
> mds/MDCache.cc: 269: FAILED assert(inode_map.count(in->vino()) == 0)
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7f1787ed6105]
>  2: (MDCache::add_inode(CInode*)+0x246) [0x7f1787bd14b6]
>  3: (Server::prepare_new_inode(std::shared_ptr<MDRequestImpl>&, CDir*,
> inodeno_t, unsigned int, ceph_file_layout*)+0xf30) [0x7f1787b71ce0]
>  4: (Server::handle_client_openc(std::shared_ptr<MDRequestImpl>&)+0xd83)
> [0x7f1787b75963]
>  5: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa48)
> [0x7f1787b94678]
>  6: (MDCache::dispatch_request(std::shared_ptr<MDRequestImpl>&)+0x4c)
> [0x7f1787c0d2dc]
>  7: (MDSInternalContextBase::complete(int)+0x1e3) [0x7f1787d4b993]
>  8: (MDSRank::_advance_queues()+0x382) [0x7f1787b2b652]
>  9: (MDSRank::ProgressThread::entry()+0x4a) [0x7f1787b2baca]
>  10: (()+0x7dc5) [0x7f1786d22dc5]
>  11: (clone()+0x6d) [0x7f1785c0d21d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> I don't know if it is related, or whether the 'new fs' has broken it..?
> I added the full log with debug level 20 to an issue ticket:
> http://tracker.ceph.com/issues/14672

You need to delete and recreate your data and metadata pools; it's
probably crashing because it's running into junk from the previous
filesystem.

BTW, you don't have to fall back to newfs when you've got failed
daemons.  You just have to use "ceph mds fail" to erase the daemons
from the MDSMap before running "ceph fs rm".
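
For reference, the teardown/recreate sequence would look something like
the following; the filesystem and pool names below are placeholders, and
the pg counts should be adjusted to your cluster:

  # mark the MDS daemons failed so the filesystem can be removed
  # (repeat for each active rank)
  ceph mds fail 0
  # remove the filesystem, then the pools that backed it
  ceph fs rm cephfs --yes-i-really-mean-it
  ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
  ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
  # recreate fresh pools and a new filesystem on top of them
  ceph osd pool create cephfs_metadata 512
  ceph osd pool create cephfs_data 1024
  ceph fs new cephfs cephfs_metadata cephfs_data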

John


>
>
> On 05/02/16 11:43, John Spray wrote:
>>
>> On Fri, Feb 5, 2016 at 9:36 AM, Kenneth Waegeman
>> <kenneth.waegeman@xxxxxxxx> wrote:
>>>
>>>
>>> On 04/02/16 16:17, Gregory Farnum wrote:
>>>>
>>>> On Thu, Feb 4, 2016 at 1:42 AM, Kenneth Waegeman
>>>> <kenneth.waegeman@xxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are running ceph 9.2.0.
>>>>> Overnight, our ceph state went to 'mds mds03 is laggy'. When I checked
>>>>> the logs, I saw this MDS had crashed with a stack trace. I checked the
>>>>> other MDSs and saw the same there.
>>>>> When I try to start the MDS again, I get a stack trace again and it
>>>>> won't come up:
>>>>>
>>>>>        -12> 2016-02-04 10:23:46.837131 7ff9ea570700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.146 10.141.16.25:6800/7036 1 ====
>>>>> osd_op_reply(207 100005ef982.00000000 [stat] v0'0 uv22184 ondisk = 0)
>>>>> v6
>>>>> ==== 187+0+16 (113
>>>>> 2261152 0 506978568) 0x7ffa171ae940 con 0x7ffa189cc3c0
>>>>>       -11> 2016-02-04 10:23:46.837317 7ff9ed6a1700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.136 10.141.16.24:6800/6764 6 ====
>>>>> osd_op_reply(209 1000048aaac.00000000 [delete] v0'0 uv23797 ondisk = -2
>>>>> ((2)
>>>>> No such file o
>>>>> r directory)) v6 ==== 187+0+0 (64699207 0 0) 0x7ffa171acb00 con
>>>>> 0x7ffa014fd9c0
>>>>>       -10> 2016-02-04 10:23:46.837406 7ff9ec994700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.36 10.141.16.14:6800/5395 5 ====
>>>>> osd_op_reply(175 100005f631f.00000000 [stat] v0'0 uv22466 ondisk = 0)
>>>>> v6
>>>>> ==== 187+0+16 (1037
>>>>> 61047 0 2527067705) 0x7ffa08363700 con 0x7ffa189ca580
>>>>>        -9> 2016-02-04 10:23:46.837463 7ff9eba85700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.47 10.141.16.15:6802/7128 2 ====
>>>>> osd_op_reply(211 1000048aac8.00000000 [delete] v0'0 uv22990 ondisk = -2
>>>>> ((2)
>>>>> No such file or
>>>>>     directory)) v6 ==== 187+0+0 (1138385695 0 0) 0x7ffa01cd0dc0 con
>>>>> 0x7ffa189cadc0
>>>>>        -8> 2016-02-04 10:23:46.837468 7ff9eb27d700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.16 10.141.16.12:6800/5739 2 ====
>>>>> osd_op_reply(212 1000048aacd.00000000 [delete] v0'0 uv23991 ondisk = -2
>>>>> ((2)
>>>>> No such file or
>>>>>     directory)) v6 ==== 187+0+0 (1675093742 0 0) 0x7ffa171ac840 con
>>>>> 0x7ffa189cb760
>>>>>        -7> 2016-02-04 10:23:46.837477 7ff9eab76700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.66 10.141.16.17:6800/6353 2 ====
>>>>> osd_op_reply(210 1000048aab9.00000000 [delete] v0'0 uv24583 ondisk = -2
>>>>> ((2)
>>>>> No such file or
>>>>>     directory)) v6 ==== 187+0+0 (603192739 0 0) 0x7ffa19054680 con
>>>>> 0x7ffa189cbce0
>>>>>        -6> 2016-02-04 10:23:46.838140 7ff9f0bcf700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 43 ====
>>>>> osd_op_reply(121 200.00009d96 [write 1459360~980] v943'4092 uv4092
>>>>> ondisk
>>>>> =
>>>>> 0) v6 ==== 179+0+0 (3939130488 0 0) 0x7ffa01590100 con 0x7ffa014fab00
>>>>>        -5> 2016-02-04 10:23:46.838342 7ff9f0bcf700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 44 ====
>>>>> osd_op_reply(124 200.00009d96 [write 1460340~956] v943'4093 uv4093
>>>>> ondisk
>>>>> =
>>>>> 0) v6 ==== 179+0+0 (1434265886 0 0) 0x7ffa01590100 con 0x7ffa014fab00
>>>>>        -4> 2016-02-04 10:23:46.838531 7ff9f0bcf700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 45 ====
>>>>> osd_op_reply(126 200.00009d96 [write 1461296~954] v943'4094 uv4094
>>>>> ondisk
>>>>> =
>>>>> 0) v6 ==== 179+0+0 (25292940 0 0) 0x7ffa01590100 con 0x7ffa014fab00
>>>>>        -3> 2016-02-04 10:23:46.838700 7ff9ecd98700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.57 10.141.16.16:6802/7067 3 ====
>>>>> osd_op_reply(199 100005ef976.00000000 [stat] v0'0 uv22557 ondisk = 0)
>>>>> v6
>>>>> ==== 187+0+16 (354652996 0 2244692791) 0x7ffa171ade40 con
>>>>> 0x7ffa189ca160
>>>>>        -2> 2016-02-04 10:23:46.839301 7ff9ed8a3700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.107 10.141.16.21:6802/7468 3 ====
>>>>> osd_op_reply(115 10000625476.00000000 [stat] v0'0 uv22587 ondisk = 0)
>>>>> v6
>>>>> ==== 187+0+16 (664308076 0 998461731) 0x7ffa08363c80 con 0x7ffa014fdb20
>>>>>        -1> 2016-02-04 10:23:46.839322 7ff9f0bcf700  1 --
>>>>> 10.141.16.2:6800/193767 <== osd.2 10.141.16.2:6802/126856 46 ====
>>>>> osd_op_reply(128 200.00009d96 [write 1462250~954] v943'4095 uv4095
>>>>> ondisk
>>>>> =
>>>>> 0) v6 ==== 179+0+0 (1379768629 0 0) 0x7ffa01590100 con 0x7ffa014fab00
>>>>>         0> 2016-02-04 10:23:46.839379 7ff9f30d8700 -1 *** Caught signal
>>>>> (Floating point exception) **
>>>>>     in thread 7ff9f30d8700
>>>>>
>>>>>     ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>>>>     1: (()+0x4b6fa2) [0x7ff9fd091fa2]
>>>>>     2: (()+0xf100) [0x7ff9fbfd3100]
>>>>>     3: (StrayManager::_calculate_ops_required(CInode*, bool)+0xa2)
>>>>> [0x7ff9fcf0adc2]
>>>>>     4: (StrayManager::enqueue(CDentry*, bool)+0x169) [0x7ff9fcf10459]
>>>>>     5: (StrayManager::__eval_stray(CDentry*, bool)+0xa49)
>>>>> [0x7ff9fcf111c9]
>>>>>     6: (StrayManager::eval_stray(CDentry*, bool)+0x1e) [0x7ff9fcf113ce]
>>>>>     7: (MDCache::scan_stray_dir(dirfrag_t)+0x13d) [0x7ff9fce6741d]
>>>>>     8: (MDSInternalContextBase::complete(int)+0x1e3) [0x7ff9fcff4993]
>>>>>     9: (MDSRank::_advance_queues()+0x382) [0x7ff9fcdd4652]
>>>>>     10: (MDSRank::ProgressThread::entry()+0x4a) [0x7ff9fcdd4aca]
>>>>>     11: (()+0x7dc5) [0x7ff9fbfcbdc5]
>>>>>     12: (clone()+0x6d) [0x7ff9faeb621d]
>>>>>
>>>>> Does anyone have an idea? We can't use our fs right now..
>>>>
>>>> Hey, fun! Just looking for FPE opportunities in that function, it
>>>> looks like someone managed to set either the object size or stripe
>>>> count to 0 on some of your files. Is that possible?
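
For what it's worth, if you still have a client mount handy, a quick way
to check a suspect file's layout is via the virtual xattrs (the mount
point and path below are just placeholders); a 0 in object_size or
stripe_count would match the FPE:

  getfattr -n ceph.file.layout /mnt/cephfs/path/to/file
  # or query the individual fields
  getfattr -n ceph.file.layout.object_size /mnt/cephfs/path/to/file
  getfattr -n ceph.file.layout.stripe_count /mnt/cephfs/path/to/file
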
>>>
>>> I am the only user on the system :) What was happening was our production
>>> storage on GPFS being rsynced to the cephfs. So if something like that
>>> happened, rsync would have had to trigger it, which would be very strange
>>> in my opinion. Can such things be prevented somehow?
>>>>
>>>> As for repair chances...hmm. I think you'll need to:
>>>> 1) identify what the broken inodes are. Hopefully you can get this by
>>>> setting "debug mds = 20" in your conf, starting up the MDS and seeing
>>>> what the last referenced inode is in your crashing thread...
>>>> 2) Manually remove the entry for that inode in your stray directory.
>>>> The stray directory should be inode 0x601, I think, object
>>>> 601.00000000 in the metadata pool. Use the rados omap commands to
>>>> remove the entries for the bad inode.
>>>> 3) Delete all the objects in the data pool for that file.
>>>> 4) You may also need to remove references to that inode from your
>>>> journal using the cephfs-journal-tool, if it doesn't start up cleanly
>>>> following the prior steps. I think the help text will give you what
>>>> you need; if not John can.
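
To make steps 2 and 3 above concrete, the rados side would look roughly
like this; the inode number and pool names are placeholders, and you
should double-check the key names that listomapkeys prints before
removing anything:

  ino_hex=100005ef982           # hex inode number of the broken file (placeholder)
  meta_pool=cephfs_metadata     # placeholder pool names
  data_pool=cephfs_data
  # 2) list the stray directory's omap entries, find the dentry for that
  #    inode, then remove it:
  rados -p "$meta_pool" listomapkeys 601.00000000
  #   rados -p "$meta_pool" rmomapkey 601.00000000 <key_printed_above>
  # 3) delete the file's data objects, which are named <ino_hex>.<object number>
  rados -p "$data_pool" ls | grep "^${ino_hex}\." | while read obj; do
      rados -p "$data_pool" rm "$obj"
  done
  # 4) only if the MDS still won't replay cleanly, see cephfs-journal-tool --help
  #    (and take a backup first: cephfs-journal-tool journal export backup.bin)
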
>>>
>>> Our sync had only been running for a day, so I can restart from scratch if
>>> this doesn't work.. But my main concern is how to avoid this in the future,
>>> because I suppose if I restart the backup, this can happen again?
>>
>> OK, in a way that's good news because it means this is hopefully
>> something that you can readily reproduce.
>>
>> Before going any further, in case we need to inspect it later, please
>> could you grab a dump of your journal with "cephfs-journal-tool
>> journal export /tmp/somewhere.bin" -- if this proves to be hard to
>> reproduce we might need to go to the trouble of trying to dissect
>> evidence from this particular occurrence.
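
i.e. something like this ("journal inspect" is just an optional sanity
check of what was captured):

  cephfs-journal-tool journal export /tmp/somewhere.bin
  cephfs-journal-tool journal inspect
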
>>
>> The surprising thing is that a file is ending up with a bogus layout
>> to begin with: there are checks that are applied during layout-setting
>> operations that should ensure this situation can't arise.  Especially
>> if you're just rsyncing from another filesystem, nothing should even
>> be trying to touch the ceph-specific layout settings.
>>
>> I guess if this is straightforward to reproduce, could you crank
>> "debug mds" up to 20, run through the whole business again, and then
>> try to snip out all the places in the log where the relevant inode was
>> modified/handled for us?  If your workload includes a very large
>> number of files this might produce an unmanageably large log, but it's
>> worth a go.
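
The debug level can be turned up on the running daemons without a
restart, e.g. (repeat for each MDS; mds03 here is just one of yours):

  ceph tell mds.mds03 injectargs '--debug-mds 20'
  # or persistently, in ceph.conf on the MDS hosts:
  #   [mds]
  #   debug mds = 20
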
>>
>> Thanks,
>> John
>>
>>> Thanks very much!
>>> Kenneth
>>>
>>>> Does that sound right to you, John?
>>>>
>>>> I've created tickets to prevent those nonsensical values from getting
>>>> set, and to prevent the StrayManager from crashing when it encounters
>>>> them. http://tracker.ceph.com/issues/14641,
>>>> http://tracker.ceph.com/issues/14642
>>>> -Greg
>>>>
>>>>> I included the full log of an mds start in attachment
>>>>>
>>>>> Thanks!!
>>>>>
>>>>> K
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


