Re: OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6.3.2018 22:28, Gregory Farnum wrote:
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic <jan.pekar@xxxxxxxxx <mailto:jan.pekar@xxxxxxxxx>> wrote:

    Hi all,

    I have few problems on my cluster, that are maybe linked together and
    now caused OSD down during pg repair.

    First few notes about my cluster:

    4 nodes, 15 OSDs installed on Luminous (no upgrade).
    Replicated pools with 1 pool (pool 6) cached by ssd disks.
    I don't detect any hardware failures (disk IO errors, restarts,
    corrupted data etc).
    I'm running RBDs using libvirt on debian wheezy and jessie (stable and
    oldstable).
    I'm snapshotting RBD's using Luminous client on Debian Jessie only.


When you say "cached by", do you mean there's a cache pool? Or are you using bcache or something underneath?

I mean cache pool.



    Now problems, from light to severe:

    1)
    Almost every day I notice health some problems after deep scrub
    1-2 inconsistent PG's with "read_error" on some osd's.
    When I don't repair it, it disappears after few days (? another deep
    scrub). There is no read_error on disks (disk check ok, no errors logged
    in syslog).


    2)
    I noticed on my pool 6 (cached pool), that scrub reports some objects,
    that shouldn't be there:

    2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
    v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
    ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
    [1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
    8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
    no head for 6:07ffbc7b:::rbd_data.967992ae8944a.0000000000061cb8:c2
    (have MIN)

    I think, that means orphaned snap object without his head replica. Maybe
    snaptrim left it there? Why? Maybe error during snaptrim? Or
    fstrim/discard removed "head" object (this is I hope nonsense)?

    3)
    I ended with one object (probably snap object), that has only 1 replica
    (out from size 3) and when I try to repair it, my OSD crash with

    /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
    recovery_info.ss.clone_snaps.end())
    I guess, that it detected orphaned snap object I noticed at 2) and don't
    repair it, just assterts and stop OSD. Am I right?

    I noticed comment "// hmm, should we warn?" on ceph source at that
    assert code. So should someone remove that assert?


There's a ticket https://tracker.ceph.com/issues/23030, which links to a much longer discussion on this mailing list between Sage and Stefan which discusses this particular assert. I'm not entirely clear from the rest of your story (and the loooong history in that thread) if there are other potential causes, or if your story might help diagnose it. But I'd start there since AFAIK it's still a mystery that looks serious but has only a very small number of incidences. :/
-Greg

Thank you, I will go through it, but it looks not to be related to my issue.
Now I added new disk to cluster, upgraded to 12.2.4. on some nodes and so far no scrub errors. If there is no clear answer to my OSD crash, I will try to wipe OSD containing problematic object (that causes my primary OSD fail), and rebuild it from other copies. Hope without that object, crash will not appear. If you think, that osdmaptool is safe for fuse-mount and delete object by hand, I can try it. But I'm looking for some tool to do that "online" on all PG copies and with checks (that snap object is not referenced somewhere).

With regards
Jan Pekar



    And my questions are

    How can I fix issue with crashing OSD?
    How can I safely remove that objects with missing head? Is there any
    tool or force-snaptrim on non-existent snapshots? It is prod cluster so
    I want to be careful. I have no problems now with data availability.
    My last idea is to move RBD's to another pool, but have not enough space
    to do that (as I know RBD can only copy not move) so I'm looking for
    another clean solution.
    And last question - how can I find, what is causing that read_erros and
    snap object leftovers?

    Should I paste my whole log? It is bigger than allowed post size.
    Pasting most important events:

         -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
    repair 1 missing, 0 inconsistent objects
         -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1
    log_channel(cluster)
    log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
         -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1
    log_channel(cluster)
    log [ERR] : 6.20 repair 3 errors, 2 fixed
         -20> 2018-02-27 23:43:07.903480 7f4b3820e700  5
    write_log_and_missing with: dirty_to: 0'0, dirty_from:
    4294967295'18446744073709551615, writeout_from:
    4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
    clear_divergent_priors: 0
         -19> 2018-02-27 23:43:07.903604 7f4b3820e700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE
    e8712) v1 -- 0x55a4c5459c00 con 0
         -18> 2018-02-27 23:43:07.903651 7f4b3820e700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE
    e8712) v1 -- 0x55a4cb6dee00 con 0
         -17> 2018-02-27 23:43:07.903679 7f4b3820e700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712
    6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cba734a0 con 0
         -16> 2018-02-27 23:43:07.903704 7f4b3820e700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c041]:6802/3012729 -- pg_info((query:8712 sent:8712
    6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cccb5680 con 0
         -15> 2018-02-27 23:43:07.903746 7f4b39a11700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+degraded+inconsistent m=1] exit
    Started/Primary/Active/Clean 7633.321426 4 0.000185
         -14> 2018-02-27 23:43:07.903776 7f4b39a11700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+degraded+inconsistent m=1] enter
    Started/Primary/Active/WaitLocalRecoveryReserved
         -13> 2018-02-27 23:43:07.903953 7f4b3a212700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] exit
    Started/Primary/Active/WaitLocalRecoveryReserved 0.000177 1 0.000072
         -12> 2018-02-27 23:43:07.904009 7f4b3a212700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] enter
    Started/Primary/Active/WaitRemoteRecoveryReserved
         -11> 2018-02-27 23:43:07.904038 7f4b3a212700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c021]:6803/3001666 -- MRecoveryReserve(6.20 REQUEST
    e8712) v2 -- 0x55a4c5082780 con 0
         -10> 2018-02-27 23:43:07.983199 7f4b4e256700  5 --
    [2a01:430:22a::cef:c011]:6805/514544 >>
    [2a01:430:22a::cef:c021]:6803/3001666 conn(0x55a4c4f87800 :-1
    s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2215 cs=1 l=0). rx
    osd.10 seq 6199 0x55a4c5082780 MRecoveryReserve(6.20 GRANT e8712) v2
          -9> 2018-02-27 23:43:07.983275 7f4b43ac2700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 <== osd.10
    [2a01:430:22a::cef:c021]:6803/3001666 6199 ==== MRecoveryReserve(6.20
    GRANT e8712) v2 ==== 26+0+0 (3843789212 0 0) 0x55a4c5082780 con
    0x55a4c4f87800
          -8> 2018-02-27 23:43:07.983411 7f4b39a11700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c041]:6802/3012729 -- MRecoveryReserve(6.20 REQUEST
    e8712) v2 -- 0x55a4c5040780 con 0
          -7> 2018-02-27 23:43:07.984468 7f4b4da55700  5 --
    [2a01:430:22a::cef:c011]:6805/514544 >>
    [2a01:430:22a::cef:c041]:6802/3012729 conn(0x55a4c4f89000 :-1
    s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1197 cs=1 l=0). rx
    osd.14 seq 6198 0x55a4c5040780 MRecoveryReserve(6.20 GRANT e8712) v2
          -6> 2018-02-27 23:43:07.984527 7f4b43ac2700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 <== osd.14
    [2a01:430:22a::cef:c041]:6802/3012729 6198 ==== MRecoveryReserve(6.20
    GRANT e8712) v2 ==== 26+0+0 (3843789212 0 0) 0x55a4c5040780 con
    0x55a4c4f89000
          -5> 2018-02-27 23:43:07.984642 7f4b3a212700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] exit
    Started/Primary/Active/WaitRemoteRecoveryReserved 0.080633 2 0.000176
          -4> 2018-02-27 23:43:07.984705 7f4b3a212700  5 osd.1 pg_epoch:
    8712
    pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
    n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
    8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
    mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] enter
    Started/Primary/Active/Recovering
          -3> 2018-02-27 23:43:08.087536 7f4b3820e700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 -->
    [2a01:430:22a::cef:c041]:6802/3012729 -- MOSDPGPull(6.20 e8712/8710 cost
    8389608) v3 -- 0x55a4c7281600 con 0
          -2> 2018-02-27 23:43:08.281205 7f4b4da55700  5 --
    [2a01:430:22a::cef:c011]:6805/514544 >>
    [2a01:430:22a::cef:c041]:6802/3012729 conn(0x55a4c4f89000 :-1
    s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1197 cs=1 l=0). rx
    osd.14 seq 6199 0x55a4c7281600 MOSDPGPush(6.20 8712/8710
    [PushOp(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25,
    version: 6240'452090, data_included: [], data_size: 0, omap_header_size:
    0, omap_entries_size: 0, attrset_size: 1, recovery_info:
    ObjectRecoveryInfo(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25@6240'452090,
    size: 0, copy_subset: [], clone_subset:
    {6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:d8=[]}, snapset:
    d8=[d8,d6,d4,d2,d0,ce,cb,ca,c8,c6,c4,c2,c0,be]:{be=[be],c0=[c0],c2=[c2],c6=[c6,c4],c8=[c8],ca=[ca],cb=[cb],ce=[ce],d0=[d0],d2=[d2],d4=[d4],d8=[d8,d6]}),
    after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0,
    data_complete:true, omap_recovered_to:, omap_complete:true,
    error:false), before_progress: ObjectRecoveryProgress(first,
    data_recovered_to:0, data_complete:false, omap_recovered_to:,
    omap_complete:false, error:false))]) v3
          -1> 2018-02-27 23:43:08.281240 7f4b4da55700  1 --
    [2a01:430:22a::cef:c011]:6805/514544 <== osd.14
    [2a01:430:22a::cef:c041]:6802/3012729 6199 ==== MOSDPGPush(6.20
    8712/8710
    [PushOp(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25,
    version: 6240'452090, data_included: [], data_size: 0, omap_header_size:
    0, omap_entries_size: 0, attrset_size: 1, recovery_info:
    ObjectRecoveryInfo(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25@6240'452090,
    size: 0, copy_subset: [], clone_subset:
    {6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:d8=[]}, snapset:
    d8=[d8,d6,d4,d2,d0,ce,cb,ca,c8,c6,c4,c2,c0,be]:{be=[be],c0=[c0],c2=[c2],c6=[c6,c4],c8=[c8],ca=[ca],cb=[cb],ce=[ce],d0=[d0],d2=[d2],d4=[d4],d8=[d8,d6]}),
    after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0,
    data_complete:true, omap_recovered_to:, omap_complete:true,
    error:false), before_progress: ObjectRecoveryProgress(first,
    data_recovered_to:0, data_complete:false, omap_recovered_to:,
    omap_complete:false, error:false))]) v3 ==== 2020+0+0 (3173406802
    <tel:(317)%20340-6802> 0 0)
    0x55a4c7281600 con 0x55a4c4f89000
           0> 2018-02-27 23:43:08.287138 7f4b3820e700 -1
    /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: In function 'virtual void
    PrimaryLogPG::on_local_recover(const hobject_t&, const
    ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)'
    thread 7f4b3820e700 time 2018-02-27 23:43:08.281422
    /build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
    recovery_info.ss.clone_snaps.end())

       ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949)
    luminous (stable)
       1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
    const*)+0x102) [0x55a4b78ad9b2]
       2: (PrimaryLogPG::on_local_recover(hobject_t const&,
    ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool,
    ObjectStore::Transaction*)+0x11f0) [0x55a4b74581b0]
       3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp
    const&,
    PullOp*, std::list<ReplicatedBackend::pull_complete_info,
    std::allocator<ReplicatedBackend::pull_complete_info> >*,
    ObjectStore::Transaction*)+0x788) [0x55a4b75d5ff8]
       4:
    (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2a6)
    [0x55a4b75d7fc6]
       5:
    (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x224)
    [0x55a4b75db204]
       6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
    [0x55a4b7500690]
       7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
    ThreadPool::TPHandle&)+0x5f1) [0x55a4b746bf31]
       8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
    boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
    [0x55a4b72f42d7]
       9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
    const&)+0x57) [0x55a4b7571137]
       10: (OSD::ShardedOpWQ::_process(unsigned int,
    ceph::heartbeat_handle_d*)+0x108c) [0x55a4b73245ec]
       11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
    [0x55a4b78b363d]
       12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
    [0x55a4b78b5600]
       13: (()+0x7494) [0x7f4b515f6494]
       14: (clone()+0x3f) [0x7f4b5067daff]
       NOTE: a copy of the executable, or `objdump -rdS <executable>` is
    needed to interpret this.

    Thank you very much

    With regards
    Jan Pekar

    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux