Re: OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic <jan.pekar@xxxxxxxxx> wrote:
Hi all,

I have few problems on my cluster, that are maybe linked together and
now caused OSD down during pg repair.

First few notes about my cluster:

4 nodes, 15 OSDs installed on Luminous (no upgrade).
Replicated pools with 1 pool (pool 6) cached by ssd disks.
I don't detect any hardware failures (disk IO errors, restarts,
corrupted data etc).
I'm running RBDs using libvirt on debian wheezy and jessie (stable and
oldstable).
I'm snapshotting RBD's using Luminous client on Debian Jessie only.

When you say "cached by", do you mean there's a cache pool? Or are you using bcache or something underneath?
 

Now problems, from light to severe:

1)
Almost every day I notice health some problems after deep scrub
1-2 inconsistent PG's with "read_error" on some osd's.
When I don't repair it, it disappears after few days (? another deep
scrub). There is no read_error on disks (disk check ok, no errors logged
in syslog).

2)
I noticed on my pool 6 (cached pool), that scrub reports some objects,
that shouldn't be there:

2018-02-27 23:43:06.490152 7f4b3820e700 -1 osd.1 pg_epoch: 8712 pg[6.20(
v 8712'771984 (8712'770478,8712'771984] local-lis/les=8710/8711 n=14299
ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807 8710/8710/8710)
[1,10,14] r=0 lpr=8710 crt=8712'771984 lcod 8712'771983 mlcod
8712'771983 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps
no head for 6:07ffbc7b:::rbd_data.967992ae8944a.0000000000061cb8:c2
(have MIN)

I think, that means orphaned snap object without his head replica. Maybe
snaptrim left it there? Why? Maybe error during snaptrim? Or
fstrim/discard removed "head" object (this is I hope nonsense)?

3)
I ended with one object (probably snap object), that has only 1 replica
(out from size 3) and when I try to repair it, my OSD crash with

/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())
I guess, that it detected orphaned snap object I noticed at 2) and don't
repair it, just assterts and stop OSD. Am I right?

I noticed comment "// hmm, should we warn?" on ceph source at that
assert code. So should someone remove that assert?

There's a ticket https://tracker.ceph.com/issues/23030, which links to a much longer discussion on this mailing list between Sage and Stefan which discusses this particular assert. I'm not entirely clear from the rest of your story (and the loooong history in that thread) if there are other potential causes, or if your story might help diagnose it. But I'd start there since AFAIK it's still a mystery that looks serious but has only a very small number of incidences. :/
-Greg
 

And my questions are

How can I fix issue with crashing OSD?
How can I safely remove that objects with missing head? Is there any
tool or force-snaptrim on non-existent snapshots? It is prod cluster so
I want to be careful. I have no problems now with data availability.
My last idea is to move RBD's to another pool, but have not enough space
to do that (as I know RBD can only copy not move) so I'm looking for
another clean solution.
And last question - how can I find, what is causing that read_erros and
snap object leftovers?

Should I paste my whole log? It is bigger than allowed post size.
Pasting most important events:

    -23> 2018-02-27 23:43:07.903368 7f4b3820e700  2 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+clean+scrubbing+deep+inconsistent+repair] 6.20
repair 1 missing, 0 inconsistent objects
    -22> 2018-02-27 23:43:07.903410 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
    -21> 2018-02-27 23:43:07.903446 7f4b3820e700 -1 log_channel(cluster)
log [ERR] : 6.20 repair 3 errors, 2 fixed
    -20> 2018-02-27 23:43:07.903480 7f4b3820e700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from:
4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
    -19> 2018-02-27 23:43:07.903604 7f4b3820e700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c021]:6803/3001666 -- MOSDScrubReserve(6.20 RELEASE
e8712) v1 -- 0x55a4c5459c00 con 0
    -18> 2018-02-27 23:43:07.903651 7f4b3820e700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- MOSDScrubReserve(6.20 RELEASE
e8712) v1 -- 0x55a4cb6dee00 con 0
    -17> 2018-02-27 23:43:07.903679 7f4b3820e700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c021]:6803/3001666 -- pg_info((query:8712 sent:8712
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cba734a0 con 0
    -16> 2018-02-27 23:43:07.903704 7f4b3820e700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- pg_info((query:8712 sent:8712
6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710))=(empty) epoch 8712) v5 -- 0x55a4cccb5680 con 0
    -15> 2018-02-27 23:43:07.903746 7f4b39a11700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+degraded+inconsistent m=1] exit
Started/Primary/Active/Clean 7633.321426 4 0.000185
    -14> 2018-02-27 23:43:07.903776 7f4b39a11700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+degraded+inconsistent m=1] enter
Started/Primary/Active/WaitLocalRecoveryReserved
    -13> 2018-02-27 23:43:07.903953 7f4b3a212700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] exit
Started/Primary/Active/WaitLocalRecoveryReserved 0.000177 1 0.000072
    -12> 2018-02-27 23:43:07.904009 7f4b3a212700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] enter
Started/Primary/Active/WaitRemoteRecoveryReserved
    -11> 2018-02-27 23:43:07.904038 7f4b3a212700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c021]:6803/3001666 -- MRecoveryReserve(6.20 REQUEST
e8712) v2 -- 0x55a4c5082780 con 0
    -10> 2018-02-27 23:43:07.983199 7f4b4e256700  5 --
[2a01:430:22a::cef:c011]:6805/514544 >>
[2a01:430:22a::cef:c021]:6803/3001666 conn(0x55a4c4f87800 :-1
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2215 cs=1 l=0). rx
osd.10 seq 6199 0x55a4c5082780 MRecoveryReserve(6.20 GRANT e8712) v2
     -9> 2018-02-27 23:43:07.983275 7f4b43ac2700  1 --
[2a01:430:22a::cef:c011]:6805/514544 <== osd.10
[2a01:430:22a::cef:c021]:6803/3001666 6199 ==== MRecoveryReserve(6.20
GRANT e8712) v2 ==== 26+0+0 (3843789212 0 0) 0x55a4c5082780 con
0x55a4c4f87800
     -8> 2018-02-27 23:43:07.983411 7f4b39a11700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- MRecoveryReserve(6.20 REQUEST
e8712) v2 -- 0x55a4c5040780 con 0
     -7> 2018-02-27 23:43:07.984468 7f4b4da55700  5 --
[2a01:430:22a::cef:c011]:6805/514544 >>
[2a01:430:22a::cef:c041]:6802/3012729 conn(0x55a4c4f89000 :-1
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1197 cs=1 l=0). rx
osd.14 seq 6198 0x55a4c5040780 MRecoveryReserve(6.20 GRANT e8712) v2
     -6> 2018-02-27 23:43:07.984527 7f4b43ac2700  1 --
[2a01:430:22a::cef:c011]:6805/514544 <== osd.14
[2a01:430:22a::cef:c041]:6802/3012729 6198 ==== MRecoveryReserve(6.20
GRANT e8712) v2 ==== 26+0+0 (3843789212 0 0) 0x55a4c5040780 con
0x55a4c4f89000
     -5> 2018-02-27 23:43:07.984642 7f4b3a212700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] exit
Started/Primary/Active/WaitRemoteRecoveryReserved 0.080633 2 0.000176
     -4> 2018-02-27 23:43:07.984705 7f4b3a212700  5 osd.1 pg_epoch: 8712
pg[6.20( v 8712'771986 (8712'770478,8712'771986] local-lis/les=8710/8711
n=14299 ec=4197/2380 lis/c 8710/8710 les/c/f 8711/8711/2807
8710/8710/8710) [1,10,14] r=0 lpr=8710 crt=8712'771986 lcod 8712'771985
mlcod 8712'771985 active+recovery_wait+degraded+inconsistent m=1] enter
Started/Primary/Active/Recovering
     -3> 2018-02-27 23:43:08.087536 7f4b3820e700  1 --
[2a01:430:22a::cef:c011]:6805/514544 -->
[2a01:430:22a::cef:c041]:6802/3012729 -- MOSDPGPull(6.20 e8712/8710 cost
8389608) v3 -- 0x55a4c7281600 con 0
     -2> 2018-02-27 23:43:08.281205 7f4b4da55700  5 --
[2a01:430:22a::cef:c011]:6805/514544 >>
[2a01:430:22a::cef:c041]:6802/3012729 conn(0x55a4c4f89000 :-1
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1197 cs=1 l=0). rx
osd.14 seq 6199 0x55a4c7281600 MOSDPGPush(6.20 8712/8710
[PushOp(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25,
version: 6240'452090, data_included: [], data_size: 0, omap_header_size:
0, omap_entries_size: 0, attrset_size: 1, recovery_info:
ObjectRecoveryInfo(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25@6240'452090,
size: 0, copy_subset: [], clone_subset:
{6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:d8=[]}, snapset:
d8=[d8,d6,d4,d2,d0,ce,cb,ca,c8,c6,c4,c2,c0,be]:{be=[be],c0=[c0],c2=[c2],c6=[c6,c4],c8=[c8],ca=[ca],cb=[cb],ce=[ce],d0=[d0],d2=[d2],d4=[d4],d8=[d8,d6]}),
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0,
data_complete:true, omap_recovered_to:, omap_complete:true,
error:false), before_progress: ObjectRecoveryProgress(first,
data_recovered_to:0, data_complete:false, omap_recovered_to:,
omap_complete:false, error:false))]) v3
     -1> 2018-02-27 23:43:08.281240 7f4b4da55700  1 --
[2a01:430:22a::cef:c011]:6805/514544 <== osd.14
[2a01:430:22a::cef:c041]:6802/3012729 6199 ==== MOSDPGPush(6.20
8712/8710
[PushOp(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25,
version: 6240'452090, data_included: [], data_size: 0, omap_header_size:
0, omap_entries_size: 0, attrset_size: 1, recovery_info:
ObjectRecoveryInfo(6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25@6240'452090,
size: 0, copy_subset: [], clone_subset:
{6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:d8=[]}, snapset:
d8=[d8,d6,d4,d2,d0,ce,cb,ca,c8,c6,c4,c2,c0,be]:{be=[be],c0=[c0],c2=[c2],c6=[c6,c4],c8=[c8],ca=[ca],cb=[cb],ce=[ce],d0=[d0],d2=[d2],d4=[d4],d8=[d8,d6]}),
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0,
data_complete:true, omap_recovered_to:, omap_complete:true,
error:false), before_progress: ObjectRecoveryProgress(first,
data_recovered_to:0, data_complete:false, omap_recovered_to:,
omap_complete:false, error:false))]) v3 ==== 2020+0+0 (3173406802 0 0)
0x55a4c7281600 con 0x55a4c4f89000
      0> 2018-02-27 23:43:08.287138 7f4b3820e700 -1
/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: In function 'virtual void
PrimaryLogPG::on_local_recover(const hobject_t&, const
ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)'
thread 7f4b3820e700 time 2018-02-27 23:43:08.281422
/build/ceph-12.2.3/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p !=
recovery_info.ss.clone_snaps.end())

  ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949)
luminous (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x55a4b78ad9b2]
  2: (PrimaryLogPG::on_local_recover(hobject_t const&,
ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool,
ObjectStore::Transaction*)+0x11f0) [0x55a4b74581b0]
  3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&,
PullOp*, std::list<ReplicatedBackend::pull_complete_info,
std::allocator<ReplicatedBackend::pull_complete_info> >*,
ObjectStore::Transaction*)+0x788) [0x55a4b75d5ff8]
  4:
(ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2a6)
[0x55a4b75d7fc6]
  5:
(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x224)
[0x55a4b75db204]
  6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
[0x55a4b7500690]
  7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x5f1) [0x55a4b746bf31]
  8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7)
[0x55a4b72f42d7]
  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0x57) [0x55a4b7571137]
  10: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x108c) [0x55a4b73245ec]
  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d)
[0x55a4b78b363d]
  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55a4b78b5600]
  13: (()+0x7494) [0x7f4b515f6494]
  14: (clone()+0x3f) [0x7f4b5067daff]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

Thank you very much

With regards
Jan Pekar

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux