Looks like that drive is dying.

On Wed, Apr 22, 2020 at 12:25 AM Andras Pataki <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Brad,
>
> Indeed - osd.694 kept crashing with a read error (medium error on the
> hard drive), and got restarted by systemd. So net net the system ended
> up in an infinite loop of deep scrub attempts on the PG for a week.
> Typically when a scrub encounters a read error, I get an inconsistent
> placement group, not an OSD crash and an infinite loop of scrub
> attempts. With the inconsistent placement group a pg repair fixes the
> read error (by reallocating the sector inside the drive).
>
> Here is the stack trace of the osd.694 crash on the scrub read error:
>
> Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Sense Key : Medium Error [current] [descriptor]
> Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Add. Sense: Unrecovered read error
> Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 CDB: Read(10) 28 00 6e a9 7a 30 00 00 80 00
> Apr 19 03:39:17 popeye-oss-3-03 kernel: print_req_error: critical medium error, dev sdz, sector 14852804992
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2020-04-19 03:39:17.095 7fffd2e2b700 -1 bluestore(/var/lib/ceph/osd/ceph-694) _do_read bdev-read failed: (61) No data available
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7fffd2e2b700 time 2020-04-19 03:39:17.099677
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: 9214: FAILED ceph_assert(r == 0)
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x555555a1ea4d]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2: (()+0x4cac15) [0x555555a1ec15]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3512) [0x555555f64132]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 4: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b8) [0x555555f64778]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 5: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2c2) [0x555555deb832]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 6: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x663) [0x555555d082c3]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 7: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x8b) [0x555555bbaacb]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 8: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x181c) [0x555555be4fcc]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 9: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x555555be61db]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 10: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x12) [0x555555d8c7b2]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x555555b1898f]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x5555560bd056]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5555560bfb70]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 14: (()+0x7e65) [0x7ffff5025e65]
> Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 15: (clone()+0x6d) [0x7ffff3ee988d]
>
> I ended up recreating the OSD (and thus overwriting all data) to fix the
> issue.
>
> Andras
>
>
> On 4/20/20 9:28 PM, Brad Hubbard wrote:
> > On Mon, Apr 20, 2020 at 11:01 PM Andras Pataki
> > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> On a cluster running Nautilus (14.2.8), we are getting a complaint about
> >> a PG not being deep-scrubbed on time. Looking at the primary OSD's
> >> logs, it looks like it tries to deep-scrub the PG every hour or so,
> >> emits some complaints that I don't understand, but the deep scrub does
> >> not finish (either with or without a scrub error).
> >>
> >> Here is the PG from pg dump:
> >>
> >> 1.43f 31794 0 0 0 0 66930087214 0 0 3004 3004 active+clean+scrubbing+deep 2020-04-20 04:48:13.055481 46286'483734 46286:563439 [354,694,851] 354 [354,694,851] 354 39594'348643 2020-04-10 12:39:16.261088 38482'314644 2020-04-04 10:37:03.121638 0
> >>
> >> Here is a section of the primary OSD logs (osd.354):
> >>
> >> 2020-04-18 08:21:08.322 7fffd2e2b700 0 log_channel(cluster) log [DBG] : 1.43f deep-scrub starts
> > Scrubbing starts.
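
Coming back to the pg repair path Andras describes above: when the read error
surfaces as an inconsistent PG rather than an OSD crash, the usual sequence is
roughly the following (a sketch; 1.43f is the PG from this thread, and
<pool-name> is a placeholder for your own pool):

    # find the inconsistent PGs and the objects/shards with read errors
    ceph health detail
    rados list-inconsistent-pg <pool-name>
    rados list-inconsistent-obj 1.43f --format=json-pretty

    # rewrite the bad replica from a good copy; the rewrite gives the drive
    # a chance to reallocate the failing sector
    ceph pg repair 1.43f

    # confirm the PG comes back clean afterwards
    ceph pg deep-scrub 1.43f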
> >
> >> 2020-04-18 08:37:53.362 7fffd2e2b700 1 osd.354 pg_epoch: 45910 pg[1.43f( v 45909'449615 (45801'446525,45909'449615] local-lis/les=45908/45909 n=30862 ec=874/871 lis/c 45908/45908 les/c/f 45909/45909/0 45910/45910/42988) [354,851] r=0 lpr=45910 pi=[45908,45910)/1 luod=0'0 crt=45909'449615 lcod 45909'449614 mlcod 0'0 active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] -> [354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> >> 2020-04-18 08:37:53.362 7fffd2e2b700 1 osd.354 pg_epoch: 45910 pg[1.43f( v 45909'449615 (45801'446525,45909'449615] local-lis/les=45908/45909 n=30862 ec=874/871 lis/c 45908/45908 les/c/f 45909/45909/0 45910/45910/42988) [354,851] r=0 lpr=45910 pi=[45908,45910)/1 crt=45909'449615 lcod 45909'449614 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >> 2020-04-18 08:38:01.002 7fffd2e2b700 1 osd.354 pg_epoch: 45912 pg[1.43f( v 45909'449615 (45801'446525,45909'449615] local-lis/les=45910/45911 n=30862 ec=874/871 lis/c 45910/45908 les/c/f 45911/45909/0 45912/45912/42988) [354,694,851] r=0 lpr=45912 pi=[45908,45912)/1 luod=0'0 crt=45909'449615 lcod 45909'449614 mlcod 0'0 active mbc={}] start_peering_interval up [354,851] -> [354,694,851], acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> > Epoch (and therefore map) change from 45910 to 45912. You lost osd 694
> > from the acting set, so peering has to restart and the scrub is
> > aborted.
> >
> >> 2020-04-18 08:38:01.002 7fffd2e2b700 1 osd.354 pg_epoch: 45912 pg[1.43f( v 45909'449615 (45801'446525,45909'449615] local-lis/les=45910/45911 n=30862 ec=874/871 lis/c 45910/45908 les/c/f 45911/45909/0 45912/45912/42988) [354,694,851] r=0 lpr=45912 pi=[45908,45912)/1 crt=45909'449615 lcod 45909'449614 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >> 2020-04-18 08:40:04.219 7fffd2e2b700 0 log_channel(cluster) log [DBG] : 1.43f deep-scrub starts
> > Scrubbing starts again.
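
One way to confirm that these acting-set changes line up with osd.694 going
down and coming back is to check the cluster log and the crash module (a
sketch; the log path assumes a default package install on a mon host, and
assumes the 'ceph crash' mgr module is enabled on this 14.2.x cluster):

    # failure/boot events for the OSD as seen by the monitors
    grep 'osd.694' /var/log/ceph/ceph.log | grep -E 'failed|boot|marked'

    # daemon crashes recorded by the crash module
    ceph crash ls
    ceph crash info <crash-id>

    # current up/acting set of the PG
    ceph pg 1.43f query | grep -E '"(up|acting)"' -A 4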
> >
> >> 2020-04-18 08:56:49.095 7fffd2e2b700 1 osd.354 pg_epoch: 45914 pg[1.43f( v 45913'449735 (45812'446725,45913'449735] local-lis/les=45912/45913 n=30868 ec=874/871 lis/c 45912/45912 les/c/f 45913/45913/0 45914/45914/42988) [354,851] r=0 lpr=45914 pi=[45912,45914)/1 luod=0'0 crt=45913'449735 lcod 45913'449734 mlcod 0'0 active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] -> [354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> >> 2020-04-18 08:56:49.095 7fffd2e2b700 1 osd.354 pg_epoch: 45914 pg[1.43f( v 45913'449735 (45812'446725,45913'449735] local-lis/les=45912/45913 n=30868 ec=874/871 lis/c 45912/45912 les/c/f 45913/45913/0 45914/45914/42988) [354,851] r=0 lpr=45914 pi=[45912,45914)/1 crt=45913'449735 lcod 45913'449734 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >> 2020-04-18 08:56:55.627 7fffd2e2b700 1 osd.354 pg_epoch: 45916 pg[1.43f( v 45913'449735 (45812'446725,45913'449735] local-lis/les=45914/45915 n=30868 ec=874/871 lis/c 45914/45912 les/c/f 45915/45913/0 45916/45916/42988) [354,694,851] r=0 lpr=45916 pi=[45912,45916)/1 luod=0'0 crt=45913'449735 lcod 45913'449734 mlcod 0'0 active mbc={}] start_peering_interval up [354,851] -> [354,694,851], acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> > Same again: you lost osd 694 from the acting set, and the epoch and map
> > change force another re-peering, which aborts the scrub.
> >
> > You need to identify why osd 694 is flapping, or at least appears to
> > be flapping to the monitor.
> >
> > Start by having a careful look at the logs for osd 694. If they
> > provide no insight, look at the logs on the active monitor for any
> > reference to osd 694 around the time the epoch changes.
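
In practice that usually means something like the following (a sketch; the
systemd unit and log file names assume a standard non-containerized package
install, and /dev/sdz is the device from the kernel messages above):

    # locate the host carrying osd.694
    ceph osd find 694

    # on that host: the OSD's own log and its restarts
    journalctl -u ceph-osd@694 --since "2020-04-18" | grep -iE 'assert|error|abort'
    less /var/log/ceph/ceph-osd.694.log

    # on the active monitor: any mention of osd.694 around the epoch changes
    grep 'osd.694' /var/log/ceph/ceph-mon.*.log

    # and the drive itself, since medium errors also show up in SMART
    smartctl -a /dev/sdz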
> >
> >> 2020-04-18 08:56:55.627 7fffd2e2b700 1 osd.354 pg_epoch: 45916 pg[1.43f( v 45913'449735 (45812'446725,45913'449735] local-lis/les=45914/45915 n=30868 ec=874/871 lis/c 45914/45912 les/c/f 45915/45913/0 45916/45916/42988) [354,694,851] r=0 lpr=45916 pi=[45912,45916)/1 crt=45913'449735 lcod 45913'449734 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >> 2020-04-18 08:56:56.867 7fffd2e2b700 0 log_channel(cluster) log [DBG] : 1.43f deep-scrub starts
> >> 2020-04-18 09:13:37.419 7fffd2e2b700 1 osd.354 pg_epoch: 45918 pg[1.43f( v 45917'449808 (45812'446725,45917'449808] local-lis/les=45916/45917 n=30876 ec=874/871 lis/c 45916/45916 les/c/f 45917/45917/0 45918/45918/42988) [354,851] r=0 lpr=45918 pi=[45916,45918)/1 luod=0'0 crt=45917'449808 lcod 45917'449807 mlcod 0'0 active+scrubbing+deep mbc={}] start_peering_interval up [354,694,851] -> [354,851], acting [354,694,851] -> [354,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> >> 2020-04-18 09:13:37.419 7fffd2e2b700 1 osd.354 pg_epoch: 45918 pg[1.43f( v 45917'449808 (45812'446725,45917'449808] local-lis/les=45916/45917 n=30876 ec=874/871 lis/c 45916/45916 les/c/f 45917/45917/0 45918/45918/42988) [354,851] r=0 lpr=45918 pi=[45916,45918)/1 crt=45917'449808 lcod 45917'449807 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >> 2020-04-18 09:13:44.959 7fffd2e2b700 1 osd.354 pg_epoch: 45920 pg[1.43f( v 45917'449808 (45812'446725,45917'449808] local-lis/les=45918/45919 n=30876 ec=874/871 lis/c 45918/45916 les/c/f 45919/45917/0 45920/45920/42988) [354,694,851] r=0 lpr=45920 pi=[45916,45920)/1 luod=0'0 crt=45917'449808 lcod 45917'449807 mlcod 0'0 active mbc={}] start_peering_interval up [354,851] -> [354,694,851], acting [354,851] -> [354,694,851], acting_primary 354 -> 354, up_primary 354 -> 354, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
> >> 2020-04-18 09:13:44.959 7fffd2e2b700 1 osd.354 pg_epoch: 45920 pg[1.43f( v 45917'449808 (45812'446725,45917'449808] local-lis/les=45918/45919 n=30876 ec=874/871 lis/c 45918/45916 les/c/f 45919/45917/0 45920/45920/42988) [354,694,851] r=0 lpr=45920 pi=[45916,45920)/1 crt=45917'449808 lcod 45917'449807 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
> >>
> >> I've tried restarting all 3 OSDs in question, it didn't help. I'm
> >> running the upmap balancer on this cluster (if that matters). Any ideas?
> >>
> >> Andras
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx