Re: Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

Hello Eugen. Thank you for your answer. I was losing hope of getting an
answer here.

I have lost 2 of the 3 mons many times before, but I never hit a problem
like this on Luminous.
The recovery is still running and it has been 30 hours now. The last state of
my cluster is: https://paste.ubuntu.com/p/rDNHCcNG7P/
On IRC we are discussing whether or not we should unset the nodown and norecover flags.

I tried unsetting the nodown flag yesterday, and now 15 OSDs no longer start,
all with the same error: https://paste.ubuntu.com/p/94xpzxTSnr/
I don't know the reason for this, but I saw some commits for the dump problem.
Is this a bug or something else?

Could you also check the plan "peetaur2" suggested on IRC:
https://bpaste.net/show/20581774ff08
Be_El also strongly suggests unsetting the nodown flag.
What do you think?
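For reference, a minimal sketch of what we have in mind (standard ceph CLI,
checking the flags and clearing them one at a time while watching the effect):

    # show which flags are currently set
    ceph osd dump | grep flags
    # clear norecover first and watch how the cluster reacts
    ceph osd unset norecover
    ceph -s
    # then, if things stay stable, clear nodown as well
    ceph osd unset nodown
    ceph -s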
Eugen Block <eblock@xxxxxx> wrote on Wed, 26 Sep 2018 at 12:54:
>
> Hi,
>
> could this be related to this other Mimic upgrade thread [1]? Your
> failing MONs sound a bit like the problem described there; eventually
> the user reported a successful recovery. You could try the described steps:
>
>   - disable cephx auth with 'auth_cluster_required = none'
>   - set 'mon_osd_cache_size = 200000' (default 10)
>   - set 'osd_heartbeat_interval = 30'
>   - set 'mon_lease = 75'
>   - increase rocksdb_cache_size and leveldb_cache_size on the mons
>     to be big enough to cache the entire db
>
> I just copied the mentioned steps, so please read the thread before
> applying anything.
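> A rough ceph.conf sketch of where those options would usually go (treat the
> cache sizes as placeholders and size them to your mon hosts):
>
>   [global]
>   auth_cluster_required = none
>
>   [mon]
>   mon_osd_cache_size = 200000
>   mon_lease = 75
>   rocksdb_cache_size = <big enough to cache the entire mon db>
>   leveldb_cache_size = <big enough to cache the entire mon db>
>
>   [osd]
>   osd_heartbeat_interval = 30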
>
> Regards,
> Eugen
>
> [1]
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030018.html
>
>
> Quoting by morphin <morphinwithyou@xxxxxxxxx>:
>
> > I have tried many things with a lot of help on IRC, but my pool
> > health is still in ERROR and I think I can't recover from this.
> > https://paste.ubuntu.com/p/HbsFnfkYDT/
> > In the end 2 of the 3 mons crashed and restarted at the same time, and the
> > pool went offline. Recovery has taken more than 12 hours and it is way too
> > slow. Somehow recovery seems not to be working.
> >
> > If I can reach my data I can easily re-create the pool.
> > If I run the ceph-objectstore-tool procedure to regenerate the mon store.db,
> > can I access the RBD pool again?
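> > The procedure I have in mind is roughly the documented "recovery using
> > OSDs" one, with all OSDs on a host stopped, something like:
> >
> >   ms=/root/mon-store
> >   mkdir $ms
> >   # gather cluster map updates from every stopped OSD on this host
> >   for osd in /var/lib/ceph/osd/ceph-*; do
> >     ceph-objectstore-tool --data-path $osd \
> >       --op update-mon-db --mon-store-path $ms
> >   done
> >   # after collecting from all hosts, rebuild the mon store
> >   ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring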
> > by morphin <morphinwithyou@xxxxxxxxx> wrote on Tue, 25 Sep 2018 at 20:03:
> >>
> >> Hi,
> >>
> >> Cluster is still down :(
> >>
> >> Up to now we have managed to stabilize the OSDs. 118 of the 160 OSDs are
> >> stable and the cluster is still in the process of settling. Thanks to
> >> Be-El in the ceph IRC channel, who helped a lot to make the
> >> flapping OSDs stable.
> >>
> >> What we have learned so far is that this was caused by the sudden death of
> >> 2 of the 3 monitor servers, and that this can happen when they come back
> >> if they do not start one by one (each only after the previous one has
> >> joined the cluster). The cluster can become unhealthy and it can take
> >> countless hours to come back.
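> >>
> >> In other words, when bringing the monitors back, something like this on
> >> each mon host in turn, waiting for quorum before moving on to the next:
> >>
> >>   systemctl start ceph-mon@$(hostname -s)
> >>   ceph quorum_status | grep quorum_names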
> >>
> >> Right now here is our status:
> >> ceph -s : https://paste.ubuntu.com/p/6DbgqnGS7t/
> >> health detail: https://paste.ubuntu.com/p/w4gccnqZjR/
> >>
> >> Since the OSD disks are NL-SAS, this can take up to 24 hours even for an
> >> online cluster. What is more, it has been said that we would be extremely
> >> lucky if all the data is rescued.
> >>
> >> Unhappily, our strategy is just to sit and wait :(. As soon as the
> >> peering and activating count drops to 300-500 pgs we will restart the
> >> stopped OSDs one by one, and after each OSD we will wait for the cluster
> >> to settle down. The amount of data stored on the OSDs is 33 TB. Our main
> >> concern is to export our rbd pool data to a backup space outside the
> >> cluster. Then we will start again with a clean pool.
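> >>
> >> For the export we expect a plain per-image rbd export will do once the
> >> pool is reachable, e.g.:
> >>
> >>   rbd export <pool>/<image> /backup/<image>.img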
> >>
> >> I would like an expert to confirm our analysis. Any help or advice
> >> would be greatly appreciated.
> >> by morphin <morphinwithyou@xxxxxxxxx> wrote on Tue, 25 Sep 2018 at 15:08:
> >> >
> >> > Reducing the recovery parameter values did not change much;
> >> > there are still a lot of OSDs marked down.
> >> >
> >> > I don't know what I need to do after this point.
> >> >
> >> > [osd]
> >> > osd recovery op priority = 63
> >> > osd client op priority = 1
> >> > osd recovery max active = 1
> >> > osd max scrubs = 1
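> >> >
> >> > These values can also be pushed to the running OSDs without a restart,
> >> > e.g.:
> >> >   ceph tell osd.* injectargs '--osd_recovery_max_active 1 --osd_max_scrubs 1'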
> >> >
> >> >
> >> > ceph -s
> >> >   cluster:
> >> >     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
> >> >     health: HEALTH_ERR
> >> >             42 osds down
> >> >             1 host (6 osds) down
> >> >             61/8948582 objects unfound (0.001%)
> >> >             Reduced data availability: 3837 pgs inactive, 1822 pgs
> >> > down, 1900 pgs peering, 6 pgs stale
> >> >             Possible data damage: 18 pgs recovery_unfound
> >> >             Degraded data redundancy: 457246/17897164 objects degraded
> >> > (2.555%), 213 pgs degraded, 209 pgs undersized
> >> >             2554 slow requests are blocked > 32 sec
> >> >             3273 slow ops, oldest one blocked for 1453 sec, daemons
> >> >             [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
> >> >             have slow ops.
> >> >
> >> >   services:
> >> >     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
> >> >     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3,
> >> > SRV-SEKUARK4
> >> >     osd: 168 osds: 118 up, 160 in
> >> >
> >> >   data:
> >> >     pools:   1 pools, 4096 pgs
> >> >     objects: 8.95 M objects, 17 TiB
> >> >     usage:   33 TiB used, 553 TiB / 586 TiB avail
> >> >     pgs:     93.677% pgs not active
> >> >              457246/17897164 objects degraded (2.555%)
> >> >              61/8948582 objects unfound (0.001%)
> >> >              1676 down
> >> >              1372 peering
> >> >              528  stale+peering
> >> >              164  active+undersized+degraded
> >> >              145  stale+down
> >> >              73   activating
> >> >              40   active+clean
> >> >              29   stale+activating
> >> >              17   active+recovery_unfound+undersized+degraded
> >> >              16   stale+active+clean
> >> >              16   stale+active+undersized+degraded
> >> >              9    activating+undersized+degraded
> >> >              3    active+recovery_wait+degraded
> >> >              2    activating+undersized
> >> >              2    activating+degraded
> >> >              1    creating+down
> >> >              1    stale+active+recovery_unfound+undersized+degraded
> >> >              1    stale+active+clean+scrubbing+deep
> >> >              1    stale+active+recovery_wait+degraded
> >> >
> >> > ceph -w: https://paste.ubuntu.com/p/WZ2YqzS86S/
> >> > ceph health detail: https://paste.ubuntu.com/p/8w7Jpms8fj/
> >> > by morphin <morphinwithyou@xxxxxxxxx> wrote on Tue, 25 Sep 2018 at 14:32:
> >> > >
> >> > > The config didn't work; increasing the numbers only caused
> >> > > more OSD drops.
> >> > >
> >> > > ceph -s
> >> > >   cluster:
> >> > >     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
> >> > >     health: HEALTH_ERR
> >> > >             norebalance,norecover flag(s) set
> >> > >             1 osds down
> >> > >             17/8839434 objects unfound (0.000%)
> >> > >             Reduced data availability: 3578 pgs inactive, 861 pgs
> >> > > down, 1928 pgs peering, 11 pgs stale
> >> > >             Degraded data redundancy: 44853/17678868 objects degraded
> >> > > (0.254%), 221 pgs degraded, 20 pgs undersized
> >> > >             610 slow requests are blocked > 32 sec
> >> > >             3996 stuck requests are blocked > 4096 sec
> >> > >             6076 slow ops, oldest one blocked for 4129 sec, daemons
> >> > >             [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
> >> > >             have slow ops.
> >> > >
> >> > >   services:
> >> > >     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
> >> > >     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3
> >> > >     osd: 168 osds: 128 up, 129 in; 2 remapped pgs
> >> > >          flags norebalance,norecover
> >> > >
> >> > >   data:
> >> > >     pools:   1 pools, 4096 pgs
> >> > >     objects: 8.84 M objects, 17 TiB
> >> > >     usage:   26 TiB used, 450 TiB / 477 TiB avail
> >> > >     pgs:     0.024% pgs unknown
> >> > >              89.160% pgs not active
> >> > >              44853/17678868 objects degraded (0.254%)
> >> > >              17/8839434 objects unfound (0.000%)
> >> > >              1612 peering
> >> > >              720  down
> >> > >              583  activating
> >> > >              319  stale+peering
> >> > >              255  active+clean
> >> > >              157  stale+activating
> >> > >              108  stale+down
> >> > >              95   activating+degraded
> >> > >              84   stale+active+clean
> >> > >              50   active+recovery_wait+degraded
> >> > >              29   creating+down
> >> > >              23   stale+activating+degraded
> >> > >              18   stale+active+recovery_wait+degraded
> >> > >              14   active+undersized+degraded
> >> > >              12   active+recovering+degraded
> >> > >              4    stale+creating+down
> >> > >              3    stale+active+recovering+degraded
> >> > >              3    stale+active+undersized+degraded
> >> > >              2    stale
> >> > >              1    active+recovery_wait+undersized+degraded
> >> > >              1    active+clean+scrubbing+deep
> >> > >              1    unknown
> >> > >              1    active+undersized+degraded+remapped+backfilling
> >> > >              1    active+recovering+undersized+degraded
> >> > >
> >> > > I guess the OSDs going down and dropping out increase the recovery
> >> > > time, so I decided to try decreasing the recovery parameters to put
> >> > > less load on the cluster.
> >> > > I have NVMe and SAS disks. The servers are powerful enough.
> >> > > The network is 4x10Gb.
> >> > > I don't think my cluster is in bad shape, because I have datacenter
> >> > > redundancy (14 servers + 14 servers). The 7 crashed servers are all in
> >> > > datacenter A, and it took only a few minutes for them to come back
> >> > > online. Also, 2 of them are monitors, so cluster I/O should have been
> >> > > suspended and there should be little data difference.
> >> > >
> >> > > On the other hand I don't understand the burden of this recovery. I
> >> > > have been through many recoveries, but none of them stopped my cluster
> >> > > from working. This recovery load is so high that it hasn't let up for
> >> > > hours. I wish I could just decrease the recovery speed and continue to
> >> > > serve my VMs.
> >> > > Has the recovery load behaviour somehow changed in Mimic?
> >> > > Luminous was pretty fine indeed.
> >> > > by morphin <morphinwithyou@xxxxxxxxx> wrote on Tue, 25 Sep 2018 at 13:57:
> >> > > >
> >> > > > Thank you for the answer.
> >> > > >
> >> > > > What do you think of this config to speed up the recovery?
> >> > > >
> >> > > > [osd]
> >> > > > osd recovery op priority = 63
> >> > > > osd client op priority = 1
> >> > > > osd recovery max active = 16
> >> > > > osd max scrubs = 16
> >> > > > The user with the address <admin@xxxxxxxxxxxxxxx> wrote on Tue,
> >> > > > 25 Sep 2018 at 13:37:
> >> > > > >
> >> > > > > Just let it recover.
> >> > > > >
> >> > > > >   data:
> >> > > > >     pools:   1 pools, 4096 pgs
> >> > > > >     objects: 8.95 M objects, 17 TiB
> >> > > > >     usage:   34 TiB used, 577 TiB / 611 TiB avail
> >> > > > >     pgs:     94.873% pgs not active
> >> > > > >              48475/17901254 objects degraded (0.271%)
> >> > > > >              1/8950627 objects unfound (0.000%)
> >> > > > >              2631 peering
> >> > > > >              637  activating
> >> > > > >              562  down
> >> > > > >              159  active+clean
> >> > > > >              44   activating+degraded
> >> > > > >              30   active+recovery_wait+degraded
> >> > > > >              12   activating+undersized+degraded
> >> > > > >              10   active+recovering+degraded
> >> > > > >              10   active+undersized+degraded
> >> > > > >              1    active+clean+scrubbing+deep
> >> > > > >
> >> > > > > You've got deep-scrubbing PGs, which put considerable IO load on the OSDs.
> >> > > > >
> >> > > > >
> >> > > > > September 25, 2018 1:23 PM, "by morphin"
> >> > > > > <morphinwithyou@xxxxxxxxx> wrote:
> >> > > > >
> >> > > > >
> >> > > > > > What should I do now?
> >> > > > > >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



