Hi,
could this be related to this other Mimic upgrade thread [1]? Your
failing MONs sound a bit like the problem described there; eventually
that user reported a successful recovery. You could try the steps
described there:
- disable cephx auth with 'auth_cluster_required = none'
- set 'mon_osd_cache_size = 200000' (default 10)
- set 'osd_heartbeat_interval = 30'
- set 'mon_lease = 75'
- increase rocksdb_cache_size and leveldb_cache_size on the mons so they
are big enough to cache the entire db
I just copied the mentioned steps, so please read the thread before
applying anything.
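For reference, a rough ceph.conf sketch of those settings could look like
this. The section placement and the cache sizes are my own guesses (size
the caches to fit your actual mon store); the rest are the values quoted
above:

[global]
# temporarily disable cephx authentication within the cluster
auth_cluster_required = none

[mon]
# cache far more osdmaps than the default of 10
mon_osd_cache_size = 200000
# allow more time before mon lease timeouts
mon_lease = 75
# guessed 4 GiB values: make these large enough to hold the whole mon db
rocksdb_cache_size = 4294967296
leveldb_cache_size = 4294967296

[osd]
# slower heartbeat cadence between OSDs
osd_heartbeat_interval = 30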
Regards,
Eugen
[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030018.html
Quoting by morphin <morphinwithyou@xxxxxxxxx>:
> After trying many things, with a lot of help on IRC, my pool health is
> still in ERROR and I don't think I can recover from this.
> https://paste.ubuntu.com/p/HbsFnfkYDT/
> In the end, 2 of the 3 mons crashed and restarted at the same time, and
> the pool went offline. Recovery has taken more than 12 hours and is way
> too slow. Somehow recovery does not seem to be working.
>
> If I can reach my data I can easily re-create the pool.
> If I run the ceph-objectstore-tool procedure to regenerate the mon
> store.db, can I access the RBD pool again?
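A note on the mon store.db question: there is a documented procedure to
rebuild the mon store from the OSDs with ceph-objectstore-tool. Only as a
rough sketch with example paths (please follow the official
disaster-recovery docs rather than this outline), it collects the maps
from the stopped OSDs on each host roughly like this:

ms=/tmp/mon-store
mkdir -p $ms
# run on each OSD host with the OSDs stopped, then merge the results
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" \
        --op update-mon-db --mon-store-path "$ms"
done
# afterwards the auth keys have to be re-created and the collected maps
# turned into a new store.db with ceph-monstore-tool

Whether the RBD pool is reachable again afterwards depends on the OSDs
themselves being intact; the rebuilt mon store only restores the cluster
maps.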
> On Tue, 25 Sep 2018 at 20:03, by morphin <morphinwithyou@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> Cluster is still down :(
>>
>> Up to now we have managed to stabilize the OSDs. 118 of the 160 OSDs
>> are stable and the cluster is still in the process of settling. Thanks
>> to Be-El in the ceph IRC channel, who helped a lot to get the flapping
>> OSDs stable.
>>
>> What we have learned so far is that this was caused by the sudden,
>> unexpected death of 2 of the 3 monitor servers. When they come back, if
>> they do not start one by one (each only after the previous one has
>> joined the cluster), this can happen: the cluster can become unhealthy
>> and it can take countless hours to come back.
>>
>> Right now here is our status:
>> ceph -s : https://paste.ubuntu.com/p/6DbgqnGS7t/
>> health detail: https://paste.ubuntu.com/p/w4gccnqZjR/
>>
>> Since the OSD disks are NL-SAS it can take up to 24 hours to get the
>> cluster back online. What is more, we have been told that we would be
>> extremely lucky if all of the data can be rescued.
>>
>> Unhappily, our strategy is just to sit and wait :(. As soon as the
>> peering and activating count drops to 300-500 PGs we will restart the
>> stopped OSDs one by one, waiting for the cluster to settle down after
>> each OSD. The amount of data stored on the OSDs is 33TB. Our biggest
>> concern is to export our RBD pool data to an external backup space;
>> then we will start again with a clean pool.
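A note on the export step: once the pool is reachable again, a simple
loop over 'rbd export' could do it. The pool name and backup path below
are just placeholders:

pool=rbd
backup=/backup/rbd-export
mkdir -p "$backup"
# dump every image in the pool to a flat file in the backup location
for img in $(rbd ls "$pool"); do
    rbd export "$pool/$img" "$backup/$img.img"
done

For images that are still in use you would snapshot them first and export
the snapshot instead.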
>>
>> I hope an expert can confirm our analysis. Any help or advice would be
>> greatly appreciated.
>> On Tue, 25 Sep 2018 at 15:08, by morphin <morphinwithyou@xxxxxxxxx> wrote:
>> >
>> > Reducing the recovery parameter values did not change much. There are
>> > still a lot of OSDs marked down.
>> >
>> > I don't know what I need to do after this point.
>> >
>> > [osd]
>> > osd recovery op priority = 63
>> > osd client op priority = 1
>> > osd recovery max active = 1
>> > osd max scrubs = 1
>> >
>> >
>> > ceph -s
>> > cluster:
>> > id: 89569e73-eb89-41a4-9fc9-d2a5ec5f4106
>> > health: HEALTH_ERR
>> > 42 osds down
>> > 1 host (6 osds) down
>> > 61/8948582 objects unfound (0.001%)
>> > Reduced data availability: 3837 pgs inactive, 1822 pgs
>> > down, 1900 pgs peering, 6 pgs stale
>> > Possible data damage: 18 pgs recovery_unfound
>> > Degraded data redundancy: 457246/17897164 objects degraded
>> > (2.555%), 213 pgs degraded, 209 pgs undersized
>> > 2554 slow requests are blocked > 32 sec
>> > 3273 slow ops, oldest one blocked for 1453 sec, daemons
>> > [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
>> > have slow ops.
>> >
>> > services:
>> > mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
>> > mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3,
>> > SRV-SEKUARK4
>> > osd: 168 osds: 118 up, 160 in
>> >
>> > data:
>> > pools: 1 pools, 4096 pgs
>> > objects: 8.95 M objects, 17 TiB
>> > usage: 33 TiB used, 553 TiB / 586 TiB avail
>> > pgs: 93.677% pgs not active
>> > 457246/17897164 objects degraded (2.555%)
>> > 61/8948582 objects unfound (0.001%)
>> > 1676 down
>> > 1372 peering
>> > 528 stale+peering
>> > 164 active+undersized+degraded
>> > 145 stale+down
>> > 73 activating
>> > 40 active+clean
>> > 29 stale+activating
>> > 17 active+recovery_unfound+undersized+degraded
>> > 16 stale+active+clean
>> > 16 stale+active+undersized+degraded
>> > 9 activating+undersized+degraded
>> > 3 active+recovery_wait+degraded
>> > 2 activating+undersized
>> > 2 activating+degraded
>> > 1 creating+down
>> > 1 stale+active+recovery_unfound+undersized+degraded
>> > 1 stale+active+clean+scrubbing+deep
>> > 1 stale+active+recovery_wait+degraded
>> >
>> > ceph -w: https://paste.ubuntu.com/p/WZ2YqzS86S/
>> > ceph health detail: https://paste.ubuntu.com/p/8w7Jpms8fj/
>> > On Tue, 25 Sep 2018 at 14:32, by morphin <morphinwithyou@xxxxxxxxx> wrote:
>> > >
>> > > The config didn't work, because increasing the numbers led to more
>> > > OSD drops.
>> > >
>> > > ceph -s
>> > > cluster:
>> > > id: 89569e73-eb89-41a4-9fc9-d2a5ec5f4106
>> > > health: HEALTH_ERR
>> > > norebalance,norecover flag(s) set
>> > > 1 osds down
>> > > 17/8839434 objects unfound (0.000%)
>> > > Reduced data availability: 3578 pgs inactive, 861 pgs
>> > > down, 1928 pgs peering, 11 pgs stale
>> > > Degraded data redundancy: 44853/17678868 objects degraded
>> > > (0.254%), 221 pgs degraded, 20 pgs undersized
>> > > 610 slow requests are blocked > 32 sec
>> > > 3996 stuck requests are blocked > 4096 sec
>> > > 6076 slow ops, oldest one blocked for 4129 sec, daemons
>> > > [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
>> > > have slow ops.
>> > >
>> > > services:
>> > > mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
>> > > mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3
>> > > osd: 168 osds: 128 up, 129 in; 2 remapped pgs
>> > > flags norebalance,norecover
>> > >
>> > > data:
>> > > pools: 1 pools, 4096 pgs
>> > > objects: 8.84 M objects, 17 TiB
>> > > usage: 26 TiB used, 450 TiB / 477 TiB avail
>> > > pgs: 0.024% pgs unknown
>> > > 89.160% pgs not active
>> > > 44853/17678868 objects degraded (0.254%)
>> > > 17/8839434 objects unfound (0.000%)
>> > > 1612 peering
>> > > 720 down
>> > > 583 activating
>> > > 319 stale+peering
>> > > 255 active+clean
>> > > 157 stale+activating
>> > > 108 stale+down
>> > > 95 activating+degraded
>> > > 84 stale+active+clean
>> > > 50 active+recovery_wait+degraded
>> > > 29 creating+down
>> > > 23 stale+activating+degraded
>> > > 18 stale+active+recovery_wait+degraded
>> > > 14 active+undersized+degraded
>> > > 12 active+recovering+degraded
>> > > 4 stale+creating+down
>> > > 3 stale+active+recovering+degraded
>> > > 3 stale+active+undersized+degraded
>> > > 2 stale
>> > > 1 active+recovery_wait+undersized+degraded
>> > > 1 active+clean+scrubbing+deep
>> > > 1 unknown
>> > > 1 active+undersized+degraded+remapped+backfilling
>> > > 1 active+recovering+undersized+degraded
>> > >
>> > > I guess the OSDs going down and dropping out increases the recovery
>> > > time, so I decided to try decreasing the recovery parameters to put
>> > > less load on the cluster.
>> > > I have NVMe and SAS disks. The servers are powerful enough and the
>> > > network is 4x10Gb.
>> > > I don't think my cluster is in bad shape, because I have datacenter
>> > > redundancy (14 servers + 14 servers). The 7 crashed servers are all
>> > > in datacenter A, and it took only a few minutes for them to come
>> > > back online. Also, 2 of them are monitors, so cluster I/O should
>> > > have been suspended and there should be little data difference.
>> > >
>> > > On the other hand, I don't understand the burden of this recovery.
>> > > I have been through many recoveries, but none of them stopped my
>> > > cluster from working. This recovery load is so high that it hasn't
>> > > let up for hours. I wish I could just decrease the recovery speed
>> > > and continue to serve my VMs. Is the recovery load handled
>> > > differently in Mimic? Luminous was pretty fine indeed.
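Recovery can usually be throttled at runtime without restarting the OSDs
by injecting lower settings into all of them, for example (the values are
only an illustration):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# on spinning disks a recovery sleep throttles recovery even further
ceph tell osd.* injectargs '--osd-recovery-sleep 0.1'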
>> > > On Tue, 25 Sep 2018 at 13:57, by morphin <morphinwithyou@xxxxxxxxx> wrote:
>> > > >
>> > > > Thank you for the answer.
>> > > >
>> > > > What do you think of this conf to speed up the recovery?
>> > > >
>> > > > [osd]
>> > > > osd recovery op priority = 63
>> > > > osd client op priority = 1
>> > > > osd recovery max active = 16
>> > > > osd max scrubs = 16
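To check which values an OSD is actually running with, you can query the
admin socket on the OSD host, e.g.:

ceph daemon osd.0 config show | grep -E 'recovery|scrub'

Note that 'osd recovery op priority = 63' combined with 'osd client op
priority = 1' favors recovery traffic over client I/O.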
>> > > > On Tue, 25 Sep 2018 at 13:37, the user <admin@xxxxxxxxxxxxxxx> wrote:
>> > > > >
>> > > > > Just let it recover.
>> > > > >
>> > > > > data:
>> > > > > pools: 1 pools, 4096 pgs
>> > > > > objects: 8.95 M objects, 17 TiB
>> > > > > usage: 34 TiB used, 577 TiB / 611 TiB avail
>> > > > > pgs: 94.873% pgs not active
>> > > > > 48475/17901254 objects degraded (0.271%)
>> > > > > 1/8950627 objects unfound (0.000%)
>> > > > > 2631 peering
>> > > > > 637 activating
>> > > > > 562 down
>> > > > > 159 active+clean
>> > > > > 44 activating+degraded
>> > > > > 30 active+recovery_wait+degraded
>> > > > > 12 activating+undersized+degraded
>> > > > > 10 active+recovering+degraded
>> > > > > 10 active+undersized+degraded
>> > > > > 1 active+clean+scrubbing+deep
>> > > > >
>> > > > > You've got PGs being deep scrubbed, which puts considerable IO
>> > > > > load on the OSDs.
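If the deep scrubs add too much load during recovery, they can be paused
temporarily:

ceph osd set noscrub
ceph osd set nodeep-scrub
# re-enable them once the cluster has recovered:
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub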
>> > > > >
>> > > > >
>> > > > > September 25, 2018 1:23 PM, "by morphin"
>> > > > > <morphinwithyou@xxxxxxxxx> wrote:
>> > > > >
>> > > > >
>> > > > > > What should I do now?
>> > > > > >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com