Hello Eugen. Thank you for your answer. I was losing hope of getting an answer here. I have lost 2 of 3 mons many times before, but I never faced a problem like this on Luminous. The recovery is still running and it has been 30 hours now. The last state of my cluster is: https://paste.ubuntu.com/p/rDNHCcNG7P/

We are discussing on IRC whether we should unset the nodown and norecover flags or not. I tried unsetting the nodown flag yesterday, and now I have 15 OSDs that no longer start, all with the same error: https://paste.ubuntu.com/p/94xpzxTSnr/ I don't know the reason for this, but I saw some commits for the dump problem. Is this a bug or something else?

Also, can you check the plan "peetaur2" suggested on IRC: https://bpaste.net/show/20581774ff08 Be_El also strongly suggests unsetting the nodown flag. What do you think?

On Wed, 26 Sep 2018 at 12:54, Eugen Block <eblock@xxxxxx> wrote:
>
> Hi,
>
> could this be related to this other Mimic upgrade thread [1]? Your
> failing MONs sound a bit like the problem described there; eventually
> the user reported a successful recovery. You could try the described steps:
>
> - disable cephx auth with 'auth_cluster_required = none'
> - set 'mon_osd_cache_size = 200000' (default 10)
> - set 'osd_heartbeat_interval = 30'
> - set 'mon_lease = 75'
> - increase rocksdb_cache_size and leveldb_cache_size on the mons
>   to be big enough to cache the entire db
>
> I just copied the mentioned steps, so please read the thread before
> applying anything.
>
> Regards,
> Eugen
>
> [1]
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030018.html
>
>
> Quoting by morphin <morphinwithyou@xxxxxxxxx>:
>
> > I have tried many things, with a lot of help on IRC, but my pool
> > health is still in ERROR and I think I can't recover from this:
> > https://paste.ubuntu.com/p/HbsFnfkYDT/
> > In the end, 2 of the 3 mons crashed and started at the same time, and
> > the pool went offline. Recovery has taken more than 12 hours and it is
> > way too slow; somehow recovery does not seem to be working.
> >
> > If I can reach my data I can easily re-create the pool.
> > If I run the ceph-objectstore-tool script to regenerate the mon store.db,
> > can I access the RBD pool again?
> >
> > On Tue, 25 Sep 2018 at 20:03, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> Cluster is still down :(
> >>
> >> Up to now we have managed to stabilize the OSDs: 118 of the 160 OSDs are
> >> stable and the cluster is still in the process of settling. Thanks to
> >> Be-El in the Ceph IRC channel, who helped a lot to get the flapping OSDs stable.
> >>
> >> What we have learned so far is that this was caused by the sudden death
> >> of 2 of our 3 monitor servers, and that when they come back, if they do
> >> not start one by one (each only after the previous one has joined the
> >> cluster), this can happen: the cluster becomes unhealthy and it can take
> >> countless hours to come back.
> >>
> >> Right now here is our status:
> >> ceph -s: https://paste.ubuntu.com/p/6DbgqnGS7t/
> >> health detail: https://paste.ubuntu.com/p/w4gccnqZjR/
> >>
> >> Since the OSD disks are NL-SAS it can take up to 24 hours for the cluster
> >> to come back online. What is more, we have been told that we would be
> >> extremely lucky if all the data can be rescued.
> >>
> >> Most unhappily, our strategy is just to sit and wait :(. As soon as the
> >> peering and activating count drops to 300-500 PGs we will restart the
> >> stopped OSDs one by one, and after each OSD we will wait for the cluster
> >> to settle down. The amount of data stored on the OSDs is 33 TB.
> >>
> >> Our main concern is to export our RBD pool data to an outside backup
> >> space. Then we will start again with a clean cluster.
> >>
> >> I would like to validate our analysis with an expert. Any help or advice
> >> would be greatly appreciated.
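[A note on the export idea quoted above: once the pool is readable again, the images could be pulled out one by one with rbd export. A minimal sketch, assuming the pool is named "rbd" and /backup is a mounted backup target -- both are placeholders, not values from this cluster:

    # list all images in the pool and dump each one to a file on the backup space
    for img in $(rbd ls rbd); do
        rbd export rbd/"$img" /backup/"$img".img
    done

Snapshots and clones would need extra handling, and pulling 33 TB off NL-SAS disks will take a long time, so treat this only as a starting point.]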
> >> On Tue, 25 Sep 2018 at 15:08, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >> >
> >> > After reducing the recovery parameter values, not much has changed:
> >> > there are still a lot of OSDs marked down.
> >> >
> >> > I don't know what I need to do after this point.
> >> >
> >> > [osd]
> >> > osd recovery op priority = 63
> >> > osd client op priority = 1
> >> > osd recovery max active = 1
> >> > osd max scrubs = 1
> >> >
> >> > ceph -s
> >> >   cluster:
> >> >     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
> >> >     health: HEALTH_ERR
> >> >             42 osds down
> >> >             1 host (6 osds) down
> >> >             61/8948582 objects unfound (0.001%)
> >> >             Reduced data availability: 3837 pgs inactive, 1822 pgs down, 1900 pgs peering, 6 pgs stale
> >> >             Possible data damage: 18 pgs recovery_unfound
> >> >             Degraded data redundancy: 457246/17897164 objects degraded (2.555%), 213 pgs degraded, 209 pgs undersized
> >> >             2554 slow requests are blocked > 32 sec
> >> >             3273 slow ops, oldest one blocked for 1453 sec, daemons [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]... have slow ops.
> >> >
> >> >   services:
> >> >     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
> >> >     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3, SRV-SEKUARK4
> >> >     osd: 168 osds: 118 up, 160 in
> >> >
> >> >   data:
> >> >     pools:   1 pools, 4096 pgs
> >> >     objects: 8.95 M objects, 17 TiB
> >> >     usage:   33 TiB used, 553 TiB / 586 TiB avail
> >> >     pgs:     93.677% pgs not active
> >> >              457246/17897164 objects degraded (2.555%)
> >> >              61/8948582 objects unfound (0.001%)
> >> >              1676 down
> >> >              1372 peering
> >> >              528  stale+peering
> >> >              164  active+undersized+degraded
> >> >              145  stale+down
> >> >              73   activating
> >> >              40   active+clean
> >> >              29   stale+activating
> >> >              17   active+recovery_unfound+undersized+degraded
> >> >              16   stale+active+clean
> >> >              16   stale+active+undersized+degraded
> >> >              9    activating+undersized+degraded
> >> >              3    active+recovery_wait+degraded
> >> >              2    activating+undersized
> >> >              2    activating+degraded
> >> >              1    creating+down
> >> >              1    stale+active+recovery_unfound+undersized+degraded
> >> >              1    stale+active+clean+scrubbing+deep
> >> >              1    stale+active+recovery_wait+degraded
> >> >
> >> > ceph -w: https://paste.ubuntu.com/p/WZ2YqzS86S/
> >> > ceph health detail: https://paste.ubuntu.com/p/8w7Jpms8fj/
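[A side note on the reduced [osd] values quoted above: settings in ceph.conf only take effect on an OSD after a restart, so it may be worth confirming what the running daemons are actually using, or injecting the values at runtime. A rough sketch; osd.0 is just an example ID, and the ceph daemon command has to run on the host that carries that OSD:

    # check what a running OSD is actually using
    ceph daemon osd.0 config get osd_recovery_max_active
    ceph daemon osd.0 config get osd_max_scrubs

    # push the throttled values to all running OSDs without restarting them
    ceph tell osd.* injectargs '--osd_recovery_max_active 1 --osd_max_scrubs 1'

Injected values are lost on restart and only reach OSDs that are up at that moment, so the ceph.conf entries are still needed for OSDs that come back later.]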
> >> > On Tue, 25 Sep 2018 at 14:32, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >> > >
> >> > > The config didn't work: increasing the numbers only caused more OSD drops.
> >> > >
> >> > > bhfs -s
> >> > >   cluster:
> >> > >     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
> >> > >     health: HEALTH_ERR
> >> > >             norebalance,norecover flag(s) set
> >> > >             1 osds down
> >> > >             17/8839434 objects unfound (0.000%)
> >> > >             Reduced data availability: 3578 pgs inactive, 861 pgs down, 1928 pgs peering, 11 pgs stale
> >> > >             Degraded data redundancy: 44853/17678868 objects degraded (0.254%), 221 pgs degraded, 20 pgs undersized
> >> > >             610 slow requests are blocked > 32 sec
> >> > >             3996 stuck requests are blocked > 4096 sec
> >> > >             6076 slow ops, oldest one blocked for 4129 sec, daemons [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]... have slow ops.
> >> > >
> >> > >   services:
> >> > >     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
> >> > >     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3
> >> > >     osd: 168 osds: 128 up, 129 in; 2 remapped pgs
> >> > >          flags norebalance,norecover
> >> > >
> >> > >   data:
> >> > >     pools:   1 pools, 4096 pgs
> >> > >     objects: 8.84 M objects, 17 TiB
> >> > >     usage:   26 TiB used, 450 TiB / 477 TiB avail
> >> > >     pgs:     0.024% pgs unknown
> >> > >              89.160% pgs not active
> >> > >              44853/17678868 objects degraded (0.254%)
> >> > >              17/8839434 objects unfound (0.000%)
> >> > >              1612 peering
> >> > >              720  down
> >> > >              583  activating
> >> > >              319  stale+peering
> >> > >              255  active+clean
> >> > >              157  stale+activating
> >> > >              108  stale+down
> >> > >              95   activating+degraded
> >> > >              84   stale+active+clean
> >> > >              50   active+recovery_wait+degraded
> >> > >              29   creating+down
> >> > >              23   stale+activating+degraded
> >> > >              18   stale+active+recovery_wait+degraded
> >> > >              14   active+undersized+degraded
> >> > >              12   active+recovering+degraded
> >> > >              4    stale+creating+down
> >> > >              3    stale+active+recovering+degraded
> >> > >              3    stale+active+undersized+degraded
> >> > >              2    stale
> >> > >              1    active+recovery_wait+undersized+degraded
> >> > >              1    active+clean+scrubbing+deep
> >> > >              1    unknown
> >> > >              1    active+undersized+degraded+remapped+backfilling
> >> > >              1    active+recovering+undersized+degraded
> >> > >
> >> > > I guess the OSD down/drop issue increases the recovery time, so I
> >> > > decided to try decreasing the recovery parameters to put less load on
> >> > > the cluster. I have NVMe and SAS disks, the servers are powerful enough,
> >> > > and the network is 4x10Gb. I don't think my cluster is in bad shape,
> >> > > because I have datacenter redundancy (14 servers + 14 servers). The 7
> >> > > crashed servers are all in datacenter A, and it took only a few minutes
> >> > > to bring them back online. Also, 2 of them are monitors, so cluster I/O
> >> > > should have been suspended and there should be little data difference.
> >> > >
> >> > > On the other hand, I don't understand the burden of this recovery. I
> >> > > have faced many recoveries, but none of them stopped my cluster from
> >> > > working. This recovery load is so high that it hasn't let up for hours.
> >> > > I wish I could just decrease the recovery speed and continue to serve
> >> > > my VMs. Is the recovery load handled somewhat differently in Mimic?
> >> > > Luminous was pretty fine indeed.
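[On the wish just above to simply slow recovery down and keep serving the VMs: besides osd_recovery_max_active, the recovery sleep and backfill limits are the usual brakes in Luminous/Mimic. A hedged example of injecting them at runtime; the 0.5 second sleep is only an illustration, not a tested recommendation:

    # add a pause between recovery ops on the rotational (NL-SAS) OSDs and
    # keep backfills at the minimum, so client I/O gets more room
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.5 --osd_max_backfills 1'

This only helps once PGs are active and actually recovering; it does nothing for PGs stuck in peering or activating.]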
> >> > > On Tue, 25 Sep 2018 at 13:57, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >> > > >
> >> > > > Thank you for the answer.
> >> > > >
> >> > > > What do you think of this conf to speed up the recovery?
> >> > > >
> >> > > > [osd]
> >> > > > osd recovery op priority = 63
> >> > > > osd client op priority = 1
> >> > > > osd recovery max active = 16
> >> > > > osd max scrubs = 16
> >> > > >
> >> > > > On Tue, 25 Sep 2018 at 13:37, the user at <admin@xxxxxxxxxxxxxxx> wrote:
> >> > > > >
> >> > > > > Just let it recover.
> >> > > > >
> >> > > > >   data:
> >> > > > >     pools:   1 pools, 4096 pgs
> >> > > > >     objects: 8.95 M objects, 17 TiB
> >> > > > >     usage:   34 TiB used, 577 TiB / 611 TiB avail
> >> > > > >     pgs:     94.873% pgs not active
> >> > > > >              48475/17901254 objects degraded (0.271%)
> >> > > > >              1/8950627 objects unfound (0.000%)
> >> > > > >              2631 peering
> >> > > > >              637  activating
> >> > > > >              562  down
> >> > > > >              159  active+clean
> >> > > > >              44   activating+degraded
> >> > > > >              30   active+recovery_wait+degraded
> >> > > > >              12   activating+undersized+degraded
> >> > > > >              10   active+recovering+degraded
> >> > > > >              10   active+undersized+degraded
> >> > > > >              1    active+clean+scrubbing+deep
> >> > > > >
> >> > > > > You've got PGs deep scrubbing, which puts considerable IO load on the OSDs.
> >> > > > >
> >> > > > > September 25, 2018 1:23 PM, "by morphin" <morphinwithyou@xxxxxxxxx> wrote:
> >> > > > >
> >> > > > > > What should I do now?
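[One last note on the deep-scrub remark above: scrubbing can be paused cluster-wide while recovery is running and re-enabled once the cluster has settled. A minimal sketch using the standard flags, nothing cluster-specific:

    # stop new scrubs and deep scrubs while recovery is catching up
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # once the cluster has settled, allow scrubbing again
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

Scrubs already in progress will finish on their own; the flags only prevent new ones from being scheduled.]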