Well Herbert, as Paul mentioned, you should reconfigure the thresholds of your OSDs first and reweight second. Paul has sent you some hints.

Jewel documentation: http://docs.ceph.com/docs/jewel/rados/
You could put this into your config
with a value of 0.9 on all OSD servers, and restart the OSD daemons. Don't forget "ceph osd set noout" before the restart.
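
The order of operations in this reply can be sketched as a small dry-run script. This is only a sketch under assumptions: the helper names run, restart_osds and reweight_osds are made up for illustration, and the systemd unit ceph-osd.target is an assumption about the installation (the thread does not say how the daemons are managed). The config option itself is missing from the archived message and is not reconstructed here. By default the script only prints the commands; nothing touches the cluster unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run sketch: print each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Restart sequence from the reply: noout on, restart, noout off.
# Assumption: OSDs are managed by systemd (unit name not taken from the thread).
restart_osds() {
    run ceph osd set noout                  # once, so restarting OSDs are not marked out
    run systemctl restart ceph-osd.target   # repeat on every OSD server
    run ceph osd unset noout                # once, after all OSDs are back up
}

# Usage: reweight_osds WEIGHT ID...
reweight_osds() {
    weight=$1
    shift
    for id in "$@"; do
        run ceph osd reweight "$id" "$weight"
    done
}

restart_osds
reweight_osds 0.9 1 0 2
```

Run without APPLY, this prints a checklist ending in the three "ceph osd reweight" commands for OSDs 1, 0 and 2, which can be reviewed before anything is executed.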
After restarting the daemons, run "ceph osd unset noout"; resync should then take place instantly. Now set the reweight on OSDs 1, 0 and 2 to a value like 0.9: "ceph osd reweight 1 0.9" and so on.

Herbert, you really should extend your cluster! And/or evacuate your data and rebuild it from scratch.

Cheers,
Vadim

On 12.06.2018 16:42, Steininger, Herbert wrote:

Hi,

Thanks guys for your answers.

'ceph osd df' gives me:

[root@pcl241 ceph]# ceph osd df
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 1 18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152
 0 18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165
 3 18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162
 4 18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158
 2 18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165
 5 18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159
              TOTAL    112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64

And

[root@pcl241 ceph]# ceph osd df tree
ID  WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME
 -1 109.13992        -      0      0      0     0    0   0 root default
 -2         0        -      0      0      0     0    0   0     host A1214-2950-01
 -3         0        -      0      0      0     0    0   0     host A1214-2950-02
 -4         0        -      0      0      0     0    0   0     host A1214-2950-04
 -5         0        -      0      0      0     0    0   0     host A1214-2950-05
 -6         0        -      0      0      0     0    0   0     host A1214-2950-03
 -7  18.18999        - 18625G 15705G  2919G 84.32 1.04   0     host cuda002
  1  18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152         osd.1
 -8  18.18999        - 18625G 15945G  2680G 85.61 1.06   0     host cuda001
  0  18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165         osd.0
 -9  18.18999        - 18625G 14755G  3870G 79.22 0.98   0     host cuda005
  3  18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162         osd.3
-10  18.18999        - 18625G 14503G  4122G 77.87 0.96   0     host cuda003
  4  18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158         osd.4
-11  18.18999        - 18625G 15965G  2660G 85.72 1.06   0     host cuda004
  2  18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165         osd.2
-12  18.18999        - 21940G 16054G  5886G 73.17 0.91   0     host A1214-2950-06
  5  18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159         osd.5
-13         0        -      0      0      0     0    0   0     host pe9
               TOTAL    112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64
[root@pcl241 ceph]#

Is it wise to reduce the weight?

Thanks,
Best,
Herbert

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Vadim Bulst
Sent: Tuesday, 12 June 2018 11:16
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Problems with CephFS

Hi Herbert,

could you please run "ceph osd df"?

Cheers,
Vadim

On 12.06.2018 11:06, Steininger, Herbert wrote:

Hi Guys,

I've inherited a CephFS cluster, and I'm fairly new to CephFS. The cluster was down and I somehow managed to bring it up again. But now there are some problems that I can't fix that easily.

This is what 'ceph -s' is giving me as info:

[root@pcl241 ceph]# ceph -s
    cluster cde1487e-f930-417a-9403-28e9ebf406b8
     health HEALTH_WARN
            2 pgs backfill_toofull
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 260/29731463 objects degraded (0.001%)
            recovery 798/29731463 objects misplaced (0.003%)
            2 near full osd(s)
            crush map has legacy tunables (require bobtail, min is firefly)
            crush map has straw_calc_version=0
     monmap e8: 3 mons at {cephcontrol=172.22.12.241:6789/0,slurmbackup=172.22.20.4:6789/0,slurmmaster=172.22.20.3:6789/0}
            election epoch 48, quorum 0,1,2 cephcontrol,slurmmaster,slurmbackup
      fsmap e2288: 1/1/1 up {0=pcl241=up:active}
     osdmap e10865: 6 osds: 6 up, 6 in; 2 remapped pgs
            flags nearfull
      pgmap v14103169: 320 pgs, 3 pools, 30899 GB data, 9678 kobjects
            92929 GB used, 22139 GB / 112 TB avail
            260/29731463 objects degraded (0.001%)
            798/29731463 objects misplaced (0.003%)
                 316 active+clean
                   2 active+clean+scrubbing+deep
                   1 active+undersized+degraded+remapped+backfill_toofull
                   1 active+remapped+backfill_toofull
[root@pcl241 ceph]#

[root@pcl241 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 109.13992 root default
 -2         0     host A1214-2950-01
 -3         0     host A1214-2950-02
 -4         0     host A1214-2950-04
 -5         0     host A1214-2950-05
 -6         0     host A1214-2950-03
 -7  18.18999     host cuda002
  1  18.18999         osd.1               up  1.00000          1.00000
 -8  18.18999     host cuda001
  0  18.18999         osd.0               up  1.00000          1.00000
 -9  18.18999     host cuda005
  3  18.18999         osd.3               up  1.00000          1.00000
-10  18.18999     host cuda003
  4  18.18999         osd.4               up  1.00000          1.00000
-11  18.18999     host cuda004
  2  18.18999         osd.2               up  1.00000          1.00000
-12  18.18999     host A1214-2950-06
  5  18.18999         osd.5               up  1.00000          1.00000
-13         0     host pe9

Could someone please point me in the right direction about what to do to fix the problems? It seems that two OSDs are full, but how can I solve that if I don't have additional hardware available? It also seems that the cluster has different Ceph versions running (Hammer and Jewel); how do I solve that?

Ceph (mds/mon/osd) is running on Scientific Linux. If more info is needed, just let me know.

Thanks in advance,
Steininger Herbert

---
Herbert Steininger
Head of IT
Administrator
Max-Planck-Institut für Psychiatrie - EDV
Kraepelinstr. 2-10
80804 München
Tel +49 (0)89 / 30622-368
Mail herbert_steininger@xxxxxxxxxxxx
Web http://www.psych.mpg.de

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Vadim Bulst
Universität Leipzig / URZ
04109 Leipzig, Augustusplatz 10
phone: +49-341-97-33380
mail: vadim.bulst@xxxxxxxxxxxxxx
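
As a cross-check on the 'ceph osd df' figures quoted in this thread, the OSDs above a given fill level can be picked out with a small awk filter. This is a sketch under assumptions: the function name nearfull_osds is made up, the 85 percent limit is chosen only for illustration, and %USE is assumed to be the seventh column, as in the Jewel-era output posted here (other releases may lay the columns out differently).

```shell
#!/bin/sh
# Usage: ceph osd df | nearfull_osds LIMIT_PERCENT
# Assumption: %USE is the 7th whitespace-separated column, as in the
# output quoted in this thread. Rows whose first field is not a plain
# OSD id (header, TOTAL, MIN/MAX, host rows with negative ids) are skipped.
nearfull_osds() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+$/ && ($7 + 0) >= (limit + 0) { print "osd." $1, $7 }
    '
}

# Sample input copied from the output posted in this thread.
nearfull_osds 85 <<'EOF'
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 1 18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152
 0 18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165
 3 18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162
 4 18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158
 2 18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165
 5 18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159
              TOTAL    112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64
EOF
```

On the sample this lists osd.0 (85.61) and osd.2 (85.72), which is consistent with the "2 near full osd(s)" line in the 'ceph -s' output; osd.1 at 84.32 sits just below the illustrative limit.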