Re: Problems with CephFS

Vadim Bulst <vadim.bulst@xxxxxxxxxxxxxx> · Tue, 12 Jun 2018 22:34:24 +0200

    Well Herbert,
    as Paul mentioned, you should reconfigure the backfill threshold of
      your OSDs first and reweight second. Paul has sent you some hints.
    Jewel documentation:
    http://docs.ceph.com/docs/jewel/rados/

    osd backfill full ratio

          Description: Refuse to accept backfill requests when the Ceph OSD
                       Daemon’s full ratio is above this value.
          Type:        Float
          Default:     0.85

    You could put this into your config
      with a value of 0.9 on all OSD servers and restart the
      OSD daemons. Don't forget "ceph osd set noout" before restarting.
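To illustrate why a ratio of 0.9 should unblock the stuck backfills, here is a small sketch that compares the %USE column of the "ceph osd df" output quoted further down in this thread against both thresholds. The function names over_ratio and osd_df_rows are made up for this example, and the column position (%USE in column 7) is assumed from the layout shown in this thread.

```shell
#!/bin/sh
# Print the OSDs whose %USE exceeds a given backfill-full ratio.
# Reads "ceph osd df" rows on stdin; column 7 is assumed to be %USE.
# Header, TOTAL and MIN/MAX lines are skipped because their first
# field is not a plain OSD id.
over_ratio() {
    awk -v ratio="$1" \
        '$1 ~ /^[0-9]+$/ && NF >= 9 && $7 / 100 > ratio { printf "osd.%s %s\n", $1, $7 }'
}

# Rows copied from the "ceph osd df" output in this thread.
osd_df_rows() {
    cat <<'EOF'
 1 18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152
 0 18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165
 3 18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162
 4 18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158
 2 18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165
 5 18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159
EOF
}

echo "above 0.85:"
osd_df_rows | over_ratio 0.85
echo "above 0.90:"
osd_df_rows | over_ratio 0.90
```

With the figures from this thread, this lists osd.0 and osd.2 for 0.85 and nothing for 0.90, which would fit the two backfill_toofull PGs in the "ceph -s" output. On a live cluster, piping "ceph osd df" into over_ratio should behave the same way, provided the column layout matches.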

      After restarting the daemons, run "ceph osd unset noout"; the resync
      should start instantly. Now set the reweight on OSDs 1, 0 and 2 to a
      value like 0.9:

      "ceph osd reweight 1 0.9" and so on.

      Herbert, you really should extend your cluster, and/or evacuate
      your data and rebuild it from scratch.
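For reference, the procedure described above (raise the ratio, restart under noout, then reweight) could be laid out as in the following sketch. It only prints the steps and executes nothing. The helper name print_plan is made up for this example, placing the option in the [osd] section of ceph.conf is an assumption, and the restart command is left out on purpose because it depends on the init system in use.

```shell
#!/bin/sh
# Print, in order, the steps for the procedure described above.
# Nothing is executed; review the output and run the commands by hand.
# Arguments: <backfill-full ratio> <reweight value> <osd id>...
print_plan() {
    ratio="$1"
    weight="$2"
    shift 2
    # Assumption: the option goes into the [osd] section of ceph.conf.
    echo "# 1. add 'osd backfill full ratio = $ratio' to [osd] in ceph.conf on every OSD host"
    echo "ceph osd set noout"
    echo "# 2. restart the OSD daemons on every OSD host (command depends on your init system)"
    echo "ceph osd unset noout"
    # One reweight command per OSD id passed on the command line.
    for id in "$@"; do
        echo "ceph osd reweight $id $weight"
    done
}

# Values taken from the advice above: ratio 0.9, reweight 0.9, OSDs 1, 0 and 2.
print_plan 0.9 0.9 1 0 2
```

Printing instead of executing keeps the sketch safe to try on a cluster that is already near full; the noout flag is set before the restart and cleared right after it, as described above.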

      Cheers,

      Vadim

      On 12.06.2018 16:42, Steininger, Herbert wrote:

      Hi,

Thanks, guys, for your answers.

'ceph osd df' gives me:
[root@pcl241 ceph]# ceph osd df
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 1 18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152
 0 18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165
 3 18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162
 4 18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158
 2 18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165
 5 18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159
               TOTAL   112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64

And 'ceph osd df tree' gives me:

[root@pcl241 ceph]# ceph osd df tree
ID  WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME
 -1 109.13992        -      0      0      0     0    0   0 root default
 -2         0        -      0      0      0     0    0   0     host A1214-2950-01
 -3         0        -      0      0      0     0    0   0     host A1214-2950-02
 -4         0        -      0      0      0     0    0   0     host A1214-2950-04
 -5         0        -      0      0      0     0    0   0     host A1214-2950-05
 -6         0        -      0      0      0     0    0   0     host A1214-2950-03
 -7  18.18999        - 18625G 15705G  2919G 84.32 1.04   0     host cuda002
  1  18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152         osd.1
 -8  18.18999        - 18625G 15945G  2680G 85.61 1.06   0     host cuda001
  0  18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165         osd.0
 -9  18.18999        - 18625G 14755G  3870G 79.22 0.98   0     host cuda005
  3  18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162         osd.3
-10  18.18999        - 18625G 14503G  4122G 77.87 0.96   0     host cuda003
  4  18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158         osd.4
-11  18.18999        - 18625G 15965G  2660G 85.72 1.06   0     host cuda004
  2  18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165         osd.2
-12  18.18999        - 21940G 16054G  5886G 73.17 0.91   0     host A1214-2950-06
  5  18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159         osd.5
-13         0        -      0      0      0     0    0   0     host pe9
                 TOTAL   112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64
[root@pcl241 ceph]#

Is it wise to reduce the weight?
Thanks,
Best,
Herbert

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Vadim Bulst
Sent: Tuesday, 12 June 2018 11:16
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Problems with CephFS

Hi Herbert,

could you please run "ceph osd df"?

Cheers,

Vadim

On 12.06.2018 11:06, Steininger, Herbert wrote:

Hi guys,

I've inherited a CephFS cluster, and I'm fairly new to CephFS.
The cluster was down and I somehow managed to bring it up again.
But now there are some problems that I can't fix that easily.
This is what 'ceph -s' gives me:
[root@pcl241 ceph]# ceph -s
     cluster cde1487e-f930-417a-9403-28e9ebf406b8
      health HEALTH_WARN
             2 pgs backfill_toofull
             1 pgs degraded
             1 pgs stuck degraded
             2 pgs stuck unclean
             1 pgs stuck undersized
             1 pgs undersized
             recovery 260/29731463 objects degraded (0.001%)
             recovery 798/29731463 objects misplaced (0.003%)
             2 near full osd(s)
             crush map has legacy tunables (require bobtail, min is firefly)
             crush map has straw_calc_version=0
      monmap e8: 3 mons at {cephcontrol=172.22.12.241:6789/0,slurmbackup=172.22.20.4:6789/0,slurmmaster=172.22.20.3:6789/0}
             election epoch 48, quorum 0,1,2 cephcontrol,slurmmaster,slurmbackup
       fsmap e2288: 1/1/1 up {0=pcl241=up:active}
      osdmap e10865: 6 osds: 6 up, 6 in; 2 remapped pgs
             flags nearfull
       pgmap v14103169: 320 pgs, 3 pools, 30899 GB data, 9678 kobjects
             92929 GB used, 22139 GB / 112 TB avail
             260/29731463 objects degraded (0.001%)
             798/29731463 objects misplaced (0.003%)
                  316 active+clean
                    2 active+clean+scrubbing+deep
                    1 active+undersized+degraded+remapped+backfill_toofull
                    1 active+remapped+backfill_toofull
[root@pcl241 ceph]#

[root@pcl241 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 109.13992 root default
  -2         0     host A1214-2950-01
  -3         0     host A1214-2950-02
  -4         0     host A1214-2950-04
  -5         0     host A1214-2950-05
  -6         0     host A1214-2950-03
  -7  18.18999     host cuda002
   1  18.18999         osd.1               up  1.00000          1.00000
  -8  18.18999     host cuda001
   0  18.18999         osd.0               up  1.00000          1.00000
  -9  18.18999     host cuda005
   3  18.18999         osd.3               up  1.00000          1.00000
 -10  18.18999     host cuda003
   4  18.18999         osd.4               up  1.00000          1.00000
 -11  18.18999     host cuda004
   2  18.18999         osd.2               up  1.00000          1.00000
 -12  18.18999     host A1214-2950-06
   5  18.18999         osd.5               up  1.00000          1.00000
 -13         0     host pe9

Could someone please point me in the right direction on how to fix these problems?
It seems that two OSDs are full, but how can I solve that if I don't have any additional hardware available?
Also, it seems that the cluster is running different Ceph versions (Hammer and Jewel); how do I solve that?
Ceph (mds/mon/osd) is running on Scientific Linux.
If more info is needed, just let me know.

Thanks in advance,
Steininger Herbert

---
Herbert Steininger
Head of IT
Administrator
Max-Planck-Institut für Psychiatrie - EDV
Kraepelinstr. 2-10
80804 München
Tel    +49 (0)89 / 30622-368
Mail   herbert_steininger@xxxxxxxxxxxx
Web    http://www.psych.mpg.de

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

      --
Vadim Bulst

Universität Leipzig / URZ
04109  Leipzig, Augustusplatz 10

phone: ++49-341-97-33380
mail:    vadim.bulst@xxxxxxxxxxxxxx
