Re: OSD is near full and slow in accessing storage from client

Hi Sébastien,

 Thanks for your reply. Yes, there are undersized PGs and a recovery in progress because we added a new OSD after getting the "2 OSDs near full" warning. Yes, the newly added OSD is rebalancing the data.


[root@intcfs-osd6 ~]# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
0 3.29749  1.00000  3376G  2875G  501G 85.15 1.26 165
1 3.26869  1.00000  3347G  1923G 1423G 57.46 0.85 152
2 3.27339  1.00000  3351G  1980G 1371G 59.08 0.88 161
3 3.24089  1.00000  3318G  2130G 1187G 64.21 0.95 168
4 3.24089  1.00000  3318G  2997G  320G 90.34 1.34 176
5 3.32669  1.00000  3406G  2466G  939G 72.42 1.07 165
6 3.27800  1.00000  3356G  1463G 1893G 43.60 0.65 166  
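
While the backfill runs, %USE on osd.0 and osd.4 should slowly fall and the PGS count on the new osd.6 should rise. A minimal way to watch that over time (just a polling sketch, not something already shown in this thread) is:

# re-run the usage and recovery summaries every 5 minutes
watch -n 300 "ceph osd df; ceph -s | grep -E 'backfill|degraded|misplaced'"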

ceph osd crush rule dump

[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]


ceph version 10.2.2 and ceph version 10.2.9


ceph osd pool ls detail

pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 3 'downloads_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 39 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 4 'downloads_metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 36 flags hashpspool stripe_width 0


---- On Sun, 12 Nov 2017 15:04:02 +0530 Sébastien VIGNERON <sebastien.vigneron@xxxxxxxxx> wrote ----

Hi,

Can you share:
 - your placement rules: ceph osd crush rule dump
 - your CEPH version: ceph versions
 - your pools definitions: ceph osd pool ls detail

With these we can determine whether your PGs are stuck because of a misconfiguration or something else.

You seem to have some undersized PGs and a recovery in progress. Do your OSDs show some rebalancing of your data? Does your OSDs' use percentage change over time? (changes in "ceph osd df")

Cordialement / Best regards,

Sébastien VIGNERON 
CRIANN, 
Ingénieur / Engineer
Technopôle du Madrillet 
745, avenue de l'Université 
76800 Saint-Etienne du Rouvray - France 
tél. +33 2 32 91 42 91 
fax. +33 2 32 91 42 92 

On 12 Nov 2017, at 10:04, gjprabu <gjprabu@xxxxxxxxxxxx> wrote:

Hi Team,

         We have a Ceph setup with 6 OSDs and we got an alert that 2 OSDs are near full. We faced an issue with slow access to Ceph from the clients, so I added a 7th OSD, but 2 OSDs (osd.0 and osd.4) are still showing near full. I have restarted the Ceph service on osd.0 and osd.4. Kindly check the Ceph OSD status below and please provide us with a solution.


# ceph health detail
HEALTH_WARN 46 pgs backfill_wait; 1 pgs backfilling; 32 pgs degraded; 50 pgs stuck unclean; 32 pgs undersized; recovery 1098780/40253637 objects degraded (2.730%); recovery 3401433/40253637 objects misplaced (8.450%); 2 near full osd(s); mds0: Client integ-hm3 failing to respond to cache pressure; mds0: Client integ-hm8 failing to respond to cache pressure; mds0: Client integ-hm2 failing to respond to cache pressure; mds0: Client integ-hm9 failing to respond to cache pressure; mds0: Client integ-hm5 failing to respond to cache pressure; mds0: Client integ-hm9-bkp failing to respond to cache pressure; mds0: Client me-build1-bkp failing to respond to cache pressure

pg 3.f6 is stuck unclean for 511223.069161, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.f6 is stuck unclean for 511232.770419, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.ec is stuck unclean for 510902.815668, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.eb is stuck unclean for 511285.576487, current state active+remapped+wait_backfill, last acting [3,0]
pg 4.17 is stuck unclean for 511235.326709, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 4.2f is stuck unclean for 511232.356371, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.3d is stuck unclean for 511300.446982, current state active+remapped, last acting [3,0]
pg 4.93 is stuck unclean for 511295.539229, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
pg 3.47 is stuck unclean for 511288.104965, current state active+remapped+wait_backfill, last acting [3,0]
pg 4.d5 is stuck unclean for 510916.509825, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.31 is stuck unclean for 511221.542878, current state active+remapped+wait_backfill, last acting [0,3]
pg 3.62 is stuck unclean for 511221.551662, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
pg 4.4d is stuck unclean for 511232.279602, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.48 is stuck unclean for 510911.095367, current state active+remapped+wait_backfill, last acting [5,4]
pg 3.4f is stuck unclean for 511226.712285, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 3.78 is stuck unclean for 511221.531199, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.24 is stuck unclean for 510903.483324, current state active+remapped+backfilling, last acting [1,2]
pg 4.8c is stuck unclean for 511231.668693, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 3.b4 is stuck unclean for 511222.612012, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
pg 4.41 is stuck unclean for 511287.031264, current state active+remapped+wait_backfill, last acting [3,2]
pg 3.d1 is stuck unclean for 510903.797329, current state active+remapped+wait_backfill, last acting [0,3]
pg 3.7f is stuck unclean for 511222.929722, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 4.af is stuck unclean for 511262.494659, current state active+undersized+degraded+remapped, last acting [0]
pg 3.66 is stuck unclean for 510903.296711, current state active+remapped+wait_backfill, last acting [3,0]
pg 3.76 is stuck unclean for 511224.615144, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
pg 4.57 is stuck unclean for 511234.514343, current state active+remapped, last acting [0,4]
pg 3.69 is stuck unclean for 511224.672085, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
pg 3.9a is stuck unclean for 510967.300000, current state active+remapped+wait_backfill, last acting [3,2]
pg 4.50 is stuck unclean for 510903.825565, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 4.53 is stuck unclean for 510921.975268, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.e7 is stuck unclean for 511221.530592, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.6a is stuck unclean for 510911.284877, current state active+undersized+degraded+remapped+wait_backfill, last acting [0]
pg 4.16 is stuck unclean for 511232.702762, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 3.2c is stuck unclean for 511222.443893, current state active+remapped+wait_backfill, last acting [2,3]
pg 4.89 is stuck unclean for 511228.846614, current state active+undersized+degraded+remapped+wait_backfill, last acting [4]
pg 4.39 is stuck unclean for 511239.544231, current state active+remapped+wait_backfill, last acting [3,2]
pg 4.ce is stuck unclean for 511232.294586, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 3.91 is stuck unclean for 511232.341380, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 3.96 is stuck unclean for 510904.043900, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.c0 is stuck unclean for 510904.253281, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.9c is stuck unclean for 511237.612850, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 3.ab is stuck unclean for 510960.756324, current state active+remapped+wait_backfill, last acting [3,2]
pg 4.aa is stuck unclean for 511229.307559, current state active+remapped+wait_backfill, last acting [0,3]
pg 3.ad is stuck unclean for 510903.764157, current state active+remapped+wait_backfill, last acting [0,3]
pg 3.b5 is stuck unclean for 511226.560774, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
pg 4.58 is stuck unclean for 510919.273667, current state active+undersized+degraded+remapped+wait_backfill, last acting [1]
pg 4.b9 is stuck unclean for 511232.760066, current state active+remapped+wait_backfill, last acting [5,4]
pg 3.be is stuck unclean for 511224.422931, current state active+remapped+wait_backfill, last acting [0,4]
pg 4.d4 is stuck unclean for 510962.810416, current state active+undersized+degraded+remapped+wait_backfill, last acting [3]
pg 4.da is stuck unclean for 511259.506962, current state active+undersized+degraded+remapped+wait_backfill, last acting [2]
pg 4.8c is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 3.7f is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 3.78 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.76 is active+undersized+degraded+remapped+wait_backfill, acting [3]
pg 4.6a is active+undersized+degraded+remapped+wait_backfill, acting [0]
pg 3.69 is active+undersized+degraded+remapped+wait_backfill, acting [4]
pg 3.66 is active+remapped+wait_backfill, acting [3,0]
pg 3.62 is active+undersized+degraded+remapped+wait_backfill, acting [4]
pg 4.58 is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 4.50 is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 4.53 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.4f is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 4.48 is active+remapped+wait_backfill, acting [5,4]
pg 4.4d is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.47 is active+remapped+wait_backfill, acting [3,0]
pg 4.41 is active+remapped+wait_backfill, acting [3,2]
pg 3.31 is active+remapped+wait_backfill, acting [0,3]
pg 4.2f is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.24 is active+remapped+backfilling, acting [1,2]
pg 4.17 is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 4.16 is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 3.2c is active+remapped+wait_backfill, acting [2,3]
pg 4.39 is active+remapped+wait_backfill, acting [3,2]
pg 4.89 is active+undersized+degraded+remapped+wait_backfill, acting [4]
pg 3.91 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 4.93 is active+undersized+degraded+remapped+wait_backfill, acting [3]
pg 3.96 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.9a is active+remapped+wait_backfill, acting [3,2]
pg 4.9c is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 4.af is active+undersized+degraded+remapped, acting [0]
pg 3.ab is active+remapped+wait_backfill, acting [3,2]
pg 4.aa is active+remapped+wait_backfill, acting [0,3]
pg 3.ad is active+remapped+wait_backfill, acting [0,3]
pg 3.b4 is active+undersized+degraded+remapped+wait_backfill, acting [0]
pg 3.b5 is active+undersized+degraded+remapped+wait_backfill, acting [3]
pg 4.b9 is active+remapped+wait_backfill, acting [5,4]
pg 3.be is active+remapped+wait_backfill, acting [0,4]
pg 4.c0 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 4.ce is active+undersized+degraded+remapped+wait_backfill, acting [1]
pg 3.d1 is active+remapped+wait_backfill, acting [0,3]
pg 4.d5 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 4.d4 is active+undersized+degraded+remapped+wait_backfill, acting [3]
pg 4.da is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.e7 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.eb is active+remapped+wait_backfill, acting [3,0]
pg 3.ec is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 4.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
pg 3.f6 is active+undersized+degraded+remapped+wait_backfill, acting [2]
recovery 1098780/40253637 objects degraded (2.730%)
recovery 3401433/40253637 objects misplaced (8.450%)
osd.0 is near full at 85%
osd.4 is near full at 90%
mds0: Client integ-hm3 failing to respond to cache pressure(client_id: 733998)
mds0: Client integ-hm8 failing to respond to cache pressure(client_id: 843866)
mds0: Client integ-hm2 failing to respond to cache pressure(client_id: 844939)
mds0: Client integ-hm9 failing to respond to cache pressure(client_id: 845065)
mds0: Client integ-hm5 failing to respond to cache pressure(client_id: 845068)
mds0: Client integ-hm9-bkp failing to respond to cache pressure(client_id: 895898)
mds0: Client me-build1-bkp failing to respond to cache pressure(client_id: 888666)


hm ~]# ceph osd tree
ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 22.92604 root default                                          
-2  3.29749     host intcfs-osd1                                  
0  3.29749         osd.0             up  1.00000          1.00000
-3  3.26869     host intcfs-osd2                                  
1  3.26869         osd.1             up  1.00000          1.00000
-4  3.27339     host intcfs-osd3                                  
2  3.27339         osd.2             up  1.00000          1.00000
-5  3.24089     host intcfs-osd4                                  
3  3.24089         osd.3             up  1.00000          1.00000
-6  3.24089     host intcfs-osd5                                  
4  3.24089         osd.4             up  1.00000          1.00000
-7  3.32669     host intcfs-osd6                                  
5  3.32669         osd.5             up  1.00000          1.00000
-8  3.27800     host intcfs-osd7                                  
6  3.27800         osd.6             up  1.00000          1.00000


hm5 ~]# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
0 3.29749  1.00000  3376G  2874G  502G 85.13 1.26 165
1 3.26869  1.00000  3347G  1922G 1424G 57.44 0.85 152
2 3.27339  1.00000  3351G  2009G 1342G 59.95 0.89 162
3 3.24089  1.00000  3318G  2130G 1188G 64.19 0.95 168
4 3.24089  1.00000  3318G  2996G  321G 90.30 1.34 176
5 3.32669  1.00000  3406G  2465G  940G 72.39 1.07 165
6 3.27800  1.00000  3356G  1435G 1921G 42.76 0.63 166
              TOTAL 23476G 15834G 7641G 67.45         
MIN/MAX VAR: 0.63/1.34  STDDEV: 15.29


Regards
Prabu GJ
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
