I am not sure about the "incomplete" part off the top of my head, but you
can try setting min_size to 1 for the affected pools to reactivate some
PGs, if they are down/inactive due to missing replicas.
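For example, assuming a replicated pool called "cephfs_data" (substitute
your own pool names, see ceph osd lspools):
# ceph osd pool get cephfs_data min_size
# ceph osd pool set cephfs_data min_size 1
This only helps PGs that are inactive because they are short of replicas;
it will not revive PGs whose only surviving copy was on the removed disks,
and you should set min_size back to its original value once recovery has
finished.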
On 17-01-31 10:24, José M. Martín wrote:
# ceph -s
cluster 29a91870-2ed2-40dc-969e-07b22f37928b
health HEALTH_ERR
clock skew detected on mon.loki04
155 pgs are stuck inactive for more than 300 seconds
7 pgs backfill_toofull
1028 pgs backfill_wait
48 pgs backfilling
892 pgs degraded
20 pgs down
153 pgs incomplete
2 pgs peering
155 pgs stuck inactive
1077 pgs stuck unclean
892 pgs undersized
1471 requests are blocked > 32 sec
recovery 3195781/36460868 objects degraded (8.765%)
recovery 5079026/36460868 objects misplaced (13.930%)
mds0: Behind on trimming (175/30)
noscrub,nodeep-scrub flag(s) set
Monitor clock skew detected
monmap e5: 5 mons at
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
election epoch 4028, quorum 0,1,2,3,4
loki01,loki02,loki03,loki04,loki05
fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
flags noscrub,nodeep-scrub
pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
45892 GB used, 34024 GB / 79916 GB avail
3195781/36460868 objects degraded (8.765%)
5079026/36460868 objects misplaced (13.930%)
3640 active+clean
838 active+undersized+degraded+remapped+wait_backfill
184 active+remapped+wait_backfill
134 incomplete
48 active+undersized+degraded+remapped+backfilling
19 down+incomplete
6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
1 active+remapped+backfill_toofull
1 peering
1 down+peering
recovery io 93909 kB/s, 10 keys/s, 67 objects/s
# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 77.22777 root default
-9 27.14778 rack sala1
-2 5.41974 host loki01
14 0.90329 osd.14 up 1.00000 1.00000
15 0.90329 osd.15 up 1.00000 1.00000
16 0.90329 osd.16 up 1.00000 1.00000
17 0.90329 osd.17 up 1.00000 1.00000
18 0.90329 osd.18 up 1.00000 1.00000
25 0.90329 osd.25 up 1.00000 1.00000
-4 3.61316 host loki03
0 0.90329 osd.0 up 1.00000 1.00000
2 0.90329 osd.2 up 1.00000 1.00000
20 0.90329 osd.20 up 1.00000 1.00000
24 0.90329 osd.24 up 1.00000 1.00000
-3 9.05714 host loki02
1 0.90300 osd.1 up 0.90002 1.00000
31 2.72198 osd.31 up 1.00000 1.00000
29 0.90329 osd.29 up 1.00000 1.00000
30 0.90329 osd.30 up 1.00000 1.00000
33 0.90329 osd.33 up 1.00000 1.00000
32 2.72229 osd.32 up 1.00000 1.00000
-5 9.05774 host loki04
3 0.90329 osd.3 up 1.00000 1.00000
19 0.90329 osd.19 up 1.00000 1.00000
21 0.90329 osd.21 up 1.00000 1.00000
22 0.90329 osd.22 up 1.00000 1.00000
23 2.72229 osd.23 up 1.00000 1.00000
28 2.72229 osd.28 up 1.00000 1.00000
-10 24.61000 rack sala2.2
-6 24.61000 host loki05
5 2.73000 osd.5 up 1.00000 1.00000
6 2.73000 osd.6 up 1.00000 1.00000
9 2.73000 osd.9 up 1.00000 1.00000
10 2.73000 osd.10 up 1.00000 1.00000
11 2.73000 osd.11 up 1.00000 1.00000
12 2.73000 osd.12 up 1.00000 1.00000
13 2.73000 osd.13 up 1.00000 1.00000
4 2.73000 osd.4 up 1.00000 1.00000
8 2.73000 osd.8 up 1.00000 1.00000
7 0.03999 osd.7 up 1.00000 1.00000
-12 25.46999 rack sala2.1
-11 25.46999 host loki06
34 2.73000 osd.34 up 1.00000 1.00000
35 2.73000 osd.35 up 1.00000 1.00000
36 2.73000 osd.36 up 1.00000 1.00000
37 2.73000 osd.37 up 1.00000 1.00000
38 2.73000 osd.38 up 1.00000 1.00000
39 2.73000 osd.39 up 1.00000 1.00000
40 2.73000 osd.40 up 1.00000 1.00000
43 2.73000 osd.43 up 1.00000 1.00000
42 0.90999 osd.42 up 1.00000 1.00000
41 2.71999 osd.41 up 1.00000 1.00000
# ceph pg dump
You can find it in this link:
http://ergodic.ugr.es/pgdumpoutput.txt
What I did:
My cluster is heterogeneous: the old OSD nodes have 1 TB disks and the new
ones have 3 TB disks. I was having balance problems: some 1 TB OSDs got
nearly full while there was plenty of space on others. My plan was to
replace some disks with bigger ones. I started the process with no
problems, changing one disk: reweight it to 0.0, wait for the rebalance,
then remove it.
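Roughly, per disk, that meant something like this (osd.14 is only an
example id):
# ceph osd crush reweight osd.14 0.0
# ceph -s          # repeat until the cluster is back to active+clean
# ceph osd out osd.14
# ceph osd crush remove osd.14
# ceph auth del osd.14
# ceph osd rm osd.14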
After that, while looking into my balance problem, I read about straw2, so
I changed the bucket algorithm by editing the crush map, which caused some
data movement.
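The edit was more or less like this (file names are just placeholders):
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
#   (edit crushmap.txt, changing "alg straw" to "alg straw2" in the buckets)
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new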
My setup was also not optimal: I had the journals on the XFS filesystem, so
I decided to change that as well. At first I did it slowly, disk by disk,
but since the rebalance takes a long time and my group was pushing me to
finish quickly, I ran
ceph osd out osd.id
ceph osd crush remove osd.id
ceph auth del osd.id
ceph osd rm id
Then I unmounted the disks and, using ceph-deploy, added them again:
ceph-deploy disk zap loki01:/dev/sda
ceph-deploy osd create loki01:/dev/sda
I did this for every disk in rack "sala1". First I finished loki02; then I
did these steps on loki04, loki01 and loki03 at the same time.
Thanks,
--
José M. Martín
On 31/01/17 at 00:43, Shinobu Kinjo wrote:
First off, please provide the following:
* ceph -s
* ceph osd tree
* ceph pg dump
and
* what you actually did with exact commands.
Regards,
On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín <jmartin@xxxxxxxxxxxxxx> wrote:
Dear list,
I'm having some big problems with my setup.
I was trying to increase the overall capacity by replacing some OSDs with
bigger ones. I changed them without waiting for the rebalance process to
finish, thinking the replicas were stored in other buckets, but I found a
lot of incomplete PGs, so some replicas of the same PG must have been
placed in the same bucket. I assume I have lost data, because I zapped the
disks and used them for other tasks.
My question is: what should I do to recover as much data as possible?
I'm using the filesystem and RBD.
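If it helps with the diagnosis, I can also run things like the following
and post the output (pg 1.2f3 is only an example id):
# ceph health detail | grep incomplete
# ceph pg 1.2f3 query
# ceph osd crush rule dump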
Thank you so much,
--
Jose M. Martín
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com