# ceph -s
    cluster 29a91870-2ed2-40dc-969e-07b22f37928b
     health HEALTH_ERR
            clock skew detected on mon.loki04
            155 pgs are stuck inactive for more than 300 seconds
            7 pgs backfill_toofull
            1028 pgs backfill_wait
            48 pgs backfilling
            892 pgs degraded
            20 pgs down
            153 pgs incomplete
            2 pgs peering
            155 pgs stuck inactive
            1077 pgs stuck unclean
            892 pgs undersized
            1471 requests are blocked > 32 sec
            recovery 3195781/36460868 objects degraded (8.765%)
            recovery 5079026/36460868 objects misplaced (13.930%)
            mds0: Behind on trimming (175/30)
            noscrub,nodeep-scrub flag(s) set
            Monitor clock skew detected
     monmap e5: 5 mons at {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
            election epoch 4028, quorum 0,1,2,3,4 loki01,loki02,loki03,loki04,loki05
      fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
     osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
            flags noscrub,nodeep-scrub
      pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
            45892 GB used, 34024 GB / 79916 GB avail
            3195781/36460868 objects degraded (8.765%)
            5079026/36460868 objects misplaced (13.930%)
                3640 active+clean
                 838 active+undersized+degraded+remapped+wait_backfill
                 184 active+remapped+wait_backfill
                 134 incomplete
                  48 active+undersized+degraded+remapped+backfilling
                  19 down+incomplete
                   6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
                   1 active+remapped+backfill_toofull
                   1 peering
                   1 down+peering
recovery io 93909 kB/s, 10 keys/s, 67 objects/s

# ceph osd tree
ID  WEIGHT   TYPE NAME            UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
 -1 77.22777 root default
 -9 27.14778     rack sala1
 -2  5.41974         host loki01
 14  0.90329             osd.14        up   1.00000           1.00000
 15  0.90329             osd.15        up   1.00000           1.00000
 16  0.90329             osd.16        up   1.00000           1.00000
 17  0.90329             osd.17        up   1.00000           1.00000
 18  0.90329             osd.18        up   1.00000           1.00000
 25  0.90329             osd.25        up   1.00000           1.00000
 -4  3.61316         host loki03
  0  0.90329             osd.0         up   1.00000           1.00000
  2  0.90329             osd.2         up   1.00000           1.00000
 20  0.90329             osd.20        up   1.00000           1.00000
 24  0.90329             osd.24        up   1.00000           1.00000
 -3  9.05714         host loki02
  1  0.90300             osd.1         up   0.90002           1.00000
 31  2.72198             osd.31        up   1.00000           1.00000
 29  0.90329             osd.29        up   1.00000           1.00000
 30  0.90329             osd.30        up   1.00000           1.00000
 33  0.90329             osd.33        up   1.00000           1.00000
 32  2.72229             osd.32        up   1.00000           1.00000
 -5  9.05774         host loki04
  3  0.90329             osd.3         up   1.00000           1.00000
 19  0.90329             osd.19        up   1.00000           1.00000
 21  0.90329             osd.21        up   1.00000           1.00000
 22  0.90329             osd.22        up   1.00000           1.00000
 23  2.72229             osd.23        up   1.00000           1.00000
 28  2.72229             osd.28        up   1.00000           1.00000
-10 24.61000     rack sala2.2
 -6 24.61000         host loki05
  5  2.73000             osd.5         up   1.00000           1.00000
  6  2.73000             osd.6         up   1.00000           1.00000
  9  2.73000             osd.9         up   1.00000           1.00000
 10  2.73000             osd.10        up   1.00000           1.00000
 11  2.73000             osd.11        up   1.00000           1.00000
 12  2.73000             osd.12        up   1.00000           1.00000
 13  2.73000             osd.13        up   1.00000           1.00000
  4  2.73000             osd.4         up   1.00000           1.00000
  8  2.73000             osd.8         up   1.00000           1.00000
  7  0.03999             osd.7         up   1.00000           1.00000
-12 25.46999     rack sala2.1
-11 25.46999         host loki06
 34  2.73000             osd.34        up   1.00000           1.00000
 35  2.73000             osd.35        up   1.00000           1.00000
 36  2.73000             osd.36        up   1.00000           1.00000
 37  2.73000             osd.37        up   1.00000           1.00000
 38  2.73000             osd.38        up   1.00000           1.00000
 39  2.73000             osd.39        up   1.00000           1.00000
 40  2.73000             osd.40        up   1.00000           1.00000
 43  2.73000             osd.43        up   1.00000           1.00000
 42  0.90999             osd.42        up   1.00000           1.00000
 41  2.71999             osd.41        up   1.00000           1.00000

# ceph pg dump
You can find it in this link: http://ergodic.ugr.es/pgdumpoutput.txt

What I did:

My cluster is heterogeneous: the old OSD nodes have 1 TB disks and the new ones have 3 TB disks. I was having balance problems; some 1 TB OSDs got nearly full while there was plenty of space on others. My plan was to replace some of the disks with bigger ones.

I started the process with no problems, changing one disk: reweight it to 0.0, wait for the rebalance to finish, and then remove it. After that, while looking into my balance problem, I read about straw2, so I changed the bucket algorithm by editing the CRUSH map, which caused some data movement.
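For reference, those first two steps were along these lines; I don't have the exact shell history at hand, so the OSD id and file names here are only examples:

    # drain and remove the first disk (osd.14 is just an example id)
    ceph osd crush reweight osd.14 0
    # wait for the rebalance to finish, then:
    ceph osd out 14
    ceph osd crush remove osd.14
    ceph auth del osd.14
    ceph osd rm 14

    # switch the CRUSH buckets from straw to straw2
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt, replacing "alg straw" with "alg straw2" in each bucket
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin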
My setup was not optimal either: I had the journals on the XFS filesystem, so I decided to change that as well. At first I did it slowly, disk by disk, but since the rebalance was taking a long time and my group was pushing me to finish quickly, for each disk I ran:

    ceph osd out osd.<id>
    ceph osd crush remove osd.<id>
    ceph auth del osd.<id>
    ceph osd rm <id>

Then I unmounted the disks and added them again using ceph-deploy:

    ceph-deploy disk zap loki01:/dev/sda
    ceph-deploy osd create loki01:/dev/sda

I did this for every disk in rack "sala1". First I finished loki02, then I did these steps on loki04, loki01 and loki03 at the same time.

Thanks,

--
José M. Martín


On 31/01/17 at 00:43, Shinobu Kinjo wrote:
> First off, the following, please.
>
>  * ceph -s
>  * ceph osd tree
>  * ceph pg dump
>
> and
>
>  * what you actually did, with exact commands.
>
> Regards,
>
> On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín <jmartin@xxxxxxxxxxxxxx> wrote:
>> Dear list,
>>
>> I'm having some big problems with my setup.
>>
>> I was trying to increase the global capacity by replacing some OSDs with
>> bigger ones. I replaced them without waiting for the rebalance to finish,
>> thinking the replicas were stored in other buckets, but I found a lot of
>> incomplete PGs, so replicas of the same PG must have been placed in the
>> same bucket. I assume I have lost that data, because I zapped the disks
>> and used them for other tasks.
>>
>> My question is: what should I do to recover as much data as possible?
>> I'm using the filesystem and RBD.
>>
>> Thank you so much,
>>
>> --
>>
>> Jose M. Martín

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com