Hi José,

Too late, but you could have updated the CRUSH map *before* moving the disks. Something like:

    ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05

would move osd.0 to loki05 and trigger the appropriate PG movements before any physical move. The physical move is then done as usual: set noout, stop the OSD, physically move it, start the OSD, unset noout. It's a way to trigger the data movement overnight (maybe with a cron job) and do the physical move at your own convenience in the morning.

Cheers,
Maxime
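(A rough sketch of that whole sequence, assuming the osd.0 example above with its current CRUSH weight of 0.90329 and a systemd-based install; adjust the OSD id, weight, and bucket names to your own layout.)

    # Re-place the OSD in the CRUSH map first; this starts the PG movement
    ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05

    # Later, once backfill has settled, do the physical move
    ceph osd set noout
    systemctl stop ceph-osd@0      # on the old host
    # ... physically move the disk and bring it up on the new host ...
    systemctl start ceph-osd@0     # on the new host
    ceph osd unset noout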

On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of jmartin@xxxxxxxxxxxxxx> wrote:

Already min_size = 1

Thanks,
Jose M. Martín

On 31/01/17 at 09:44, Henrik Korkuc wrote:
> I am not sure about the "incomplete" part off the top of my head, but
> you can try setting min_size to 1 for the pools to reactivate some PGs,
> if they are down/inactive due to missing replicas.
>
> On 17-01-31 10:24, José M. Martín wrote:
>> # ceph -s
>>     cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>>      health HEALTH_ERR
>>             clock skew detected on mon.loki04
>>             155 pgs are stuck inactive for more than 300 seconds
>>             7 pgs backfill_toofull
>>             1028 pgs backfill_wait
>>             48 pgs backfilling
>>             892 pgs degraded
>>             20 pgs down
>>             153 pgs incomplete
>>             2 pgs peering
>>             155 pgs stuck inactive
>>             1077 pgs stuck unclean
>>             892 pgs undersized
>>             1471 requests are blocked > 32 sec
>>             recovery 3195781/36460868 objects degraded (8.765%)
>>             recovery 5079026/36460868 objects misplaced (13.930%)
>>             mds0: Behind on trimming (175/30)
>>             noscrub,nodeep-scrub flag(s) set
>>             Monitor clock skew detected
>>      monmap e5: 5 mons at {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>>             election epoch 4028, quorum 0,1,2,3,4 loki01,loki02,loki03,loki04,loki05
>>       fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>>      osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>>             flags noscrub,nodeep-scrub
>>       pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>>             45892 GB used, 34024 GB / 79916 GB avail
>>             3195781/36460868 objects degraded (8.765%)
>>             5079026/36460868 objects misplaced (13.930%)
>>                 3640 active+clean
>>                  838 active+undersized+degraded+remapped+wait_backfill
>>                  184 active+remapped+wait_backfill
>>                  134 incomplete
>>                   48 active+undersized+degraded+remapped+backfilling
>>                   19 down+incomplete
>>                    6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>>                    1 active+remapped+backfill_toofull
>>                    1 peering
>>                    1 down+peering
>>   recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>>
>>
>> # ceph osd tree
>> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -1 77.22777 root default
>>  -9 27.14778     rack sala1
>>  -2  5.41974         host loki01
>>  14  0.90329             osd.14       up  1.00000          1.00000
>>  15  0.90329             osd.15       up  1.00000          1.00000
>>  16  0.90329             osd.16       up  1.00000          1.00000
>>  17  0.90329             osd.17       up  1.00000          1.00000
>>  18  0.90329             osd.18       up  1.00000          1.00000
>>  25  0.90329             osd.25       up  1.00000          1.00000
>>  -4  3.61316         host loki03
>>   0  0.90329             osd.0        up  1.00000          1.00000
>>   2  0.90329             osd.2        up  1.00000          1.00000
>>  20  0.90329             osd.20       up  1.00000          1.00000
>>  24  0.90329             osd.24       up  1.00000          1.00000
>>  -3  9.05714         host loki02
>>   1  0.90300             osd.1        up  0.90002          1.00000
>>  31  2.72198             osd.31       up  1.00000          1.00000
>>  29  0.90329             osd.29       up  1.00000          1.00000
>>  30  0.90329             osd.30       up  1.00000          1.00000
>>  33  0.90329             osd.33       up  1.00000          1.00000
>>  32  2.72229             osd.32       up  1.00000          1.00000
>>  -5  9.05774         host loki04
>>   3  0.90329             osd.3        up  1.00000          1.00000
>>  19  0.90329             osd.19       up  1.00000          1.00000
>>  21  0.90329             osd.21       up  1.00000          1.00000
>>  22  0.90329             osd.22       up  1.00000          1.00000
>>  23  2.72229             osd.23       up  1.00000          1.00000
>>  28  2.72229             osd.28       up  1.00000          1.00000
>> -10 24.61000     rack sala2.2
>>  -6 24.61000         host loki05
>>   5  2.73000             osd.5        up  1.00000          1.00000
>>   6  2.73000             osd.6        up  1.00000          1.00000
>>   9  2.73000             osd.9        up  1.00000          1.00000
>>  10  2.73000             osd.10       up  1.00000          1.00000
>>  11  2.73000             osd.11       up  1.00000          1.00000
>>  12  2.73000             osd.12       up  1.00000          1.00000
>>  13  2.73000             osd.13       up  1.00000          1.00000
>>   4  2.73000             osd.4        up  1.00000          1.00000
>>   8  2.73000             osd.8        up  1.00000          1.00000
>>   7  0.03999             osd.7        up  1.00000          1.00000
>> -12 25.46999     rack sala2.1
>> -11 25.46999         host loki06
>>  34  2.73000             osd.34       up  1.00000          1.00000
>>  35  2.73000             osd.35       up  1.00000          1.00000
>>  36  2.73000             osd.36       up  1.00000          1.00000
>>  37  2.73000             osd.37       up  1.00000          1.00000
>>  38  2.73000             osd.38       up  1.00000          1.00000
>>  39  2.73000             osd.39       up  1.00000          1.00000
>>  40  2.73000             osd.40       up  1.00000          1.00000
>>  43  2.73000             osd.43       up  1.00000          1.00000
>>  42  0.90999             osd.42       up  1.00000          1.00000
>>  41  2.71999             osd.41       up  1.00000          1.00000
>>
>>
>> # ceph pg dump
>> You can find it at this link:
>> http://ergodic.ugr.es/pgdumpoutput.txt
>>
>>
>> What I did:
>> My cluster is heterogeneous, with old OSD nodes holding 1 TB disks and
>> new ones holding 3 TB disks. I was having balance problems: some 1 TB
>> OSDs got nearly full while there was plenty of space on others. My plan
>> was to replace some disks with bigger ones. I started the process with
>> no problems, one disk at a time: reweight to 0.0, wait for the
>> rebalance, then remove it.
>> After that, while searching for my problem, I read about straw2. I
>> changed the algorithm by editing the CRUSH map, and some data movement
>> took place. My setup was also not optimal (I had the journals on the
>> XFS filesystem), so I decided to change that too. At first I did it
>> slowly, disk by disk, but since rebalancing takes a long time and my
>> group was pushing me to finish quickly, I did:
>> ceph osd out osd.id
>> ceph osd crush remove osd.id
>> ceph auth del osd.id
>> ceph osd rm id
>>
>> Then I unmounted the disks and added them again with ceph-deploy:
>> ceph-deploy disk zap loki01:/dev/sda
>> ceph-deploy osd create loki01:/dev/sda
>>
>> I did this for every disk in rack "sala1". First I finished loki02,
>> then I did these steps on loki04, loki01 and loki03 at the same time.
>>
>> Thanks,
>> --
>> José M. Martín
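(For reference, one common way to make the straw -> straw2 change mentioned above is to dump, edit, and re-inject the CRUSH map. This is only a sketch with placeholder file names; it assumes all OSDs and clients are recent enough, Hammer or later, to understand straw2, and injecting the new map will itself trigger data movement.)

    ceph osd getcrushmap -o crushmap.bin        # dump the current CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile it to text
    # edit crushmap.txt: change "alg straw" to "alg straw2" in the buckets
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    ceph osd setcrushmap -i crushmap.new        # inject; expect rebalancing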
>> On 31/01/17 at 00:43, Shinobu Kinjo wrote:
>>> First off, the following, please:
>>>
>>> * ceph -s
>>> * ceph osd tree
>>> * ceph pg dump
>>>
>>> and
>>>
>>> * what you actually did, with the exact commands.
>>>
>>> Regards,
>>>
>>> On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín
>>> <jmartin@xxxxxxxxxxxxxx> wrote:
>>>> Dear list,
>>>>
>>>> I'm having some big problems with my setup.
>>>>
>>>> I was trying to increase the global capacity by replacing some OSDs
>>>> with bigger ones. I replaced them without waiting for the rebalance
>>>> process to finish, thinking the replicas were saved in other buckets,
>>>> but I found a lot of incomplete PGs, so some replicas of a PG had been
>>>> placed in the same bucket. I assume I have lost that data, because I
>>>> zapped the disks and used them for other tasks.
>>>>
>>>> My question is: what should I do to recover as much data as possible?
>>>> I'm using the filesystem and RBD.
>>>>
>>>> Thank you so much,
>>>>
>>>> --
>>>> Jose M. Martín

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com