Re: Minimize data lost with PG incomplete

"José M. Martín" <jmartin@xxxxxxxxxxxxxx> · Tue, 31 Jan 2017 11:48:55 +0100

Any idea how I could recover files from the filesystem mount?
When I run cp, it hangs as soon as it hits a damaged file/folder. I would
be happy just getting the undamaged files.

Thanks
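
A minimal sketch of one way to copy whatever is still readable, one file
at a time, so that a single bad file does not stop the whole copy. It
assumes GNU coreutils (timeout, cp --parents). The function name
copy_readable, the 30-second default and the failed.txt log are made up for
illustration. A read that is blocked inside the kernel client may not react
to the signals timeout sends, and find itself can stall on a damaged
directory, so this is not guaranteed to get past every damaged entry.

```shell
# copy_readable SRC DST [SECONDS]
# Copy every regular file under SRC into DST (DST should live outside SRC),
# giving up on a file when cp has not finished within SECONDS (default 30,
# an arbitrary value). Paths that failed or timed out are listed in
# DST/failed.txt. The body runs in a subshell so the caller's working
# directory and variables are left alone.
copy_readable() (
    src=$1
    limit=${3:-30}
    mkdir -p "$2" || exit 1
    dst=$(cd "$2" && pwd) || exit 1      # absolute path, we cd below
    : > "$dst/failed.txt"
    cd "$src" || exit 1
    # --parents recreates the relative directory tree below DST.
    # -k 5 sends SIGKILL if cp ignores SIGTERM for 5 more seconds.
    find . -type f -exec sh -c '
        dst=$1 limit=$2 f=$3
        timeout -k 5 "$limit" cp --parents -p -- "$f" "$dst" ||
            printf "%s\n" "$f" >> "$dst/failed.txt"
    ' sh "$dst" "$limit" {} \;
)
```

Usage would be something like "copy_readable /mnt/cephfs /srv/rescue 60",
where both paths are placeholders; failed.txt can then be reviewed to see
which files were skipped.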

On 31/01/17 at 11:19, José M. Martín wrote:
> Thanks.
> I just realized I still have some of the original OSDs. If they contain some
> of the incomplete PGs, would it be possible to add them to the new disks?
> Maybe by following these steps? http://ceph.com/community/incomplete-pgs-oh-my/
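
A first check for that idea could be sketched as follows: given one file
with the IDs of the incomplete PGs and one with the PGs found on an old
OSD (one PG ID per line each), print the overlap. The function name
pgs_in_common and the file names are hypothetical, and the way the two
lists are produced is only described in comments because it is an
assumption that depends on the Ceph version in use.

```shell
# pgs_in_common INCOMPLETE_FILE OSD_FILE
# Print the PG IDs that appear in both files (one PG ID per line in each).
pgs_in_common() (
    a=$(mktemp) || exit 1
    b=$(mktemp) || exit 1
    # comm needs sorted input; -u also drops duplicate IDs.
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
)

# Assumed sources for the two lists (not verified against this cluster):
#   incomplete.txt  PG IDs of the rows in the pg dump whose state contains
#                   "incomplete"
#   old-osd.txt     PG IDs stored on the old disk, for example as listed by
#                   ceph-objectstore-tool with "--op list-pgs" while that
#                   OSD is stopped
```

An empty result would suggest that the old disk holds none of the
incomplete PGs.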
>
> On 31/01/17 at 10:44, Maxime Guyot wrote:
>> Hi José,
>>
>> Too late now, but you could have updated the CRUSH map *before* moving the disks. Something like "ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05" would move osd.0 to loki05 and trigger the appropriate PG movements before any physical move. The physical move is then done as usual: set noout, stop the OSD, physically move it, activate the OSD, unset noout.
>>
>> It’s a way to trigger the data movement overnight (maybe with a cron) and do the physical move at your own convenience in the morning.
>>
>> Cheers, 
>> Maxime 
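
The order of operations described above could be sketched as two steps.
The "ceph osd crush set" line is the one quoted in the message; the
systemctl unit name assumes a systemd-managed OSD, and the names run,
prepare_move, physical_move and DRY_RUN are made up for this sketch. With
DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
# Print each command; execute it only when DRY_RUN is set to something
# other than 1.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Step 1, for example from cron the evening before: point CRUSH at the new
# location so the PGs move while the disk is still in its old host.
prepare_move() {
    run ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05
}

# Step 2, once the data movement from step 1 has finished: the physical move.
physical_move() {
    run ceph osd set noout || return 1
    run systemctl stop ceph-osd@0 || return 1   # assumption: systemd unit name
    if [ "$DRY_RUN" != 1 ]; then
        printf 'Move the disk, start the OSD on loki05, then press Enter: '
        read -r _
    fi
    run ceph osd unset noout
}
```

Running both functions with the default DRY_RUN=1 first shows the exact
command sequence without touching the cluster.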
>>
>> On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of jmartin@xxxxxxxxxxxxxx> wrote:
>>
>>     Already min_size = 1
>>     
>>     Thanks,
>>     Jose M. Martín
>>     
>>     On 31/01/17 at 09:44, Henrik Korkuc wrote:
>>     > I am not sure about the "incomplete" part off the top of my head, but
>>     > you can try setting min_size to 1 for the pools to reactivate some PGs,
>>     > if they are down/inactive due to missing replicas.
>>     >
>>     > On 17-01-31 10:24, José M. Martín wrote:
>>     >> # ceph -s
>>     >>      cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>>     >>       health HEALTH_ERR
>>     >>              clock skew detected on mon.loki04
>>     >>              155 pgs are stuck inactive for more than 300 seconds
>>     >>              7 pgs backfill_toofull
>>     >>              1028 pgs backfill_wait
>>     >>              48 pgs backfilling
>>     >>              892 pgs degraded
>>     >>              20 pgs down
>>     >>              153 pgs incomplete
>>     >>              2 pgs peering
>>     >>              155 pgs stuck inactive
>>     >>              1077 pgs stuck unclean
>>     >>              892 pgs undersized
>>     >>              1471 requests are blocked > 32 sec
>>     >>              recovery 3195781/36460868 objects degraded (8.765%)
>>     >>              recovery 5079026/36460868 objects misplaced (13.930%)
>>     >>              mds0: Behind on trimming (175/30)
>>     >>              noscrub,nodeep-scrub flag(s) set
>>     >>              Monitor clock skew detected
>>     >>       monmap e5: 5 mons at {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>>     >>              election epoch 4028, quorum 0,1,2,3,4 loki01,loki02,loki03,loki04,loki05
>>     >>        fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>>     >>       osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>>     >>              flags noscrub,nodeep-scrub
>>     >>        pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>>     >>              45892 GB used, 34024 GB / 79916 GB avail
>>     >>              3195781/36460868 objects degraded (8.765%)
>>     >>              5079026/36460868 objects misplaced (13.930%)
>>     >>                  3640 active+clean
>>     >>                   838 active+undersized+degraded+remapped+wait_backfill
>>     >>                   184 active+remapped+wait_backfill
>>     >>                   134 incomplete
>>     >>                    48 active+undersized+degraded+remapped+backfilling
>>     >>                    19 down+incomplete
>>     >>                     6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>>     >>                     1 active+remapped+backfill_toofull
>>     >>                     1 peering
>>     >>                     1 down+peering
>>     >> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>>     >>
>>     >>
>>     >>
>>     >> # ceph osd tree
>>     >> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>     >>   -1 77.22777 root default
>>     >>   -9 27.14778     rack sala1
>>     >>   -2  5.41974         host loki01
>>     >>   14  0.90329             osd.14       up  1.00000          1.00000
>>     >>   15  0.90329             osd.15       up  1.00000          1.00000
>>     >>   16  0.90329             osd.16       up  1.00000          1.00000
>>     >>   17  0.90329             osd.17       up  1.00000          1.00000
>>     >>   18  0.90329             osd.18       up  1.00000          1.00000
>>     >>   25  0.90329             osd.25       up  1.00000          1.00000
>>     >>   -4  3.61316         host loki03
>>     >>    0  0.90329             osd.0        up  1.00000          1.00000
>>     >>    2  0.90329             osd.2        up  1.00000          1.00000
>>     >>   20  0.90329             osd.20       up  1.00000          1.00000
>>     >>   24  0.90329             osd.24       up  1.00000          1.00000
>>     >>   -3  9.05714         host loki02
>>     >>    1  0.90300             osd.1        up  0.90002          1.00000
>>     >>   31  2.72198             osd.31       up  1.00000          1.00000
>>     >>   29  0.90329             osd.29       up  1.00000          1.00000
>>     >>   30  0.90329             osd.30       up  1.00000          1.00000
>>     >>   33  0.90329             osd.33       up  1.00000          1.00000
>>     >>   32  2.72229             osd.32       up  1.00000          1.00000
>>     >>   -5  9.05774         host loki04
>>     >>    3  0.90329             osd.3        up  1.00000          1.00000
>>     >>   19  0.90329             osd.19       up  1.00000          1.00000
>>     >>   21  0.90329             osd.21       up  1.00000          1.00000
>>     >>   22  0.90329             osd.22       up  1.00000          1.00000
>>     >>   23  2.72229             osd.23       up  1.00000          1.00000
>>     >>   28  2.72229             osd.28       up  1.00000          1.00000
>>     >> -10 24.61000     rack sala2.2
>>     >>   -6 24.61000         host loki05
>>     >>    5  2.73000             osd.5        up  1.00000          1.00000
>>     >>    6  2.73000             osd.6        up  1.00000          1.00000
>>     >>    9  2.73000             osd.9        up  1.00000          1.00000
>>     >>   10  2.73000             osd.10       up  1.00000          1.00000
>>     >>   11  2.73000             osd.11       up  1.00000          1.00000
>>     >>   12  2.73000             osd.12       up  1.00000          1.00000
>>     >>   13  2.73000             osd.13       up  1.00000          1.00000
>>     >>    4  2.73000             osd.4        up  1.00000          1.00000
>>     >>    8  2.73000             osd.8        up  1.00000          1.00000
>>     >>    7  0.03999             osd.7        up  1.00000          1.00000
>>     >> -12 25.46999     rack sala2.1
>>     >> -11 25.46999         host loki06
>>     >>   34  2.73000             osd.34       up  1.00000          1.00000
>>     >>   35  2.73000             osd.35       up  1.00000          1.00000
>>     >>   36  2.73000             osd.36       up  1.00000          1.00000
>>     >>   37  2.73000             osd.37       up  1.00000          1.00000
>>     >>   38  2.73000             osd.38       up  1.00000          1.00000
>>     >>   39  2.73000             osd.39       up  1.00000          1.00000
>>     >>   40  2.73000             osd.40       up  1.00000          1.00000
>>     >>   43  2.73000             osd.43       up  1.00000          1.00000
>>     >>   42  0.90999             osd.42       up  1.00000          1.00000
>>     >>   41  2.71999             osd.41       up  1.00000          1.00000
>>     >>
>>     >>
>>     >> # ceph pg dump
>>     >> You can find it in this link:
>>     >> http://ergodic.ugr.es/pgdumpoutput.txt
>>     >>
>>     >>
>>     >> What I did:
>>     >> My cluster is heterogeneous, with old oss nodes with 1TB disks and
>>     >> new ones with 3TB. I was having balance problems: some 1TB OSDs got
>>     >> nearly full while there was plenty of space on others. My plan was
>>     >> to replace some disks with bigger ones. I started the process with
>>     >> no problems, replacing one disk: reweight to 0.0, wait for the
>>     >> rebalance, and remove it.
>>     >> After that, while researching my problem, I read about straw2. I then
>>     >> changed the algorithm by editing the CRUSH map, which caused some data
>>     >> movement. My setup was not optimal: I had the journal on the XFS
>>     >> filesystem, so I decided to change that too. At first I did it slowly,
>>     >> disk by disk, but as rebalancing takes a long time and my group was
>>     >> pushing me to finish quickly, I ran:
>>     >> ceph osd out osd.id
>>     >> ceph osd crush remove osd.id
>>     >> ceph auth del osd.id
>>     >> ceph osd rm id
>>     >>
>>     >> Then I unmounted the disks and added them again using ceph-deploy:
>>     >> ceph-deploy disk zap loki01:/dev/sda
>>     >> ceph-deploy osd create loki01:/dev/sda
>>     >>
>>     >> I did this for every disk in rack "sala1". First, I finished loki02.
>>     >> Then I did these steps on loki04, loki01 and loki03 at the same time.
>>     >>
>>     >> Thanks,
>>     >> -- 
>>     >> José M. Martín
>>     >>
>>     >>
>>     >> On 31/01/17 at 00:43, Shinobu Kinjo wrote:
>>     >>> First off, please provide the output of the following:
>>     >>>
>>     >>>   * ceph -s
>>     >>>   * ceph osd tree
>>     >>>   * ceph pg dump
>>     >>>
>>     >>> and
>>     >>>
>>     >>>   * what you actually did with exact commands.
>>     >>>
>>     >>> Regards,
>>     >>>
>>     >>> On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín
>>     >>> <jmartin@xxxxxxxxxxxxxx> wrote:
>>     >>>> Dear list,
>>     >>>>
>>     >>>> I'm having some big problems with my setup.
>>     >>>>
>>     >>>> I was trying to increase the global capacity by replacing some OSDs
>>     >>>> with bigger ones. I replaced them without waiting for the rebalance
>>     >>>> process to finish, thinking the replicas were stored in other
>>     >>>> buckets, but I found a lot of incomplete PGs, so replicas of a PG
>>     >>>> were placed in the same bucket. I assume I have lost data because I
>>     >>>> zapped the disks and used them for other tasks.
>>     >>>>
>>     >>>> My question is: what should I do to recover as much data as possible?
>>     >>>> I'm using the filesystem and RBD.
>>     >>>>
>>     >>>> Thank you so much,
>>     >>>>
>>     >>>> -- 
>>     >>>>
>>     >>>> Jose M. Martín
>>     >>>>
>>     >>>>
>>     >>>> _______________________________________________
>>     >>>> ceph-users mailing list
>>     >>>> ceph-users@xxxxxxxxxxxxxx
>>     >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com