Re: Minimize data lost with PG incomplete

"José M. Martín" <jmartin@xxxxxxxxxxxxxx> · Tue, 31 Jan 2017 10:35:27 +0100

min_size is already set to 1.
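
For reference, a minimal sketch of how the setting could be confirmed on every pool. It assumes `ceph osd pool ls` and `ceph osd pool get <pool> min_size` are available in the release in use; the helper name `show_min_size` is hypothetical and not part of ceph.

```shell
# Print the min_size of every pool, one pool per line.
# show_min_size is a hypothetical helper name, not a ceph command.
show_min_size() {
    ceph osd pool ls | while read -r pool; do
        # "ceph osd pool get <pool> min_size" is expected to print
        # something like "min_size: 1" (assumed output format).
        printf '%s %s\n' "$pool" "$(ceph osd pool get "$pool" min_size)"
    done
}
```

Calling `show_min_size` on a monitor or admin node would then list each pool name followed by its reported value.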

Thanks,
Jose M. Martín

On 31/01/17 at 09:44, Henrik Korkuc wrote:
> I am not sure about the "incomplete" part off the top of my head, but you
> can try setting min_size to 1 on the pools to reactivate some PGs, if they
> are down/inactive due to missing replicas.
>
> On 17-01-31 10:24, José M. Martín wrote:
>> # ceph -s
>>      cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>>       health HEALTH_ERR
>>              clock skew detected on mon.loki04
>>              155 pgs are stuck inactive for more than 300 seconds
>>              7 pgs backfill_toofull
>>              1028 pgs backfill_wait
>>              48 pgs backfilling
>>              892 pgs degraded
>>              20 pgs down
>>              153 pgs incomplete
>>              2 pgs peering
>>              155 pgs stuck inactive
>>              1077 pgs stuck unclean
>>              892 pgs undersized
>>              1471 requests are blocked > 32 sec
>>              recovery 3195781/36460868 objects degraded (8.765%)
>>              recovery 5079026/36460868 objects misplaced (13.930%)
>>              mds0: Behind on trimming (175/30)
>>              noscrub,nodeep-scrub flag(s) set
>>              Monitor clock skew detected
>>       monmap e5: 5 mons at {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>>              election epoch 4028, quorum 0,1,2,3,4 loki01,loki02,loki03,loki04,loki05
>>        fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>>       osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>>              flags noscrub,nodeep-scrub
>>        pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>>              45892 GB used, 34024 GB / 79916 GB avail
>>              3195781/36460868 objects degraded (8.765%)
>>              5079026/36460868 objects misplaced (13.930%)
>>                  3640 active+clean
>>                   838 active+undersized+degraded+remapped+wait_backfill
>>                   184 active+remapped+wait_backfill
>>                   134 incomplete
>>                    48 active+undersized+degraded+remapped+backfilling
>>                    19 down+incomplete
>>                     6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>>                     1 active+remapped+backfill_toofull
>>                     1 peering
>>                     1 down+peering
>> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>>
>>
>>
>> # ceph osd tree
>> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>   -1 77.22777 root default
>>   -9 27.14778     rack sala1
>>   -2  5.41974         host loki01
>>   14  0.90329             osd.14       up  1.00000          1.00000
>>   15  0.90329             osd.15       up  1.00000          1.00000
>>   16  0.90329             osd.16       up  1.00000          1.00000
>>   17  0.90329             osd.17       up  1.00000          1.00000
>>   18  0.90329             osd.18       up  1.00000          1.00000
>>   25  0.90329             osd.25       up  1.00000          1.00000
>>   -4  3.61316         host loki03
>>    0  0.90329             osd.0        up  1.00000          1.00000
>>    2  0.90329             osd.2        up  1.00000          1.00000
>>   20  0.90329             osd.20       up  1.00000          1.00000
>>   24  0.90329             osd.24       up  1.00000          1.00000
>>   -3  9.05714         host loki02
>>    1  0.90300             osd.1        up  0.90002          1.00000
>>   31  2.72198             osd.31       up  1.00000          1.00000
>>   29  0.90329             osd.29       up  1.00000          1.00000
>>   30  0.90329             osd.30       up  1.00000          1.00000
>>   33  0.90329             osd.33       up  1.00000          1.00000
>>   32  2.72229             osd.32       up  1.00000          1.00000
>>   -5  9.05774         host loki04
>>    3  0.90329             osd.3        up  1.00000          1.00000
>>   19  0.90329             osd.19       up  1.00000          1.00000
>>   21  0.90329             osd.21       up  1.00000          1.00000
>>   22  0.90329             osd.22       up  1.00000          1.00000
>>   23  2.72229             osd.23       up  1.00000          1.00000
>>   28  2.72229             osd.28       up  1.00000          1.00000
>> -10 24.61000     rack sala2.2
>>   -6 24.61000         host loki05
>>    5  2.73000             osd.5        up  1.00000          1.00000
>>    6  2.73000             osd.6        up  1.00000          1.00000
>>    9  2.73000             osd.9        up  1.00000          1.00000
>>   10  2.73000             osd.10       up  1.00000          1.00000
>>   11  2.73000             osd.11       up  1.00000          1.00000
>>   12  2.73000             osd.12       up  1.00000          1.00000
>>   13  2.73000             osd.13       up  1.00000          1.00000
>>    4  2.73000             osd.4        up  1.00000          1.00000
>>    8  2.73000             osd.8        up  1.00000          1.00000
>>    7  0.03999             osd.7        up  1.00000          1.00000
>> -12 25.46999     rack sala2.1
>> -11 25.46999         host loki06
>>   34  2.73000             osd.34       up  1.00000          1.00000
>>   35  2.73000             osd.35       up  1.00000          1.00000
>>   36  2.73000             osd.36       up  1.00000          1.00000
>>   37  2.73000             osd.37       up  1.00000          1.00000
>>   38  2.73000             osd.38       up  1.00000          1.00000
>>   39  2.73000             osd.39       up  1.00000          1.00000
>>   40  2.73000             osd.40       up  1.00000          1.00000
>>   43  2.73000             osd.43       up  1.00000          1.00000
>>   42  0.90999             osd.42       up  1.00000          1.00000
>>   41  2.71999             osd.41       up  1.00000          1.00000
>>
>>
>> # ceph pg dump
>> You can find it in this link:
>> http://ergodic.ugr.es/pgdumpoutput.txt
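
A full `ceph pg dump` is long, so a filtered view can be easier to read. The sketch below lists only the PGs whose state contains a given word, such as `incomplete`. It assumes `ceph pg dump pgs_brief` prints the columns `pg_stat state up up_primary acting acting_primary`; the helper name `list_pgs_in_state` is hypothetical.

```shell
# List PGs whose state string contains the given word.
# Prints: <pgid> <state> <acting set>
# list_pgs_in_state is a hypothetical helper name, not a ceph command.
list_pgs_in_state() {
    state=$1
    # Assumed column order: pg_stat state up up_primary acting acting_primary
    ceph pg dump pgs_brief 2>/dev/null |
        awk -v s="$state" '
            # Keep only rows that start with a pgid such as "1.2f",
            # which skips the header line.
            $1 ~ /^[0-9]+\.[0-9a-f]+$/ && index($2, s) { print $1, $2, $5 }
        '
}
```

For example, `list_pgs_in_state incomplete` would match both `incomplete` and `down+incomplete` PGs, and each pgid it prints can then be inspected with `ceph pg <pgid> query`.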
>>
>>
>> What I did:
>> My cluster is heterogeneous: it has old OSD nodes with 1TB disks and
>> new ones with 3TB disks. I was having balance problems: some 1TB OSDs got
>> nearly full while there was plenty of space on others. My plan was
>> to replace some disks with bigger ones. I started the process with
>> no problems, replacing one disk: reweight to 0.0, wait for the rebalance,
>> and remove it.
>> After that, while researching my problem, I read about straw2. I then
>> changed the algorithm by editing the CRUSH map, which caused some data
>> movement.
>> My setup was not optimal: I had the journal on the XFS filesystem, so I
>> decided to change that as well. At first I did it slowly, disk by disk,
>> but as rebalancing takes a long time and my group was pushing me to finish
>> quickly, I ran:
>> ceph osd out osd.id
>> ceph osd crush remove osd.id
>> ceph auth del osd.id
>> ceph osd rm id
>>
>> Then I unmounted the disks and added them again using ceph-deploy:
>> ceph-deploy disk zap loki01:/dev/sda
>> ceph-deploy osd create loki01:/dev/sda
>>
>> I did this for every disk in rack "sala1". First I finished loki02. Then I
>> repeated these steps on loki04, loki01 and loki03 at the same time.
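
The removal sequence quoted above does not wait for data to move off an OSD before deleting it. A minimal sketch of a more cautious variant follows: it marks one OSD out, polls until every PG reports `active+clean`, and only then runs the same removal commands. The helper names `all_pgs_clean` and `remove_osd_safely` and the `POLL_SECONDS` variable are hypothetical, and the `pgs_brief` column layout (pgid first, state second) is an assumption.

```shell
# Succeeds only if every PG state starts with "active+clean".
# Hypothetical helper; assumes pgs_brief rows are "<pgid> <state> ...".
all_pgs_clean() {
    ! ceph pg dump pgs_brief 2>/dev/null |
        awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && $2 !~ /^active\+clean/ { found = 1 }
             END { exit !found }'   # awk exits 0 when a non-clean PG was seen
}

# Drain a single OSD and remove it once the cluster has recovered.
# Hypothetical helper; takes the numeric OSD id as its only argument.
remove_osd_safely() {
    id=$1
    ceph osd out "$id"                  # start moving data off this OSD
    until all_pgs_clean; do             # wait for backfill/recovery to end
        sleep "${POLL_SECONDS:-60}"
    done
    # The OSD daemon has to be stopped before "ceph osd rm" succeeds; how to
    # stop it depends on the init system and is left out of this sketch.
    ceph osd crush remove "osd.$id"
    ceph auth del "osd.$id"
    ceph osd rm "$id"
}
```

Running `remove_osd_safely 14` for one OSD at a time, and starting the next only after the previous call returns, keeps at most one OSD's worth of replicas in flight.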
>>
>> Thanks,
>> -- 
>> José M. Martín
>>
>>
>> On 31/01/17 at 00:43, Shinobu Kinjo wrote:
>>> First off, please provide the following:
>>>
>>>   * ceph -s
>>>   * ceph osd tree
>>>   * ceph pg dump
>>>
>>> and
>>>
>>>   * what you actually did with exact commands.
>>>
>>> Regards,
>>>
>>> On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín
>>> <jmartin@xxxxxxxxxxxxxx> wrote:
>>>> Dear list,
>>>>
>>>> I'm having some big problems with my setup.
>>>>
>>>> I was trying to increase the overall capacity by replacing some OSDs
>>>> with bigger ones. I replaced them without waiting for the rebalance to
>>>> finish, thinking the replicas were stored in other buckets, but I found
>>>> a lot of incomplete PGs, so the replicas of a PG were evidently placed
>>>> in the same bucket. I assume I have lost data, because I zapped the
>>>> disks and used them for other tasks.
>>>>
>>>> My question is: what should I do to recover as much data as possible?
>>>> I'm using the filesystem and RBD.
>>>>
>>>> Thank you so much,
>>>>
>>>> -- 
>>>>
>>>> Jose M. Martín
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com