Hi,

if you need fast access to your remaining data, you can use
ceph-objectstore-tool to mark those PGs as complete; however, this will
irreversibly lose the missing data. If you understand the risks, the
procedure is explained well here:

http://ceph.com/community/incomplete-pgs-oh-my/

Since that article was written, ceph-objectstore-tool has gained a feature
that was not available at the time: "--op mark-complete". I think that in
your case it will be necessary to call --op mark-complete after you import
the PG into the temporary OSD (between steps 12 and 13). A rough sketch of
the command sequence is at the bottom of this message, below the quoted
thread.

On 29.06.2016 09:09, Mario Giammarco wrote:
> Now I have also discovered that, by mistake, someone has put production
> data on a virtual machine of the cluster. I need Ceph to start I/O so
> I can boot that virtual machine.
> Can I mark the incomplete PGs as valid?
> If needed, where can I buy some paid support?
> Thanks again,
> Mario
>
> On Wed, 29 Jun 2016 at 08:02, Mario Giammarco
> <mgiammarco@xxxxxxxxx> wrote:
>
>     pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 9313 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>     pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 9314 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>     pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
>     object_hash rjenkins pg_num 512 pgp_num 512 last_change 10537 flags
>     hashpspool stripe_width 0
>         removed_snaps [1~3]
>
>     ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
>      5 1.81000  1.00000 1857G  984G   872G 53.00 0.86
>      6 1.81000  1.00000 1857G 1202G   655G 64.73 1.05
>      2 1.81000  1.00000 1857G 1158G   698G 62.38 1.01
>      3 1.35999  1.00000 1391G  906G   485G 65.12 1.06
>      4 0.89999  1.00000  926G  702G   223G 75.88 1.23
>      7 1.81000  1.00000 1857G 1063G   793G 57.27 0.93
>      8 1.81000  1.00000 1857G 1011G   846G 54.44 0.88
>      9 0.89999  1.00000  926G  573G   352G 61.91 1.01
>      0 1.81000  1.00000 1857G 1227G   629G 66.10 1.07
>     13 0.45000  1.00000  460G  307G   153G 66.74 1.08
>        TOTAL            14846G 9136G 5710G 61.54
>     MIN/MAX VAR: 0.86/1.23 STDDEV: 6.47
>
>     ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
>     http://pastebin.com/SvGfcSHb
>     http://pastebin.com/gYFatsNS
>     http://pastebin.com/VZD7j2vN
>
>     I do not understand why I/O on the ENTIRE cluster is blocked when
>     only a few PGs are incomplete.
>
>     Many thanks,
>     Mario
>
>     On Tue, 28 Jun 2016 at 19:34, Stefan Priebe - Profihost
>     AG <s.priebe@xxxxxxxxxxxx> wrote:
>
>         And ceph health detail
>
>         Stefan
>
>         Excuse my typo, sent from my mobile phone.
> > On 28.06.2016 at 19:28, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Mario,
>>
>> please give some more details:
>>
>> Please send the output of:
>>
>> ceph osd pool ls detail
>> ceph osd df
>> ceph --version
>>
>> ceph -w for 10 seconds (use http://pastebin.com/ please)
>>
>> ceph osd crush dump (also pastebin please)
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:info@xxxxxxxxxxxxxxxxx
>>
>> Address:
>>
>> IP Interactive UG (haftungsbeschraenkt)
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402, Amtsgericht Hanau
>> Managing director: Oliver Dzombic
>>
>> Tax no.: 35 236 3622 1
>> VAT ID: DE274086107
>>
>>
>> On 28.06.2016 at 18:59, Mario Giammarco wrote:
>>> Hello,
>>> this is the second time this has happened to me; I hope that
>>> someone can explain what I can do.
>>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>>>
>>> One hdd goes down due to bad sectors.
>>> Ceph recovers, but it ends with:
>>>
>>>     cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>>>      health HEALTH_WARN
>>>             3 pgs down
>>>             19 pgs incomplete
>>>             19 pgs stuck inactive
>>>             19 pgs stuck unclean
>>>             7 requests are blocked > 32 sec
>>>      monmap e11: 7 mons at
>>>     {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,2=192.168.0.203:6789/0,
>>>     3=192.168.0.205:6789/0,4=192.168.0.202:6789/0,5=192.168.0.206:6789/0,
>>>     6=192.168.0.207:6789/0}
>>>             election epoch 722, quorum 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>>>      osdmap e10182: 10 osds: 10 up, 10 in
>>>      pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>>>             9136 GB used, 5710 GB / 14846 GB avail
>>>                 1005 active+clean
>>>                   16 incomplete
>>>                    3 down+incomplete
>>>
>>> Unfortunately "7 requests are blocked" means no virtual machine can
>>> boot, because Ceph has stopped I/O.
>>>
>>> I can accept losing some data, but not ALL data!
>>> Can you help me please?
>>> Thanks,
>>> Mario

--
Tomasz Kuzemko
tomasz.kuzemko@xxxxxxxxxxxx
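
For reference, here is a rough, untested sketch of that command sequence.
The PG ID (0.23), OSD numbers (5 as the source, 13 as the temporary OSD),
and file paths below are example values only; substitute the PGs reported
by "ceph health detail", and run ceph-objectstore-tool only while the OSD
in question is stopped:

    # find the incomplete PGs and where they map
    ceph health detail | grep incomplete

    # export the PG from the data store of an OSD that still holds a copy
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --journal-path /var/lib/ceph/osd/ceph-5/journal \
        --pgid 0.23 --op export --file /root/0.23.export

    # import it into the (stopped) temporary OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 \
        --journal-path /var/lib/ceph/osd/ceph-13/journal \
        --pgid 0.23 --op import --file /root/0.23.export

    # the step the article predates: mark the PG complete on the
    # temporary OSD; this is the irreversible part -- any objects
    # missing from this copy are lost for good
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 \
        --journal-path /var/lib/ceph/osd/ceph-13/journal \
        --pgid 0.23 --op mark-complete

Once the temporary OSD is started and backfill finishes, the PGs should go
active+clean and client I/O should unblock.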
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com