Hi all,

I finally succeeded. Maybe somebody will find this interesting.

A script read the contents of the fuse-rbd files (I wonder what the actual use case for fuse-rbd is?) with "dd" and, whenever a read timed out (a background process raised the alarm), killed the entire fuse daemon, remounted fuse-rbd and resumed at the next block.

It is very disappointing not to have a standard method for data recovery. IMHO, with such a bug Ceph is not production ready.

On Tue, Jun 10, 2014 at 01:54:51PM +0400, Alexey Kurnosov wrote:
> On Tue, Jun 10, 2014 at 08:58:30AM +0800, Dong Yuan wrote:
> > Sorry, the last sentence should be "You can NOT remove PG, except you
> > remove the whole Pool."
> >
> > On 10 June 2014 08:57, Dong Yuan <yuandong1222@xxxxxxxxx> wrote:
> > > InComplete PG means this PG can't get enough metadata (PG logs) to
> > > enter Active state. This may be because of some unrepairable data damage.
>
> But at least there should be a method to put an incomplete PG in some more reasonable state
> for further forensics and for retrieving other data from good PGs, in which the rbd kernel driver
> could return EIO for example, not just hang on I/O wait. Is it possible?
>
> > >
> > > You can remove PG, except you remove the whole Pool.
> > >
> > > On 9 June 2014 21:10, Alexey Kurnosov <alexey@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >> Guys, what should I do to erase an incomplete PG?
> > >> Use special utils, hexedit, any methods?
> > >>
> > >>
> > >> On Fri, Jun 06, 2014 at 11:18:51PM +0400, Alexey Kurnosov wrote:
> > >>> On Fri, Jun 06, 2014 at 08:57:46AM -0700, Sage Weil wrote:
> > >>> > On Fri, 6 Jun 2014, Alexey Kurnosov wrote:
> > >>> > > Hi all.
> > >>> > >
> > >>> > > Sorry for a rude offtop, but looks like nobody can help me at ceph-users.
> > >>> > > Here is the link to my email:
> > >>> > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040383.html
> > >>> > > Here is some additional data:
> > >>> > > http://pastebin.com/Nc4y3S1U
> > >>> > >
> > >>> > > During read requests I can see in the logs:
> > >>> > > 2014-06-06 13:28:08.586262 7f335f29f700 10 osd.7 21942 dequeue_op 0x356cb40 prio 127 cost 0 latency 0.000352 osd_op(client.11324.1:436 rb.0.1465.2ae8944a.000000000bb1 [read 0~131072] 4.b940a077 e21942) v4 pg pg[4.77( empty local-les=0 n=0 ec=144 les/c 19162/16786 21941/21941/21941) [7,2] r=0 lpr=21941 pi=8764-21940/115 mlcod 0'0 incomplete]
> > >>> > >
> > >>> > >
> > >>> > > Any help would be appreciated.
> > >>> >
> > >>> > This looks like a hangup somewhere in the osd/osd communication that is
> > >>> > preventing the peering/probing from happening. Since you're running
> > >>> > emperor and we stopped testing and backporting fixes there a while back
> > >>> > I'm not sure offhand what bug fix is missing. My suggestion is to upgrade
> > >>> > to 0.80.1 firefly as a first step.
> > >>> Upgrade has been performed. I do not see any changes.
> > >>>
> > >>>
> > >>> >
> > >>> > FWIW simply restarting the OSDs involved in those PGs will probably also
> > >>> > get things rolling, but this bug will still be present.
> > >>> I restarted them many times. Looks like all copies of the PG are incomplete.
> > >>>
> > >>> >
> > >>> > sage
> > >>> >
> > >>> > >
> > >>> > > (Somebody hit similar issue here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/007948.html)
> > >>> > >
> > >>> > > --
> > >>> > > Alexey Kurnosov
> > >>> > >
> > >>>
> > >>> --
> > >>> Alexey Kurnosov
> > >>>
> > >>
> > >
> > > --
> > > Dong Yuan
> > > Email:yuandong1222@xxxxxxxxx
> >
> > --
> > Dong Yuan
> > Email:yuandong1222@xxxxxxxxx
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
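
P.S. For anyone who wants to try the same approach, the recovery loop described at the top of this message can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the original script (which is not included here). The names dd_read_block, recover and on_hang are hypothetical; on_hang is assumed to be a caller-supplied function that kills the fuse daemon and remounts fuse-rbd by whatever means fit the setup. The timeout is handled by the parent process rather than by a separate background alarm.

```python
import subprocess


def dd_read_block(src, index, block_size, timeout, on_hang):
    """Read block `index` of `src` with dd; return bytes, or None on timeout or dd error."""
    cmd = ["dd", f"if={src}", f"bs={block_size}", f"skip={index}", "count=1"]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        data, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption: on_hang() kills the fuse daemon and remounts the image,
        # which should make the stuck read fail so that dd can be reaped.
        on_hang()
        proc.kill()
        proc.communicate()
        return None
    return data if proc.returncode == 0 else None


def recover(read_block, total_blocks, block_size, out):
    """Copy every readable block into `out`; return the indexes of skipped blocks."""
    skipped = []
    for index in range(total_blocks):
        data = read_block(index)
        if data is None:
            skipped.append(index)  # leave a hole and resume at the next block
            continue
        out.seek(index * block_size)
        out.write(data)
    return skipped
```

In use, read_block would be something like `lambda i: dd_read_block(path, i, block_size, 30, remount)`, where path, block_size, the 30-second timeout and remount are placeholders to fill in. The returned list shows which blocks could not be read, so the holes in the output image are known afterwards.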