Re: Problem with query and any operation on PGs

On Wed, 24 May 2017, Łukasz Chrustek wrote:
> Hello,
> 
> >>
> >> > This
> >> 
> >> osd 6 - isn't startable
> 
> > Disk completely 100% dead, or just broken enough that ceph-osd won't 
> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> > from this osd to recover any important writes on that osd.
> 
> >> osd 10, 37, 72 are startable
> 
> > With those started, I'd repeat the original sequence and get a fresh pg
> > query to confirm that it still wants just osd.6.
> 
> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
> > random osd (not one of these ones), import the pg into that osd, and start
> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
> > point.  repeat with the same basic process with the other pg.
> 
> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> 
> https://pastebin.com/7XGAHdKH

Hmm, btrfs:

2017-05-24 23:28:58.547456 7f500948e940 -1 
filestore(/var/lib/ceph/osd/ceph-84) ERROR: 
/var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid 
losing new data

You could try setting --osd-use-stale-snap as suggested.
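
A rough sketch of how that could look, combined with the export/import 
procedure described above (the paths, pgid, and target osd are 
illustrative, and this assumes the tool picks up --osd-use-stale-snap from 
the command line; otherwise set "osd use stale snap = true" in ceph.conf 
for that run):

  # re-run against the osd that hit the nosnap error, with that osd stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-84 \
      --journal-path /var/lib/ceph/osd/ceph-84/journal \
      --osd-use-stale-snap \
      --pgid 1.165 --op export --file /tmp/1.165.export

  # stop some other healthy osd (not one of the ones involved), import
  # the pg there, then start it again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
      --journal-path /var/lib/ceph/osd/ceph-NN/journal \
      --pgid 1.165 --op import --file /tmp/1.165.export

  # once that osd is back up and the pg has peered as far as it can
  ceph osd lost 6 --yes-i-really-mean-it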

Is it the same error with the other one?


Looking at the log you sent earlier for 1.165 on osd.67, the primary 
reports:

2017-05-24 21:37:11.505256 7efdbc1e5700  5 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] enter Started/Primary/Peering/GetLog
2017-05-24 21:37:11.505291 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.37 1.165( v 112598'67281552 (112574'67278547,112598'67281552] lb 1/56500165/rbd_data.674a3ed7dffd473.0000000000000b38/head (NIBBLEWISE) local-les=112584 n=1 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505299 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.38 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505306 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.48 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505313 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.67 1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505319 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] calc_acting osd.88 1.165( empty local-les=0 n=0 ec=253 les/c/f 112600/112582/70621 115601/115601/115601)
2017-05-24 21:37:11.505326 7efdbc1e5700 10 osd.67 pg_epoch: 115601 pg[1.165( v 112959'67282586 (112574'67278552,112959'67282586] lb 1/db616165/rbd_data.ed9979641a9d82.000000000001dcee/head (NIBBLEWISE) local-les=112600 n=354 ec=253 les/c/f 112600/112582/70621 115601/115601/115601) [67,88,48] r=0 lpr=115601 pi=112581-115600/111 crt=112959'67282586 lcod 0'0 mlcod 0'0 peering NIBBLEWISE] choose_acting failed

in particular, osds 37, 38, 48, and 67 all have incomplete copies of the PG 
(they are mid-backfill) and 88 has nothing.  Some data is lost unless you 
can recover another OSD with that PG.

The OSDs that might have data are: 6, 10, 33, 72, 84
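
It may be worth checking which of those actually still hold the pg before 
going further; a quick (illustrative) check, with the candidate osd 
stopped:

  # list the pgs present on a candidate osd and look for 1.165
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
      --journal-path /var/lib/ceph/osd/ceph-10/journal \
      --op list-pgs | grep '^1\.165'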

If that bears no fruit, then you can force last_backfill to report 
complete on one of those OSDs and it'll think it has all the data even 
though some of it is likely gone.  (We can pick one that is farther 
along... 38, 48, and 67 all seem to match.)
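
If it comes to that, and assuming your ceph-objectstore-tool build has the 
mark-complete op, the rough shape of it would be (osd number and paths are 
illustrative; take the osd down first and keep an export as a backup):

  # export a backup copy first, then mark the pg complete on e.g. osd.67
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
      --journal-path /var/lib/ceph/osd/ceph-67/journal \
      --pgid 1.165 --op export --file /tmp/1.165.backup
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
      --journal-path /var/lib/ceph/osd/ceph-67/journal \
      --pgid 1.165 --op mark-complete
  systemctl start ceph-osd@67   # or however osds are started on that host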

sage
