Re: Missing clones

Alright, good luck!
The results would be interesting. :-)


Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi Eugen,

yes, I also see the rbd_data numbers changing. That may be caused by me
deleting snapshots and trying to move VMs over to another pool which
is not affected.

Currently I'm trying to move the Finance VM, which is a very old VM
that was created as one of the first VMs and is still alive (the only
one of that age). Maybe it really is a problem of "old" VM formats,
as mentioned in the links somebody sent, where snapshots had wrong/old
bits that a newer Ceph could not understand anymore.

We'll see... the VM is large and currently copying... if the error gets
copied as well, the VM format/age is the cause. If not, ... hm...   :-D

Nevertheless thank you for your help!
Karsten




On 20.02.2018 15:47, Eugen Block wrote:
I'm not quite sure how to interpret this, but there are different
objects referenced. From the first log output you pasted:

2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head expected
clone 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:64e 1
missing

From the failed PG import the logs mention two different objects:

Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.0000000000001d82:head#
snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:18#

And your last log output has another two different objects:

Write #10:9df3943b:::rbd_data.e57feb238e1f29.000000000003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:19#


So in total we're seeing five different rbd_data objects here:

 - rbd_data.2313975238e1f29
 - rbd_data.f5b8603d1b58ba
 - rbd_data.966489238e1f29
 - rbd_data.e57feb238e1f29
 - rbd_data.4401c7238e1f29

This doesn't make too much sense to me yet. Which ones belong to
your corrupted VM? Do you have a backup of the VM in case the repair fails?
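
One way to map a prefix back to an image name is to loop over the pool
and compare it with the block_name_prefix that "rbd info" reports. This
is only a sketch, assuming the pool name cpVirtualMachines mentioned
elsewhere in this thread:

PREFIX=2313975238e1f29   # one of the five prefixes above
for IMG in $(rbd -p cpVirtualMachines list); do
    if rbd info cpVirtualMachines/$IMG | grep -q "block_name_prefix: rbd_data.$PREFIX"; then
        echo "$PREFIX belongs to image $IMG"
    fi
done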


Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Nope:

Write #10:9df3943b:::rbd_data.e57feb238e1f29.000000000003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:19#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:23#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:head#
snapset 612=[23,22,15]:{19=[15],23=[23,22]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&, const
std::set<snapid_t>&,
MapCacher::Transaction<std::__cxx11::basic_string<char>,
ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20
13:56:20.672430
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fd4478c68f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >,
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x5569304ca01b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x5569304caae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ObjectStore::Sequencer&)+0x1135)
[0x5569304d12f5]
 6: (main()+0x3909) [0x556930432349]
 7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 8: (_start()+0x2a) [0x5569304ba01a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7fd45147a400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (()+0x913f14) [0x556930ae1f14]
 2: (()+0x110c0) [0x7fd44619e0c0]
 3: (gsignal()+0xcf) [0x7fd444d37fcf]
 4: (abort()+0x16a) [0x7fd444d393fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7fd4478c6a7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >,
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x5569304ca01b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x5569304caae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ObjectStore::Sequencer&)+0x1135)
[0x5569304d12f5]
 10: (main()+0x3909) [0x556930432349]
 11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 12: (_start()+0x2a) [0x5569304ba01a]
Aborted



What I also do not understand: if I take your approach to find out
what is stored in the PG, I no longer get a match for my PG ID.

If I take the "rbd info" approach posted by Mykola Golub, I do get a
match - unfortunately it is the most important VM on our system, the
one which holds our finance software.

Best
Karsten









On 20.02.2018 09:16, Eugen Block wrote:
And does the re-import of the PG work? From the logs I assumed that the
snapshot(s) prevented a successful import, but now that they are deleted
it could work.


Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi Eugen,

hmmm, that should be:

rbd -p cpVirtualMachines list | while read LINE; do
    osdmaptool --test-map-object $LINE --pool 10 osdmap 2>&1
    rbd snap ls cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while read LINE2; do
        echo "$LINE"
        osdmaptool --test-map-object $LINE2 --pool 10 osdmap 2>&1
    done
done | less

It's a Proxmox system. There were only two snapshots on the PG, which I
have now deleted. Now nothing is displayed for the PG... is that
possible? A repair still fails, unfortunately...

Best & thank you for the hint!
Karsten



On 19.02.2018 22:42, Eugen Block wrote:
BTW - how can I find out which RBDs are affected by this problem?
Maybe a copy/remove of the affected RBDs could help? But how do I find
out which RBDs this PG belongs to?

Depending on how many PGs your cluster/pool has, you could dump your
osdmap and then run the osdmaptool [1] for every rbd object in your
pool
and grep for the affected PG. That would be quick for a few objects, I
guess:

ceph1:~ # ceph osd getmap -o /tmp/osdmap

ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image1' -> 5.2 -> [0]

ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image2' -> 5.f -> [0]


[1]
https://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object/
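
A looped variant that greps directly for the damaged PG could look
roughly like this (a sketch, using pool id 10 and the pool name from
this thread; it maps the image names as objects, the same way it is
done further up in the thread, so individual rbd_data objects of an
image may still land in other PGs):

ceph osd getmap -o /tmp/osdmap
for IMG in $(rbd -p cpVirtualMachines list); do
    osdmaptool --test-map-object $IMG --pool 10 /tmp/osdmap 2>&1
done | grep '10\.7b9'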




Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

BTW - how can I find out which RBDs are affected by this problem?
Maybe a copy/remove of the affected RBDs could help? But how do I find
out which RBDs this PG belongs to?

Best
Karsten

On 19.02.2018 19:26, Karsten Becker wrote:
Hi.

Thank you for the tip. I just tried... but unfortunately the import
aborts:

Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.0000000000001d82:head#
snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:18#
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:24#
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:head#
snapset 628=[24,21,17]:{18=[17],24=[24,21]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&, const
std::set<snapid_t>&,
MapCacher::Transaction<std::__cxx11::basic_string<char>,
ceph::buffer::list>*)' thread 7facba7de400 time 2018-02-19
19:24:18.917515
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7facb0c2a8f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >,
ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x55eef35f901b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*,
ObjectStore::Sequencer&)+0x738)
[0x55eef35f9ae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ObjectStore::Sequencer&)+0x1135)
[0x55eef36002f5]
 6: (main()+0x3909) [0x55eef3561349]
 7: (__libc_start_main()+0xf1) [0x7facae0892b1]
 8: (_start()+0x2a) [0x55eef35e901a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7facba7de400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (()+0x913f14) [0x55eef3c10f14]
 2: (()+0x110c0) [0x7facaf5020c0]
 3: (gsignal()+0xcf) [0x7facae09bfcf]
 4: (abort()+0x16a) [0x7facae09d3fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7facb0c2aa7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> >,
ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x55eef35f901b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*,
ObjectStore::Sequencer&)+0x738)
[0x55eef35f9ae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, ObjectStore::Sequencer&)+0x1135)
[0x55eef36002f5]
 10: (main()+0x3909) [0x55eef3561349]
 11: (__libc_start_main()+0xf1) [0x7facae0892b1]
 12: (_start()+0x2a) [0x55eef35e901a]
Aborted

Best
Karsten

On 19.02.2018 17:09, Eugen Block wrote:
Could [1] be of interest?
Exporting the intact PG and importing it back to the respective OSD
sounds promising.

[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
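
For reference, the export/import from [1] boils down to something like
this (only a sketch - OSD ids and paths are placeholders, "noout"
should be set and the OSDs stopped while ceph-objectstore-tool runs):

# on an OSD that still has an intact copy of the PG
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid 10.7b9 --op export --file /tmp/pg.10.7b9.export

# on the OSD with the broken copy: remove it, then re-import
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<other-id> \
    --pgid 10.7b9 --op remove --force   # --force may not be needed on every version
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<other-id> \
    --op import --file /tmp/pg.10.7b9.export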





Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi.

We have size=3 min_size=2.

But this "upgrade" has been done during the weekend. We had size=2
min_size=1 before.

Best
Karsten



On 19.02.2018 13:02, Eugen Block wrote:
Hi,

just to rule out the obvious, what size does the pool have? You aren't
running it with size = 2, are you?


Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi,

I have one damaged PG in my cluster. All OSDs are BlueStore. How
do I
fix this?

2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head
expected
clone
10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:64e 1
missing
2018-02-19 11:00:23.183707 osd.29 [INF] repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head 1
missing clone(s)
2018-02-19 11:01:18.074666 mon.0 [ERR] Health check update:
Possible
data damage: 1 pg inconsistent (PG_DAMAGED)
2018-02-19 11:01:11.856529 osd.29 [ERR] 10.7b9 repair 1
errors, 0
fixed
2018-02-19 11:01:24.333533 mon.0 [ERR] overall HEALTH_ERR 1
scrub
errors; Possible data damage: 1 pg inconsistent

"ceph pg repair 10.7b9" fails and is not able to fix ist. A
manually
started scrub "ceph pg scrub 10.7b9" also.
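
To see exactly which object/snapshot the scrub complains about, the
inconsistency report can be dumped (a sketch, assuming the rados
subcommand available since Jewel):

rados list-inconsistent-obj 10.7b9 --format=json-pretty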

Best from Berlin/Germany
Karsten





--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

        Vorsitzende des Aufsichtsrates: Angelika Mozdzen
          Sitz und Registergericht: Hamburg, HRB 90934
                  Vorstand: Jens-U. Mozdzen
                   USt-IdNr. DE 814 013 983

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



