2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head expected clone 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:64e 1 missing
From the failed PG import, the logs mention two different objects:

Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.0000000000001d82:head#
snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:18#

And your last log output has another two different objects:

Write #10:9df3943b:::rbd_data.e57feb238e1f29.000000000003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:19#

So in total we're seeing five different rbd_data objects here:

- rbd_data.2313975238e1f29
- rbd_data.f5b8603d1b58ba
- rbd_data.966489238e1f29
- rbd_data.e57feb238e1f29
- rbd_data.4401c7238e1f29

This doesn't make much sense to me yet. Which ones belong to your corrupted VM? Do you have a backup of the VM in case the repair fails?
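To answer the "which image is it" question, one option might be to pull the rbd_data prefixes out of the saved output and compare them with the block_name_prefix that "rbd info" prints for each image. A minimal sketch, assuming the import output was saved to a file; the function names and "import.log" are placeholders of mine, not part of any Ceph tool:

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are made up for illustration.

# Print every distinct rbd_data.<id> prefix found on stdin, one per line.
extract_rbd_prefixes() {
    grep -oE 'rbd_data\.[0-9a-f]+' | LC_ALL=C sort -u
}

# Print the block_name_prefix value from `rbd info` output read on stdin.
prefix_from_info() {
    awk '$1 == "block_name_prefix:" { print $2 }'
}

# Possible use on a cluster node (not executed here; "import.log" is a
# placeholder, the pool name is the one mentioned in this thread):
#   extract_rbd_prefixes < import.log
#   rbd -p cpVirtualMachines ls | while read -r img; do
#       echo "$img $(rbd info cpVirtualMachines/"$img" | prefix_from_info)"
#   done
```

Any image whose prefix shows up in both lists would be a candidate for the affected VM. Keeping the parsing in small functions means it can be tried on saved text before touching the cluster.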
Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:
Nope:

Write #10:9df3943b:::rbd_data.e57feb238e1f29.000000000003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:19#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:23#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.000000000000050d:head#
snapset 612=[23,22,15]:{19=[15],23=[23,22]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, MapCacher::Transaction<std::__cxx11::basic_string<char>, ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20 13:56:20.672430
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fd4478c68f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x5569304ca01b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
 6: (main()+0x3909) [0x556930432349]
 7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 8: (_start()+0x2a) [0x5569304ba01a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7fd45147a400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (()+0x913f14) [0x556930ae1f14]
 2: (()+0x110c0) [0x7fd44619e0c0]
 3: (gsignal()+0xcf) [0x7fd444d37fcf]
 4: (abort()+0x16a) [0x7fd444d393fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x7fd4478c6a7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x5569304ca01b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
 10: (main()+0x3909) [0x556930432349]
 11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 12: (_start()+0x2a) [0x5569304ba01a]
Aborted

What I also do not understand: If I take your approach of finding out what is stored in the PG, I get no match with my PG ID anymore. If I take the "rbd info" approach that was posted by Mykola Golub, I get a match. Unfortunately it is the most important VM on our system, which holds our finance software.

Best
Karsten

On 20.02.2018 09:16, Eugen Block wrote:

And does the re-import of the PG work? From the logs I assumed that the snapshot(s) prevented a successful import, but now that they are deleted it could work.
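A note on the osdmaptool approach discussed in this thread: as far as I can tell, "osdmaptool --test-map-object" just hashes whatever name it is given, so feeding it image or snapshot names maps objects that need not exist in the pool. The data of an image lives in objects named rbd_data.<prefix>.<number>, so those are the names to test. A small sketch of filtering the osdmaptool output for one PG, assuming the output format shown in this thread ( object 'NAME' -> PGID -> [osds]); objects_in_pg is a made-up helper name:

```shell
#!/usr/bin/env bash
# Sketch only: objects_in_pg is a hypothetical helper, not a Ceph command.

# Read `osdmaptool --test-map-object` output on stdin and print the names of
# the objects that map to the PG given as $1.
# Assumes result lines look like:  object 'NAME' -> PGID -> [osds]
objects_in_pg() {
    awk -v pg="$1" -v q="'" '
        $1 == "object" && $3 == "->" && $4 == pg {
            gsub(q, "", $2)   # strip the single quotes around the name
            print $2
        }'
}

# Possible use on a cluster node (not executed here; pool name, pool id and
# PG id are the ones from this thread, /tmp/osdmap comes from
# `ceph osd getmap -o /tmp/osdmap`):
#   rados -p cpVirtualMachines ls | grep '^rbd_data\.' | while read -r obj; do
#       osdmaptool --test-map-object "$obj" --pool 10 /tmp/osdmap 2>&1
#   done | objects_in_pg 10.7b9
```

Listing every object in a large pool can take a while, so this is probably only practical as a one-off check. The prefixes of the objects it prints can then be matched against "rbd info" to find the images.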
Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi Eugen,

hmmm, that should be:

rbd -p cpVirtualMachines list | while read LINE; do
    osdmaptool --test-map-object $LINE --pool 10 osdmap 2>&1
    rbd snap ls cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while read LINE2; do
        echo "$LINE"
        osdmaptool --test-map-object $LINE2 --pool 10 osdmap 2>&1
    done
done | less

It's a Proxmox system. There were only two snapshots on the PG, which I have now deleted. Now nothing gets displayed for the PG... is that possible? A repair still fails, unfortunately...

Best & thank you for the hint!
Karsten

On 19.02.2018 22:42, Eugen Block wrote:

> BTW - how can I find out which RBDs are affected by this problem? Maybe a copy/remove of the affected RBDs could help? But how do I find out which RBDs this PG belongs to?

Depending on how many PGs your cluster/pool has, you could dump your osdmap and then run the osdmaptool [1] for every rbd object in your pool and grep for the affected PG. That would be quick for a few objects, I guess:

ceph1:~ # ceph osd getmap -o /tmp/osdmap
ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image1' -> 5.2 -> [0]
ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image2' -> 5.f -> [0]

[1] https://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object/

Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

BTW - how can I find out which RBDs are affected by this problem? Maybe a copy/remove of the affected RBDs could help? But how do I find out which RBDs this PG belongs to?

Best
Karsten

On 19.02.2018 19:26, Karsten Becker wrote:

Hi.

Thank you for the tip. I just tried... but unfortunately the import aborts:

Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.0000000000001d82:head#
snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:18#
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:24#
Write #10:9de973fe:::rbd_data.966489238e1f29.000000000000250b:head#
snapset 628=[24,21,17]:{18=[17],24=[24,21]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, MapCacher::Transaction<std::__cxx11::basic_string<char>, ceph::buffer::list>*)' thread 7facba7de400 time 2018-02-19 19:24:18.917515
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7facb0c2a8f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x55eef35f901b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x55eef35f9ae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x55eef36002f5]
 6: (main()+0x3909) [0x55eef3561349]
 7: (__libc_start_main()+0xf1) [0x7facae0892b1]
 8: (_start()+0x2a) [0x55eef35e901a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7facba7de400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (()+0x913f14) [0x55eef3c10f14]
 2: (()+0x110c0) [0x7facaf5020c0]
 3: (gsignal()+0xcf) [0x7facae09bfcf]
 4: (abort()+0x16a) [0x7facae09d3fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x7facb0c2aa7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x55eef35f901b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x55eef35f9ae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ObjectStore::Sequencer&)+0x1135) [0x55eef36002f5]
 10: (main()+0x3909) [0x55eef3561349]
 11: (__libc_start_main()+0xf1) [0x7facae0892b1]
 12: (_start()+0x2a) [0x55eef35e901a]
Aborted

Best
Karsten

On 19.02.2018 17:09, Eugen Block wrote:

Could [1] be of interest? Exporting the intact PG and importing it back to the respective OSD sounds promising.

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html

Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi.

We have size=3 min_size=2.

But this "upgrade" was done during the weekend. We had size=2 min_size=1 before.

Best
Karsten

On 19.02.2018 13:02, Eugen Block wrote:

Hi,

just to rule out the obvious, which size does the pool have? You aren't running it with size = 2, are you?

Quoting Karsten Becker <karsten.becker@xxxxxxxxxxx>:

Hi,

I have one damaged PG in my cluster. All OSDs are BlueStore. How do I fix this?

2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head expected clone 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:64e 1 missing
2018-02-19 11:00:23.183707 osd.29 [INF] repair 10.7b9 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head 1 missing clone(s)
2018-02-19 11:01:18.074666 mon.0 [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-02-19 11:01:11.856529 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
2018-02-19 11:01:24.333533 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

"ceph pg repair 10.7b9" fails and is not able to fix it. A manually started scrub ("ceph pg scrub 10.7b9") fails as well.

Best from Berlin/Germany
Karsten

Ecologic Institut gemeinnuetzige GmbH
Pfalzburger Str. 43/44, D-10717 Berlin
Geschaeftsfuehrerin / Director: Dr. Camilla Bausch
Sitz der Gesellschaft / Registered Office: Berlin (Germany)
Registergericht / Court of Registration: Amtsgericht Berlin (Charlottenburg), HRB 57947
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983