[root@storage2 ~]# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-objectstore-tool...Reading symbols from /usr/lib/debug/usr/bin/ceph-objectstore-tool.debug...done.
done.
Starting program: /usr/bin/ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
open: No such file or directory
[Inferior 1 (process 23735) exited with code 01]
[root@storage2 ~]#

Just checked:

[root@storage2 lib64]# ls -l /lib64/libthread_db*
-rwxr-xr-x. 1 root root 38352 May 12  2016 /lib64/libthread_db-1.0.so
lrwxrwxrwx. 1 root root    19 Jun  7  2016 /lib64/libthread_db.so.1 -> libthread_db-1.0.so
[root@storage2 lib64]#
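So libthread_db itself loaded fine (gdb reports "Thread debugging using libthread_db enabled"); the "open: No such file or directory" presumably comes from the tool itself rather than from gdb, e.g. the export file or ceph.conf not being readable from the current directory. If strace is available, one way to see exactly which path the failing open() refers to would be something like the following (a sketch; the output file name is arbitrary):

    strace -f -e trace=open,openat -o /tmp/objtool.strace \
        ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
    grep ENOENT /tmp/objtool.strace | tail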
Kind regards,
Laszlo

On 16.03.2017 05:26, Brad Hubbard wrote:

Can you install the debuginfo for ceph (how this works depends on your distro) and run the following?

# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
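For reference, the debuginfo install itself usually looks something like the following (a sketch only; package names and repository setup vary by distro, and the -dbg package name in particular may differ between builds):

    # RHEL/CentOS: needs yum-utils and the debuginfo repositories enabled
    debuginfo-install ceph

    # Debian/Ubuntu builds typically ship separate -dbg packages instead, e.g.
    apt-get install ceph-dbg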
On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35 command crashes.

~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
*** Caught signal (Segmentation fault) **
 in thread 7f85b60e28c0
 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) [0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) [0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -14> 2017-03-15 14:57:05.557743 7f85b60e28c0  5 asok(0x5632000) register_command perfcounters_dump hook 0x55e6130
   -13> 2017-03-15 14:57:05.557807 7f85b60e28c0  5 asok(0x5632000) register_command 1 hook 0x55e6130
   -12> 2017-03-15 14:57:05.557818 7f85b60e28c0  5 asok(0x5632000) register_command perf dump hook 0x55e6130
   -11> 2017-03-15 14:57:05.557828 7f85b60e28c0  5 asok(0x5632000) register_command perfcounters_schema hook 0x55e6130
   -10> 2017-03-15 14:57:05.557836 7f85b60e28c0  5 asok(0x5632000) register_command 2 hook 0x55e6130
    -9> 2017-03-15 14:57:05.557841 7f85b60e28c0  5 asok(0x5632000) register_command perf schema hook 0x55e6130
    -8> 2017-03-15 14:57:05.557851 7f85b60e28c0  5 asok(0x5632000) register_command perf reset hook 0x55e6130
    -7> 2017-03-15 14:57:05.557855 7f85b60e28c0  5 asok(0x5632000) register_command config show hook 0x55e6130
    -6> 2017-03-15 14:57:05.557864 7f85b60e28c0  5 asok(0x5632000) register_command config set hook 0x55e6130
    -5> 2017-03-15 14:57:05.557868 7f85b60e28c0  5 asok(0x5632000) register_command config get hook 0x55e6130
    -4> 2017-03-15 14:57:05.557877 7f85b60e28c0  5 asok(0x5632000) register_command config diff hook 0x55e6130
    -3> 2017-03-15 14:57:05.557880 7f85b60e28c0  5 asok(0x5632000) register_command log flush hook 0x55e6130
    -2> 2017-03-15 14:57:05.557888 7f85b60e28c0  5 asok(0x5632000) register_command log dump hook 0x55e6130
    -1> 2017-03-15 14:57:05.557892 7f85b60e28c0  5 asok(0x5632000) register_command log reopen hook 0x55e6130
     0> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) [0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file
--- end dump of recent events ---
Segmentation fault (core dumped)
#

Any ideas what to try?

Thank you.
Laszlo

On 15.03.2017 04:27, Brad Hubbard wrote:

Decide which copy you want to keep and export that with ceph-objectstore-tool.
Delete all copies on all OSDs with ceph-objectstore-tool (not by deleting the directory on the disk).
Use force_create_pg to recreate the pg empty.
Use ceph-objectstore-tool to do a rados import on the exported pg copy.
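In practice, with the hammer tooling, that sequence might look roughly like the following (a sketch only; OSD ids, paths and file names are illustrative, and the export/remove steps are run with the OSD in question stopped):

    # 1. export the copy you decided to keep
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-35 \
        --journal-path /var/lib/ceph/osd/ceph-35/journal \
        --op export --pgid 3.367 --file /root/pg.3.367.export.OSD.35

    # 2. remove every remaining copy of the pg (repeat for each OSD holding it)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
        --op remove --pgid 3.367

    # 3. recreate the pg empty
    ceph pg force_create_pg 3.367

    # 4. once the empty pg is active, replay the saved objects through librados
    ceph-objectstore-tool import-rados volumes /root/pg.3.367.export.OSD.35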
On Wed, Mar 15, 2017 at 12:00 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

I have tried to recover the pg using the following steps:

Preparation:
1. set noout
2. stop osd.2
3. use ceph-objectstore-tool to export from osd.2
4. start osd.2
5. repeat steps 2-4 on osd.35, osd.28 and osd.63
(I've done these hoping to be able to use one of those exports to recover the PG)

First attempt:
1. stop osd.2
2. remove the 3.367_head directory
3. start osd.2

Here I was hoping that the cluster would recover the pg from the 2 other identical OSDs. It did NOT. So I have tried the following commands on the PG:

ceph pg repair
ceph pg scrub
ceph pg deep-scrub
ceph pg force_create_pg

Nothing changed. My PG was still incomplete. So I tried to remove all the OSDs that were referenced in the pg query:
1. stop osd.2
2. delete the 3.367_head directory
3. start osd.2
4. repeat steps 1-3 for all the OSDs that were listed in the pg query
5. did an import from one of the exports
-> I was able again to query the pg (that was impossible when all the 3.367_head dirs were deleted) and the stats were saying that the number of objects is 6 and the size is 21M (all correct values according to the files I was able to see before starting the procedure).

But the PG is still incomplete. What else can I try?

Thank you,
Laszlo

On 12.03.2017 13:06, Brad Hubbard wrote:

On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

I have already done the export with ceph-objectstore-tool. I just have to decide which OSDs to keep.

Can you tell me why the directory structure in the OSDs is different for the same PG when checking on different OSDs? For instance, in OSDs 2 and 63 there are NO subdirectories in 3.367_head, while OSDs 28 and 35 contain
./DIR_7/DIR_6/DIR_B/
./DIR_7/DIR_6/DIR_3/
When are these subdirectories created? The files are identical on all the OSDs, only the way these are stored is different.

It would be enough if you could point me to some documentation that explains these, I'll read it. So far, searching for the architecture of an OSD, I could not find the gory details about these directories.

https://github.com/ceph/ceph/blob/master/src/os/filestore/HashIndex.h

Kind regards,
Laszlo

On 12.03.2017 02:12, Brad Hubbard wrote:

On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

Thank you for your answer.

Indeed the min_size is 1:
# ceph osd pool get volumes size
size: 3
# ceph osd pool get volumes min_size
min_size: 1
#

I'm gonna try to find the mentioned discussions on the mailing lists, and read them. If you have a link at hand, that would be nice if you would send it to me.

This thread is one example, there are lots more.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html

In the attached file you can see the contents of the directory containing PG data on the different OSDs (all that have appeared in the pg query). According to the md5sums the files are identical. What bothers me is the directory structure (you can see the ls -R in each dir that contains files).

So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63 since 68 is listed as empty in the pg query.

Where can I read about how/why those DIR# subdirectories have appeared? Given that the files themselves are identical on the "current" OSDs belonging to the PG, and as osd.63 (currently not belonging to the PG) has the same files, is it safe to stop osd.2, remove the 3.367_head dir, and then restart the OSD? (all these with the noout flag set of course)

*You* need to decide which is the "good" copy and then follow the instructions in the links I provided to try and recover the pg.
Back those known copies on 2, 28, 35 and 63 up with ceph-objectstore-tool before proceeding. They may well be identical but the peering process still needs to "see" the relevant logs and currently something is stopping it doing so.
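One way to look at what each copy's log and info actually contain, as a sketch, with the OSD in question stopped and assuming the default filestore paths:

    # print the pg's info (last_update, last_epoch_started, ...) as stored on this OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 \
        --journal-path /var/lib/ceph/osd/ceph-28/journal \
        --op info --pgid 3.367

    # dump the pg log from the same copy
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 \
        --journal-path /var/lib/ceph/osd/ceph-28/journal \
        --op log --pgid 3.367

Comparing that output across the copies on 2, 28, 35 and 63 should show whether they really carry the same log, independently of how the files happen to be hashed into DIR_* subdirectories on disk.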
Kind regards,
Laszlo

On 11.03.2017 00:32, Brad Hubbard wrote:

So this is why it happened I guess.

pool 3 'volumes' replicated size 3 min_size 1

min_size = 1 is a recipe for disasters like this and there are plenty of ML threads about not setting it below 2. The past intervals in the pg query show several intervals where a single OSD may have gone rw.

How important is this data?

I would suggest checking which of these OSDs actually have the data for this pg. From the pg query it looks like 2, 35 and 68 and possibly 28 since it's the primary. Check all OSDs in the pg query output.

I would then back up all copies and work out which copy, if any, you want to keep and then attempt something like the following.

https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html

If you want to abandon the pg see
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
for a possible solution.

http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.

On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

The OSDs are all there.

$ sudo ceph osd stat
     osdmap e60609: 72 osds: 72 up, 72 in

and I have attached the result of the ceph osd tree and ceph osd dump commands.

I got some extra info about the network problem. A faulty network device has flooded the network, eating up all the bandwidth, so the OSDs were not able to properly communicate with each other. This has lasted for almost 1 day.

Thank you,
Laszlo

On 10.03.2017 12:19, Brad Hubbard wrote:

To me it looks like someone may have done an "rm" on these OSDs but not removed them from the crushmap. This does not happen automatically.

Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so, paste the output.

Without knowing what exactly happened here it may be difficult to work out how to proceed. In order to go clean the primary needs to communicate with multiple OSDs, some of which are marked DNE and seem to be uncontactable. This seems to be more than a network issue (unless the outage is still happening).

http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
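For reference, checking whether the OSDs marked DNE in the pg query still exist anywhere, and deregistering one that really is gone for good, usually looks something like this (a sketch only; run the cleanup only after being certain the OSD holds no needed data):

    # do the OSDs marked DNE still show up anywhere?
    ceph osd tree | egrep 'osd\.(17|51|58|64|70)'
    ceph osd dump | egrep '^osd\.(17|51|58|64|70) '

    # usual cleanup for an OSD that was wiped but never removed from the cluster
    ceph osd crush remove osd.17
    ceph auth del osd.17
    ceph osd rm 17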
On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

I was informed that due to a networking issue the ceph cluster network was affected. There was a huge packet loss, and network interfaces were flapping. That's all I got. This outage lasted a longer period of time. So I assume that some OSDs may have been considered dead and the data from them has been moved away to other OSDs (this is what ceph is supposed to do if I'm correct). Probably that was the point when the listed OSDs have appeared in the picture.

From the query we can see this for one of those OSDs:

    { "peer": "14",
      "pgid": "3.367",
      "last_update": "0'0",
      "last_complete": "0'0",
      "log_tail": "0'0",
      "last_user_version": 0,
      "last_backfill": "MAX",
      "purged_snaps": "[]",
      "history": { "epoch_created": 4,
          "last_epoch_started": 54899,
          "last_epoch_clean": 55143,
          "last_epoch_split": 0,
          "same_up_since": 60603,
          "same_interval_since": 60603,
          "same_primary_since": 60593,
          "last_scrub": "2852'33528",
          "last_scrub_stamp": "2017-02-26 02:36:55.210150",
          "last_deep_scrub": "2852'16480",
          "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
          "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"},
      "stats": { "version": "0'0",
          "reported_seq": "14",
          "reported_epoch": "59779",
          "state": "down+peering",
          "last_fresh": "2017-02-27 16:30:16.230519",
          "last_change": "2017-02-27 16:30:15.267995",
          "last_active": "0.000000",
          "last_peered": "0.000000",
          "last_clean": "0.000000",
          "last_became_active": "0.000000",
          "last_became_peered": "0.000000",
          "last_unstale": "2017-02-27 16:30:16.230519",
          "last_undegraded": "2017-02-27 16:30:16.230519",
          "last_fullsized": "2017-02-27 16:30:16.230519",
          "mapping_epoch": 60601,
          "log_start": "0'0",
          "ondisk_log_start": "0'0",
          "created": 4,
          "last_epoch_clean": 55143,
          "parent": "0.0",
          "parent_split_bits": 0,
          "last_scrub": "2852'33528",
          "last_scrub_stamp": "2017-02-26 02:36:55.210150",
          "last_deep_scrub": "2852'16480",
          "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
          "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
          "log_size": 0,
          "ondisk_log_size": 0,
          "stats_invalid": "0",
          "stat_sum": { "num_bytes": 0,
              "num_objects": 0,
              "num_object_clones": 0,
              "num_object_copies": 0,
              "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_misplaced": 0,
              "num_objects_unfound": 0,
              "num_objects_dirty": 0,
              "num_whiteouts": 0,
              "num_read": 0,
              "num_read_kb": 0,
              "num_write": 0,
              "num_write_kb": 0,
              "num_scrub_errors": 0,
              "num_shallow_scrub_errors": 0,
              "num_deep_scrub_errors": 0,
              "num_objects_recovered": 0,
              "num_bytes_recovered": 0,
              "num_keys_recovered": 0,
              "num_objects_omap": 0,
              "num_objects_hit_set_archive": 0,
              "num_bytes_hit_set_archive": 0},
          "up": [ 28, 35, 2],
          "acting": [ 28, 35, 2],
          "blocked_by": [],
          "up_primary": 28,
          "acting_primary": 28},
      "empty": 1,
      "dne": 0,
      "incomplete": 0,
      "last_epoch_started": 0,
      "hit_set_history": { "current_last_update": "0'0",
          "current_last_stamp": "0.000000",
          "current_info": { "begin": "0.000000",
              "end": "0.000000",
              "version": "0'0",
              "using_gmt": "1"},
          "history": []}},

Where can I read more about the meaning of each parameter? Some of them have quite self-explanatory names, but not all (or probably we need a deeper knowledge to understand them). Isn't there any parameter that would say when that OSD was assigned to the given PG? Also the stat_sum shows 0 for all its parameters. Why is it blocking then? Is there a way to tell the PG to forget about that OSD?

Thank you,
Laszlo

On 10.03.2017 03:05, Brad Hubbard wrote:

Can you explain more about what happened?

The query shows progress is blocked by the following OSDs.

    "blocked_by": [
        14,
        17,
        51,
        58,
        63,
        64,
        68,
        70],

Some of these OSDs are marked as "dne" (Does Not Exist).

    "peer": "17",
    "dne": 1,
    "peer": "51",
    "dne": 1,
    "peer": "58",
    "dne": 1,
    "peer": "64",
    "dne": 1,
    "peer": "70",
    "dne": 1,

Can we get a complete background here please?
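On the "forget about that OSD" question: for an OSD that is permanently gone, the usual mechanism is to mark it lost, which lets peering stop waiting for it at the cost of losing any updates only that OSD ever saw. A sketch, to be used only once it is certain the OSD will never come back:

    ceph osd lost 17 --yes-i-really-mean-it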
On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:

Hello,

After a major network outage our ceph cluster ended up with an inactive PG:

# ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
pg 3.367 is stuck inactive for 912263.766607, current state incomplete, last acting [28,35,2]
pg 3.367 is stuck unclean for 912263.766688, current state incomplete, last acting [28,35,2]
pg 3.367 is incomplete, acting [28,35,2]
1 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.28
1 osds have slow requests

# ceph -s
    cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
     health HEALTH_WARN
            1 pgs incomplete
            1 pgs stuck inactive
            1 pgs stuck unclean
            1 requests are blocked > 32 sec
     monmap e3: 3 mons at {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
            election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
     osdmap e60609: 72 osds: 72 up, 72 in
      pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
            490 GB used, 130 TB / 130 TB avail
                4863 active+clean
                   1 incomplete
  client io 0 B/s rd, 38465 B/s wr, 2 op/s

ceph pg repair doesn't change anything. What should I try to recover it?

Attached is the result of ceph pg query on the problem PG.

Thank you,
Laszlo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com