On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I have already done the export with ceph_objectstore_tool. I just have to
> decide which OSDs to keep.
> Can you tell me why the directory structure in the OSDs is different for
> the same PG when checking on different OSDs?
> For instance, in OSD 2 and 63 there are NO subdirectories in
> 3.367__head, while OSD 28 and 35 contain
> ./DIR_7/DIR_6/DIR_B/
> ./DIR_7/DIR_6/DIR_3/
>
> When are these subdirectories created?
>
> The files are identical on all the OSDs; only the way they are stored is
> different. It would be enough if you could point me to some documentation
> that explains this, and I'll read it. So far, searching for the
> architecture of an OSD, I could not find the gory details about these
> directories.

https://github.com/ceph/ceph/blob/master/src/os/filestore/HashIndex.h

>
> Kind regards,
> Laszlo
>
>
> On 12.03.2017 02:12, Brad Hubbard wrote:
>>
>> On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hello,
>>>
>>> Thank you for your answer.
>>>
>>> Indeed, the min_size is 1:
>>>
>>> # ceph osd pool get volumes size
>>> size: 3
>>> # ceph osd pool get volumes min_size
>>> min_size: 1
>>> #
>>> I'm going to try to find the mentioned discussions on the mailing lists
>>> and read them. If you have a link at hand, it would be nice if you could
>>> send it to me.
>>
>>
>> This thread is one example, there are lots more.
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
>>
>>>
>>> In the attached file you can see the contents of the directory containing
>>> PG data on the different OSDs (all that have appeared in the pg query).
>>> According to the md5sums the files are identical. What bothers me is the
>>> directory structure (you can see the ls -R in each dir that contains
>>> files).
>>
>>
>> So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63
>> since 68 is listed as empty in the pg query.
>>
>>>
>>> Where can I read about how/why those DIR# subdirectories have appeared?
>>>
>>> Given that the files themselves are identical on the "current" OSDs
>>> belonging to the PG, and as osd.63 (currently not belonging to the PG)
>>> has the same files, is it safe to stop osd.2, remove the 3.367_head dir,
>>> and then restart the OSD? (All this with the noout flag set, of course.)
>>
>>
>> *You* need to decide which is the "good" copy and then follow the
>> instructions in the links I provided to try and recover the pg. Back
>> those known copies on 2, 28, 35 and 63 up with the
>> ceph_objectstore_tool before proceeding. They may well be identical,
>> but the peering process still needs to "see" the relevant logs and
>> currently something is stopping it from doing so.
>>
>>>
>>> Kind regards,
>>> Laszlo
>>>
>>>
>>> On 11.03.2017 00:32, Brad Hubbard wrote:
>>>>
>>>>
>>>> So this is why it happened, I guess.
>>>>
>>>> pool 3 'volumes' replicated size 3 min_size 1
>>>>
>>>> min_size = 1 is a recipe for disasters like this and there are plenty
>>>> of ML threads about not setting it below 2.
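
For reference, the change back to the safer setting once the pg has been
recovered would be something like the following (the pool name comes from
the output earlier in the thread; the "after recovery" timing is general
advice, not something specific to this cluster):

# ceph osd pool set volumes min_size 2

With min_size 2 a PG stops accepting writes once it is down to a single
replica, which avoids the situation where one OSD accepts writes on its own
during an outage.
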
>>>>
>>>> The past intervals in the pg query show several intervals where a
>>>> single OSD may have gone rw.
>>>>
>>>> How important is this data?
>>>>
>>>> I would suggest checking which of these OSDs actually have the data
>>>> for this pg. From the pg query it looks like 2, 35 and 68, and possibly
>>>> 28 since it's the primary. Check all OSDs in the pg query output. I
>>>> would then back up all copies and work out which copy, if any, you
>>>> want to keep and then attempt something like the following.
>>>>
>>>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html
>>>>
>>>> If you want to abandon the pg see
>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
>>>> for a possible solution.
>>>>
>>>> http://ceph.com/community/incomplete-pgs-oh-my/ may also give some
>>>> ideas.
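
To make the backup step concrete: exporting this pg from one of the OSDs
with ceph_objectstore_tool would look roughly like the sketch below. The
OSD has to be stopped while the tool runs, the data/journal paths and the
output file name are assumed defaults rather than values from this cluster,
and on newer releases the binary is called ceph-objectstore-tool.

# ceph osd set noout
# stop ceph-osd id=2            (on systemd hosts: systemctl stop ceph-osd@2)
# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-2 \
      --journal-path /var/lib/ceph/osd/ceph-2/journal \
      --pgid 3.367 --op export --file /root/pg3.367.osd2.export
# start ceph-osd id=2           (or: systemctl start ceph-osd@2)
# ceph osd unset noout

Repeating this for each OSD that still holds a copy (2, 28, 35 and 63 here)
gives a fallback that can be loaded back later with --op import if a
recovery attempt goes wrong.
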
>>>>
>>>>
>>>> On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>> The OSDs are all there.
>>>>>
>>>>> $ sudo ceph osd stat
>>>>> osdmap e60609: 72 osds: 72 up, 72 in
>>>>>
>>>>> and I have attached the result of the ceph osd tree and ceph osd dump
>>>>> commands.
>>>>> I got some extra info about the network problem. A faulty network device
>>>>> has flooded the network, eating up all the bandwidth, so the OSDs were
>>>>> not able to properly communicate with each other. This lasted for almost
>>>>> 1 day.
>>>>>
>>>>> Thank you,
>>>>> Laszlo
>>>>>
>>>>>
>>>>>
>>>>> On 10.03.2017 12:19, Brad Hubbard wrote:
>>>>>>
>>>>>>
>>>>>> To me it looks like someone may have done an "rm" on these OSDs but
>>>>>> not removed them from the crushmap. This does not happen
>>>>>> automatically.
>>>>>>
>>>>>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so,
>>>>>> paste the output.
>>>>>>
>>>>>> Without knowing what exactly happened here it may be difficult to work
>>>>>> out how to proceed.
>>>>>>
>>>>>> In order to go clean, the primary needs to communicate with multiple
>>>>>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>>>>>
>>>>>> This seems to be more than a network issue (unless the outage is still
>>>>>> happening).
>>>>>>
>>>>>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I was informed that due to a networking issue the ceph cluster network
>>>>>>> was affected. There was huge packet loss, and network interfaces were
>>>>>>> flipping. That's all I got.
>>>>>>> This outage lasted a longer period of time, so I assume that some OSDs
>>>>>>> may have been considered dead and the data from them moved away to
>>>>>>> other PGs (this is what ceph is supposed to do, if I'm correct).
>>>>>>> Probably that was the point when the listed PGs appeared in the
>>>>>>> picture.
>>>>>>> From the query we can see this for one of those OSDs:
>>>>>>> {
>>>>>>>     "peer": "14",
>>>>>>>     "pgid": "3.367",
>>>>>>>     "last_update": "0'0",
>>>>>>>     "last_complete": "0'0",
>>>>>>>     "log_tail": "0'0",
>>>>>>>     "last_user_version": 0,
>>>>>>>     "last_backfill": "MAX",
>>>>>>>     "purged_snaps": "[]",
>>>>>>>     "history": {
>>>>>>>         "epoch_created": 4,
>>>>>>>         "last_epoch_started": 54899,
>>>>>>>         "last_epoch_clean": 55143,
>>>>>>>         "last_epoch_split": 0,
>>>>>>>         "same_up_since": 60603,
>>>>>>>         "same_interval_since": 60603,
>>>>>>>         "same_primary_since": 60593,
>>>>>>>         "last_scrub": "2852'33528",
>>>>>>>         "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>         "last_deep_scrub": "2852'16480",
>>>>>>>         "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>>>         "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"
>>>>>>>     },
>>>>>>>     "stats": {
>>>>>>>         "version": "0'0",
>>>>>>>         "reported_seq": "14",
>>>>>>>         "reported_epoch": "59779",
>>>>>>>         "state": "down+peering",
>>>>>>>         "last_fresh": "2017-02-27 16:30:16.230519",
>>>>>>>         "last_change": "2017-02-27 16:30:15.267995",
>>>>>>>         "last_active": "0.000000",
>>>>>>>         "last_peered": "0.000000",
>>>>>>>         "last_clean": "0.000000",
>>>>>>>         "last_became_active": "0.000000",
>>>>>>>         "last_became_peered": "0.000000",
>>>>>>>         "last_unstale": "2017-02-27 16:30:16.230519",
>>>>>>>         "last_undegraded": "2017-02-27 16:30:16.230519",
>>>>>>>         "last_fullsized": "2017-02-27 16:30:16.230519",
>>>>>>>         "mapping_epoch": 60601,
>>>>>>>         "log_start": "0'0",
>>>>>>>         "ondisk_log_start": "0'0",
>>>>>>>         "created": 4,
>>>>>>>         "last_epoch_clean": 55143,
>>>>>>>         "parent": "0.0",
>>>>>>>         "parent_split_bits": 0,
>>>>>>>         "last_scrub": "2852'33528",
>>>>>>>         "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>         "last_deep_scrub": "2852'16480",
>>>>>>>         "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>>>         "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>>>         "log_size": 0,
>>>>>>>         "ondisk_log_size": 0,
>>>>>>>         "stats_invalid": "0",
>>>>>>>         "stat_sum": {
>>>>>>>             "num_bytes": 0,
>>>>>>>             "num_objects": 0,
>>>>>>>             "num_object_clones": 0,
>>>>>>>             "num_object_copies": 0,
>>>>>>>             "num_objects_missing_on_primary": 0,
>>>>>>>             "num_objects_degraded": 0,
>>>>>>>             "num_objects_misplaced": 0,
>>>>>>>             "num_objects_unfound": 0,
>>>>>>>             "num_objects_dirty": 0,
>>>>>>>             "num_whiteouts": 0,
>>>>>>>             "num_read": 0,
>>>>>>>             "num_read_kb": 0,
>>>>>>>             "num_write": 0,
>>>>>>>             "num_write_kb": 0,
>>>>>>>             "num_scrub_errors": 0,
>>>>>>>             "num_shallow_scrub_errors": 0,
>>>>>>>             "num_deep_scrub_errors": 0,
>>>>>>>             "num_objects_recovered": 0,
>>>>>>>             "num_bytes_recovered": 0,
>>>>>>>             "num_keys_recovered": 0,
>>>>>>>             "num_objects_omap": 0,
>>>>>>>             "num_objects_hit_set_archive": 0,
>>>>>>>             "num_bytes_hit_set_archive": 0
>>>>>>>         },
>>>>>>>         "up": [
>>>>>>>             28,
>>>>>>>             35,
>>>>>>>             2
>>>>>>>         ],
>>>>>>>         "acting": [
>>>>>>>             28,
>>>>>>>             35,
>>>>>>>             2
>>>>>>>         ],
>>>>>>>         "blocked_by": [],
>>>>>>>         "up_primary": 28,
>>>>>>>         "acting_primary": 28
>>>>>>>     },
>>>>>>>     "empty": 1,
>>>>>>>     "dne": 0,
>>>>>>>     "incomplete": 0,
>>>>>>>     "last_epoch_started": 0,
>>>>>>>     "hit_set_history": {
>>>>>>>         "current_last_update": "0'0",
>>>>>>>         "current_last_stamp": "0.000000",
>>>>>>>         "current_info": {
>>>>>>>             "begin": "0.000000",
>>>>>>>             "end": "0.000000",
>>>>>>>             "version": "0'0",
>>>>>>>             "using_gmt": "1"
>>>>>>>         },
>>>>>>>         "history": []
>>>>>>>     }
>>>>>>> },
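
As a side note, "empty": 1 and "last_update": "0'0" in the entry above
suggest that peer 14 holds no data for this pg. Assuming the default
filestore paths (the /var/lib/ceph/osd/ceph-14 location below is an
assumption, not taken from this cluster), a quick on-disk cross-check is to
look at the pg's _head directory:

# ls /var/lib/ceph/osd/ceph-14/current/3.367_head/

With the OSD stopped, ceph_objectstore_tool's list-pgs op (--op list-pgs)
gives the same information more reliably.
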
>>>>>>>
>>>>>>> Where can I read more about the meaning of each parameter? Some of
>>>>>>> them have quite self-explanatory names, but not all (or probably we
>>>>>>> need a deeper knowledge to understand them).
>>>>>>> Isn't there any parameter that would say when that OSD was assigned to
>>>>>>> the given PG? Also, the stat_sum shows 0 for all its parameters. Why is
>>>>>>> it blocking then?
>>>>>>>
>>>>>>> Is there a way to tell the PG to forget about that OSD?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Laszlo
>>>>>>>
>>>>>>>
>>>>>>> On 10.03.2017 03:05, Brad Hubbard wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you explain more about what happened?
>>>>>>>>
>>>>>>>> The query shows progress is blocked by the following OSDs.
>>>>>>>>
>>>>>>>>     "blocked_by": [
>>>>>>>>         14,
>>>>>>>>         17,
>>>>>>>>         51,
>>>>>>>>         58,
>>>>>>>>         63,
>>>>>>>>         64,
>>>>>>>>         68,
>>>>>>>>         70
>>>>>>>>     ],
>>>>>>>>
>>>>>>>> Some of these OSDs are marked as "dne" (Does Not Exist).
>>>>>>>>
>>>>>>>>     "peer": "17",
>>>>>>>>     "dne": 1,
>>>>>>>>     "peer": "51",
>>>>>>>>     "dne": 1,
>>>>>>>>     "peer": "58",
>>>>>>>>     "dne": 1,
>>>>>>>>     "peer": "64",
>>>>>>>>     "dne": 1,
>>>>>>>>     "peer": "70",
>>>>>>>>     "dne": 1,
>>>>>>>>
>>>>>>>> Can we get a complete background here please?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> After a major network outage our ceph cluster ended up with an
>>>>>>>>> inactive PG:
>>>>>>>>>
>>>>>>>>> # ceph health detail
>>>>>>>>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck
>>>>>>>>> unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
>>>>>>>>> pg 3.367 is stuck inactive for 912263.766607, current state
>>>>>>>>> incomplete, last acting [28,35,2]
>>>>>>>>> pg 3.367 is stuck unclean for 912263.766688, current state
>>>>>>>>> incomplete, last acting [28,35,2]
>>>>>>>>> pg 3.367 is incomplete, acting [28,35,2]
>>>>>>>>> 1 ops are blocked > 268435 sec
>>>>>>>>> 1 ops are blocked > 268435 sec on osd.28
>>>>>>>>> 1 osds have slow requests
>>>>>>>>>
>>>>>>>>> # ceph -s
>>>>>>>>>     cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
>>>>>>>>>      health HEALTH_WARN
>>>>>>>>>             1 pgs incomplete
>>>>>>>>>             1 pgs stuck inactive
>>>>>>>>>             1 pgs stuck unclean
>>>>>>>>>             1 requests are blocked > 32 sec
>>>>>>>>>      monmap e3: 3 mons at
>>>>>>>>> {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
>>>>>>>>>             election epoch 72, quorum 0,1,2
>>>>>>>>> tv-dl360-1,tv-dl360-2,tv-dl360-3
>>>>>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>>>>>       pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
>>>>>>>>>             490 GB used, 130 TB / 130 TB avail
>>>>>>>>>                 4863 active+clean
>>>>>>>>>                    1 incomplete
>>>>>>>>>   client io 0 B/s rd, 38465 B/s wr, 2 op/s
>>>>>>>>>
>>>>>>>>> ceph pg repair doesn't change anything. What should I try to recover
>>>>>>>>> it? Attached is the result of ceph pg query on the problem PG.
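
For reference, the attached output would have been produced with something
like the commands below (the file name is only an example); the same query
output contains the peering state and the blocked_by list referred to in
the replies above.

# ceph pg 3.367 query > pg-3.367-query.json
# ceph pg dump_stuck inactive
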
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Laszlo
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ceph-users mailing list
>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com