On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> Thank you for your answer.
>
> Indeed the min_size is 1:
>
> # ceph osd pool get volumes size
> size: 3
> # ceph osd pool get volumes min_size
> min_size: 1
> #
>
> I'm going to try to find the mentioned discussions on the mailing lists
> and read them. If you have a link at hand, it would be nice if you could
> send it to me.

This thread is one example, there are lots more.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html

> In the attached file you can see the contents of the directory containing
> PG data on the different OSDs (all that have appeared in the pg query).
> According to the md5sums the files are identical. What bothers me is the
> directory structure (you can see the ls -R in each dir that contains files).

So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63 since
68 is listed as empty in the pg query.

> Where can I read about how/why those DIR# subdirectories have appeared?
>
> Given that the files themselves are identical on the "current" OSDs
> belonging to the PG, and as osd.63 (currently not belonging to the PG)
> has the same files, is it safe to stop osd.2, remove the 3.367_head dir,
> and then restart the OSD? (all this with the noout flag set, of course)

*You* need to decide which is the "good" copy and then follow the
instructions in the links I provided to try and recover the pg. Back those
known copies on 2, 28, 35 and 63 up with the ceph_objectstore_tool before
proceeding. They may well be identical, but the peering process still needs
to "see" the relevant logs and currently something is stopping it from
doing so.

> Kind regards,
> Laszlo
>
> On 11.03.2017 00:32, Brad Hubbard wrote:
>>
>> So this is why it happened I guess.
>>
>> pool 3 'volumes' replicated size 3 min_size 1
>>
>> min_size = 1 is a recipe for disasters like this and there are plenty
>> of ML threads about not setting it below 2.
>>
>> The past intervals in the pg query show several intervals where a
>> single OSD may have gone rw.
>>
>> How important is this data?
>>
>> I would suggest checking which of these OSDs actually have the data
>> for this pg. From the pg query it looks like 2, 35 and 68, and possibly
>> 28 since it's the primary. Check all OSDs in the pg query output. I
>> would then back up all copies, work out which copy, if any, you want to
>> keep, and then attempt something like the following.
>>
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17820.html
>>
>> If you want to abandon the pg see
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
>> for a possible solution.
>>
>> http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.
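For reference, the ceph_objectstore_tool backup suggested above might look
something like the sketch below. It is only a sketch: it assumes a default
Filestore data path and systemd service names, the binary may be spelled
ceph-objectstore-tool depending on the release, and the export file name is
just an example. Repeat it with the right OSD id on each host that holds a
copy (2, 28, 35 and 63):

  ceph osd set noout              # keep the cluster from rebalancing while the OSD is down
  systemctl stop ceph-osd@2       # the tool needs exclusive access to the OSD's store
  ceph-objectstore-tool --op export --pgid 3.367 \
      --data-path /var/lib/ceph/osd/ceph-2 \
      --journal-path /var/lib/ceph/osd/ceph-2/journal \
      --file /root/pg3.367-osd2.export
  systemctl start ceph-osd@2
  ceph osd unset noout

Such an export can later be loaded back into an OSD with --op import if a
recovery attempt makes things worse.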
>> On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> The OSDs are all there.
>>>
>>> $ sudo ceph osd stat
>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>
>>> and I have attached the results of the ceph osd tree and ceph osd dump
>>> commands.
>>> I got some extra info about the network problem: a faulty network device
>>> flooded the network, eating up all the bandwidth, so the OSDs were not
>>> able to properly communicate with each other. This lasted for almost
>>> 1 day.
>>>
>>> Thank you,
>>> Laszlo
>>>
>>> On 10.03.2017 12:19, Brad Hubbard wrote:
>>>>
>>>> To me it looks like someone may have done an "rm" on these OSDs but
>>>> not removed them from the crushmap. This does not happen automatically.
>>>>
>>>> Do these OSDs show up in "ceph osd tree" and "ceph osd dump"? If so,
>>>> paste the output.
>>>>
>>>> Without knowing what exactly happened here it may be difficult to work
>>>> out how to proceed.
>>>>
>>>> In order to go clean the primary needs to communicate with multiple
>>>> OSDs, some of which are marked DNE and seem to be uncontactable.
>>>>
>>>> This seems to be more than a network issue (unless the outage is still
>>>> happening).
>>>>
>>>> http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
>>>>
>>>> On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I was informed that the ceph cluster network was affected by a
>>>>> networking issue: there was huge packet loss, and network interfaces
>>>>> were flapping. That's all I got.
>>>>> This outage lasted a longer period of time, so I assume that some OSDs
>>>>> may have been considered dead and the data from them moved away to
>>>>> other OSDs (this is what ceph is supposed to do, if I'm correct).
>>>>> Probably that was the point when the listed PGs came into the picture.
>>>>> From the query we can see this for one of those OSDs:
>>>>>
>>>>>     {
>>>>>         "peer": "14",
>>>>>         "pgid": "3.367",
>>>>>         "last_update": "0'0",
>>>>>         "last_complete": "0'0",
>>>>>         "log_tail": "0'0",
>>>>>         "last_user_version": 0,
>>>>>         "last_backfill": "MAX",
>>>>>         "purged_snaps": "[]",
>>>>>         "history": {
>>>>>             "epoch_created": 4,
>>>>>             "last_epoch_started": 54899,
>>>>>             "last_epoch_clean": 55143,
>>>>>             "last_epoch_split": 0,
>>>>>             "same_up_since": 60603,
>>>>>             "same_interval_since": 60603,
>>>>>             "same_primary_since": 60593,
>>>>>             "last_scrub": "2852'33528",
>>>>>             "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "last_deep_scrub": "2852'16480",
>>>>>             "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>             "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150"
>>>>>         },
>>>>>         "stats": {
>>>>>             "version": "0'0",
>>>>>             "reported_seq": "14",
>>>>>             "reported_epoch": "59779",
>>>>>             "state": "down+peering",
>>>>>             "last_fresh": "2017-02-27 16:30:16.230519",
>>>>>             "last_change": "2017-02-27 16:30:15.267995",
>>>>>             "last_active": "0.000000",
>>>>>             "last_peered": "0.000000",
>>>>>             "last_clean": "0.000000",
>>>>>             "last_became_active": "0.000000",
>>>>>             "last_became_peered": "0.000000",
>>>>>             "last_unstale": "2017-02-27 16:30:16.230519",
>>>>>             "last_undegraded": "2017-02-27 16:30:16.230519",
>>>>>             "last_fullsized": "2017-02-27 16:30:16.230519",
>>>>>             "mapping_epoch": 60601,
>>>>>             "log_start": "0'0",
>>>>>             "ondisk_log_start": "0'0",
>>>>>             "created": 4,
>>>>>             "last_epoch_clean": 55143,
>>>>>             "parent": "0.0",
>>>>>             "parent_split_bits": 0,
>>>>>             "last_scrub": "2852'33528",
>>>>>             "last_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "last_deep_scrub": "2852'16480",
>>>>>             "last_deep_scrub_stamp": "2017-02-21 00:14:08.866448",
>>>>>             "last_clean_scrub_stamp": "2017-02-26 02:36:55.210150",
>>>>>             "log_size": 0,
>>>>>             "ondisk_log_size": 0,
>>>>>             "stats_invalid": "0",
>>>>>             "stat_sum": {
>>>>>                 "num_bytes": 0,
>>>>>                 "num_objects": 0,
>>>>>                 "num_object_clones": 0,
>>>>>                 "num_object_copies": 0,
>>>>>                 "num_objects_missing_on_primary": 0,
>>>>>                 "num_objects_degraded": 0,
>>>>>                 "num_objects_misplaced": 0,
>>>>>                 "num_objects_unfound": 0,
>>>>>                 "num_objects_dirty": 0,
>>>>>                 "num_whiteouts": 0,
>>>>>                 "num_read": 0,
>>>>>                 "num_read_kb": 0,
>>>>>                 "num_write": 0,
>>>>>                 "num_write_kb": 0,
>>>>>                 "num_scrub_errors": 0,
>>>>>                 "num_shallow_scrub_errors": 0,
>>>>>                 "num_deep_scrub_errors": 0,
>>>>>                 "num_objects_recovered": 0,
>>>>>                 "num_bytes_recovered": 0,
>>>>>                 "num_keys_recovered": 0,
>>>>>                 "num_objects_omap": 0,
>>>>>                 "num_objects_hit_set_archive": 0,
>>>>>                 "num_bytes_hit_set_archive": 0
>>>>>             },
>>>>>             "up": [
>>>>>                 28,
>>>>>                 35,
>>>>>                 2
>>>>>             ],
>>>>>             "acting": [
>>>>>                 28,
>>>>>                 35,
>>>>>                 2
>>>>>             ],
>>>>>             "blocked_by": [],
>>>>>             "up_primary": 28,
>>>>>             "acting_primary": 28
>>>>>         },
>>>>>         "empty": 1,
>>>>>         "dne": 0,
>>>>>         "incomplete": 0,
>>>>>         "last_epoch_started": 0,
>>>>>         "hit_set_history": {
>>>>>             "current_last_update": "0'0",
>>>>>             "current_last_stamp": "0.000000",
>>>>>             "current_info": {
>>>>>                 "begin": "0.000000",
>>>>>                 "end": "0.000000",
>>>>>                 "version": "0'0",
>>>>>                 "using_gmt": "1"
>>>>>             },
>>>>>             "history": []
>>>>>         }
>>>>>     },
>>>>>
>>>>> Where can I read more about the meaning of each parameter? Some of them
>>>>> have quite self-explanatory names, but not all (or probably we need
>>>>> deeper knowledge to understand them).
>>>>> Isn't there any parameter that would say when that OSD was assigned to
>>>>> the given PG? Also the stat_sum shows 0 for all its parameters. Why is
>>>>> it blocking then?
>>>>>
>>>>> Is there a way to tell the PG to forget about that OSD?
>>>>>
>>>>> Thank you,
>>>>> Laszlo
>>>>>
>>>>> On 10.03.2017 03:05, Brad Hubbard wrote:
>>>>>>
>>>>>> Can you explain more about what happened?
>>>>>>
>>>>>> The query shows progress is blocked by the following OSDs.
>>>>>>
>>>>>>     "blocked_by": [
>>>>>>         14,
>>>>>>         17,
>>>>>>         51,
>>>>>>         58,
>>>>>>         63,
>>>>>>         64,
>>>>>>         68,
>>>>>>         70
>>>>>>     ],
>>>>>>
>>>>>> Some of these OSDs are marked as "dne" (Does Not Exist).
>>>>>>
>>>>>>     "peer": "17",
>>>>>>     "dne": 1,
>>>>>>     "peer": "51",
>>>>>>     "dne": 1,
>>>>>>     "peer": "58",
>>>>>>     "dne": 1,
>>>>>>     "peer": "64",
>>>>>>     "dne": 1,
>>>>>>     "peer": "70",
>>>>>>     "dne": 1,
>>>>>>
>>>>>> Can we get a complete background here please?
>>>>>> On Thu, Mar 9, 2017 at 10:53 PM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> After a major network outage our ceph cluster ended up with an
>>>>>>> inactive PG:
>>>>>>>
>>>>>>> # ceph health detail
>>>>>>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean;
>>>>>>> 1 requests are blocked > 32 sec; 1 osds have slow requests
>>>>>>> pg 3.367 is stuck inactive for 912263.766607, current state incomplete,
>>>>>>> last acting [28,35,2]
>>>>>>> pg 3.367 is stuck unclean for 912263.766688, current state incomplete,
>>>>>>> last acting [28,35,2]
>>>>>>> pg 3.367 is incomplete, acting [28,35,2]
>>>>>>> 1 ops are blocked > 268435 sec
>>>>>>> 1 ops are blocked > 268435 sec on osd.28
>>>>>>> 1 osds have slow requests
>>>>>>>
>>>>>>> # ceph -s
>>>>>>>     cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
>>>>>>>      health HEALTH_WARN
>>>>>>>             1 pgs incomplete
>>>>>>>             1 pgs stuck inactive
>>>>>>>             1 pgs stuck unclean
>>>>>>>             1 requests are blocked > 32 sec
>>>>>>>      monmap e3: 3 mons at {tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
>>>>>>>             election epoch 72, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
>>>>>>>      osdmap e60609: 72 osds: 72 up, 72 in
>>>>>>>       pgmap v3670252: 4864 pgs, 11 pools, 134 GB data, 23778 objects
>>>>>>>             490 GB used, 130 TB / 130 TB avail
>>>>>>>                 4863 active+clean
>>>>>>>                    1 incomplete
>>>>>>>   client io 0 B/s rd, 38465 B/s wr, 2 op/s
>>>>>>>
>>>>>>> ceph pg repair doesn't change anything. What should I try to recover it?
>>>>>>> Attached is the result of ceph pg query on the problem PG.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Laszlo
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com