________________________________________
From: Sage Weil <sage@xxxxxxxxxxxx>
Sent: 03 February 2019 18:25
To: Philippe Van Hecke
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Luminous cluster in very bad state need some assistance.

On Sun, 3 Feb 2019, Philippe Van Hecke wrote:
> Hello,
> I am working for BELNET, the Belgian National Research Network.
>
> We currently manage a Luminous ceph cluster on Ubuntu 16.04
> with 144 HDD OSDs spread across two data centers, with 6 OSD nodes
> in each data center. The OSDs are 4 TB SATA disks.
>
> Last week we had a network incident and the link between our 2 DCs
> began to flap due to STP flapping. This left our ceph
> cluster in a very bad state, with many PGs stuck in different states.
> I gave the cluster time to recover, but some OSDs did not restart.
> I have read and tried different things found on this mailing list, but
> this put us in a worse situation, because all my OSDs began falling down,
> one after the other, due to some bad PGs.
>
> I then tried the solution described by our Greek colleagues:
> https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/
>
> So I set the noout, noscrub and nodeep-scrub flags, which seems to have frozen the situation.
>
> The cluster is only used to provide RBD disks to our cloud-compute and
> cloud-storage solutions and to our internal KVM VMs.
>
> It seems that only some pools are affected by unclean/unknown/unfound
> objects, and everything is working well for the other pools (maybe some
> speed issues).
>
> I can confirm that data on the affected pools is completely corrupted.
>
> You can find here
> https://filesender.belnet.be/?s=download&token=1fac6b04-dd35-46f7-b4a8-c851cfa06379
> a tgz file with as much information as I could dump to give an overview
> of the current state of the cluster.
>
> So I have 2 questions.
>
> Does removing the affected pools, with their stuck PGs, also remove the defective PGs?

Yes, but don't do that yet!  From a quick look this looks like it can be
worked around.

First question is why you're hitting the assert on e.g. osd.49:

     0> 2019-02-01 09:23:36.963503 7fb548859e00 -1 /build/ceph-12.2.5/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, bool, std::ostringstream&, bool, bool*, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*, bool) [with missing_type = pg_missing_set<true>; std::ostringstream = std::__cxx11::basic_ostringstream<char>]' thread 7fb548859e00 time 2019-02-01 09:23:36.961237
    /build/ceph-12.2.5/src/osd/PGLog.h: 1354: FAILED assert(last_e.version.version < e.version.version)

If you can set debug osd = 20 on that osd, start it, and ceph-post-file
the log, that would be helpful.  12.2.5 is a pretty old luminous release,
but I don't see this in the tracker, so a log would be great.

Your priority is probably to get the pools active, though.  For osd.49,
the problematic pg is 11.182, which your pg ls output shows as online and
undersized but usable.  You can use ceph-objectstore-tool
--op export-remove to make a backup and remove it from osd.49, and then
that osd will likely start up.

If you look at 11.ac, your only incomplete pg in pool 11, the query says

    "down_osds_we_would_probe": [
        49,
        63
    ],

..so if you get that OSD up that PG should peer.
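
For reference, the export-remove step looks roughly like the following on
a typical deployment. The data path, systemd unit name and backup file
name here are just assumptions for a stock Ubuntu install, so adjust them
to your layout (and add --journal-path if the OSD is FileStore):

    # stop the OSD so the object store is quiescent
    systemctl stop ceph-osd@49

    # export pg 11.182 to a backup file and remove it from this OSD
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-49 \
        --pgid 11.182 \
        --op export-remove \
        --file /var/backups/osd.49-pg.11.182.export

    # bring the OSD back up
    systemctl start ceph-osd@49

Keep the export file somewhere safe; if that copy of the PG turns out to
be needed later it can be put back with --op import.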
In pool 12, you have 12.14d:

    "down_osds_we_would_probe": [
        9,
        51
    ],

osd.51 won't start due to the same assert but on pg 15.246, and the pg ls
shows that pg is undersized but active, so doing the same
--op export-remove on that osd will hopefully let it start.

I'm guessing the same will work on the other 12.* pg, but see if it works
on 11.182 first so that pool will be completely up and available.

Let us know how it goes!
sage

Hi Sage,

First of all, thanks for your help.

Please find here
https://filesender.belnet.be/?s=download&token=dea0edda-5b6a-4284-9ea1-c1fdf88b65e9
the osd log with debug info for osd.49.

And indeed, if all the buggy OSDs can restart, that may solve the issue.
But I am also happy that you confirm my understanding that, in the worst
case, removing the pools can also resolve the problem, even if in that
case I lose data but end up with a working cluster.

Kr

Philippe

PS: I don't know and don't want to open a debate about top/bottom posting,
but I would like to know the preference of this list :-)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com