Hi Dan:

Aha - I think the first commit is probably it - before that commit, the fact
that lo is highest in the interface enumeration didn't matter for us [since
it would always be skipped]. This is almost certainly also connected to that
other site with a similar problem (OSDs drop out until you restart the
network interface), since I imagine that would reorder the interface list.

Setting our public and cluster bind addresses explicitly does seem to help,
so we'll iterate on that and get to a suitable ceph.conf.

Thanks for the help [and it was the network all along]!

Sam

On Mon, 22 Mar 2021 at 19:12, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> There are two commits between 14.2.16 and 14.2.18 related to the loopback
> network. Perhaps one of these is responsible for your issue [1].
>
> I'd try playing with options like the cluster/public bind addr and
> cluster/public bind interface until you can convince the osd to bind to
> the correct listening IP.
>
> (That said, I don't know which version you were running in the logs shared
> earlier. But I think you should try to get 14.2.18 working anyway.)
>
> .. dan
>
> [1]
>
> git log v14.2.18...v14.2.16 ipaddr.cc
>
> commit 89321762ad4cfdd1a68cae467181bdd1a501f14d
> Author: Thomas Goirand <zigo@xxxxxxxxxx>
> Date:   Fri Jan 15 10:50:05 2021 +0100
>
>     common/ipaddr: Allow binding on lo
>
>     Commmit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b, solves the issue that
>     the osd can't restart after seting a virtual local loopback IP. However,
>     this commit also prevents a bgp-to-the-host over unumbered Ipv6
>     local-link is setup, where OSD typically are bound to the lo interface.
>
>     To solve this, this single char patch simply checks against "lo:" to
>     match only virtual interfaces instead of anything that starts with "lo".
>
>     Fixes: https://tracker.ceph.com/issues/48893
>     Signed-off-by: Thomas Goirand <zigo@xxxxxxxxxx>
>     (cherry picked from commit 201b59204374ebdab91bb554b986577a97b19c36)
>
> commit b52cae90d67eb878b3ddfe547b8bf16e0d4d1a45
> Author: lijaiwei1 <lijiawei1@xxxxxxxxxxxxxxx>
> Date:   Tue Dec 24 22:34:46 2019 +0800
>
>     common: skip interfaces starting with "lo" in find_ipv{4,6}_in_subnet()
>
>     This will solve the issue that the osd can't restart after seting a
>     virtual local loopback IP.
>     In find_ipv4_in_subnet() and find_ipv6_in_subnet(), I use
>     boost::starts_with(addrs->ifa_name, "lo") to ship the interfaces
>     starting with "lo".
>
>     Fixes: https://tracker.ceph.com/issues/43417
>     Signed-off-by: Jiawei Li <lijiawei1@xxxxxxxxxxxxxxx>
>     (cherry picked from commit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b)
>
> On Mon, Mar 22, 2021, 7:42 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>
>> I don't think we explicitly set any ms settings in the OSD host ceph.conf
>> [all the OSDs' ceph.confs are identical across the entire cluster].
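For the record, the explicit binding we're iterating towards looks roughly
like the below. This is a sketch only: the /8 subnet and the per-host address
are taken from the `ip a` output further down the thread, so treat the exact
values (and whether you need the per-host overrides at all) as placeholders
for your own network.

    [global]
    # shared across the cluster: tell the daemons which subnet to bind in
    public network  = 10.0.0.0/8
    cluster network = 10.0.0.0/8

    # per-host / per-daemon override if the interface enumeration still
    # picks lo first, e.g. in an [osd.NNN] section or a host-specific
    # ceph.conf fragment (address shown is the host from `ip a` below):
    # public addr  = 10.1.50.21
    # cluster addr = 10.1.50.21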
>>
>> ip a gives:
>>
>> ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>     link/ether 4c:d9:8f:55:92:f6 brd ff:ff:ff:ff:ff:ff
>> 3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>     link/ether 4c:d9:8f:55:92:f7 brd ff:ff:ff:ff:ff:ff
>> 4: p2p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
>>     link/ether b4:96:91:3f:62:20 brd ff:ff:ff:ff:ff:ff
>> 5: p2p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
>>     link/ether b4:96:91:3f:62:22 brd ff:ff:ff:ff:ff:ff
>>     inet 10.1.50.21/8 brd 10.255.255.255 scope global noprefixroute p2p2
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::b696:91ff:fe3f:6222/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> (where here p2p2 is the only active network link, and is also the private
>> and public network for the ceph cluster)
>>
>> The output is similar on other hosts - with p2p2 either at position 3 or
>> 5 depending on the order the interfaces were enumerated.
>>
>> Sam
>>
>> On Mon, 22 Mar 2021 at 17:34, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>>> Which `ms` settings do you have in the OSD host's ceph.conf or the ceph
>>> config dump?
>>>
>>> And how does `ip a` look on one of these hosts where the osd is
>>> registering itself as 127.0.0.1?
>>>
>>> You might as well set nodown again now. This will make ops pile up, but
>>> that's the least of your concerns at the moment.
>>> (With osds flapping the osdmaps churn and that inflates the mon store)
>>>
>>> .. Dan
>>>
>>> On Mon, Mar 22, 2021, 6:28 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>>>
>>>> Hm, yes it does [and I was wondering why loopbacks were showing up
>>>> suddenly in the logs]. This wasn't happening with 14.2.16 so what's
>>>> changed about how we specify stuff?
>>>>
>>>> This might correlate with the other person on the IRC list who has
>>>> problems with 14.2.18 and their OSDs deciding they don't work sometimes
>>>> until they forcibly restart their network links...
>>>>
>>>> Sam
>>>>
>>>> On Mon, 22 Mar 2021 at 17:20, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>>>
>>>>> What's with the OSDs having loopback addresses? E.g.
>>>>> v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667
>>>>>
>>>>> Does `ceph osd dump` show those same loopback addresses for each OSD?
>>>>>
>>>>> This sounds familiar... I'm trying to find the recent ticket.
>>>>>
>>>>> .. dan
>>>>>
>>>>> On Mon, Mar 22, 2021, 6:07 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>>>>>
>>>>>> hi Dan:
>>>>>>
>>>>>> So, unsetting nodown results in... almost all of the OSDs being
>>>>>> marked down. (231 down out of 328).
>>>>>> Checking the actual OSD services, most of them were actually up and
>>>>>> active on the nodes, even when the mons had marked them down.
>>>>>> (On a few nodes, the down services corresponded to OSDs that had been
>>>>>> flapping - but increasing osd_max_markdown locally to keep them up
>>>>>> despite the previous flapping, and restarting the services... didn't
>>>>>> help.)
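For reference, the flap-related knobs in play above - option names as in
Nautilus, the values purely illustrative, and the semantics as I understand
them rather than gospel:

    ceph osd set nodown      # mons stop marking OSDs down on missed heartbeats
    ceph osd unset nodown    # back to normal failure handling
    # an OSD that is marked down more than osd_max_markdown_count times
    # within osd_max_markdown_period seconds gives up and shuts itself down:
    ceph config set osd osd_max_markdown_count 10     # default is 5, I believe
    ceph config set osd osd_max_markdown_period 600   # seconds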
>>>>>> >>>>>> In fact, starting up the few OSD services which had actually stopped, >>>>>> resulted in a different set of OSDs being marked down, and some others >>>>>> coming up. >>>>>> We currently have a sort of "rolling OSD outness" passing through the >>>>>> cluster - there's always ~230 OSDs marked down now, but which ones those >>>>>> are changes (we've had everything from 1 HOST down to 4 HOSTS down over the >>>>>> past 14 minutes as things fluctuate. >>>>>> >>>>>> A log from one of the "down" OSDs [which is actually running, and on >>>>>> the same host as OSDs which are marked up] shows this worrying snippet >>>>>> >>>>>> 2021-03-22 17:01:45.298 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:45.298 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:46.340 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:46.340 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:47.376 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:47.376 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:48.395 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:48.395 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:49.407 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:49.407 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:50.400 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:50.400 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:50.922 7f6c9f088700 -1 --2- 10.1.50.21:0/23673 >> >>>>>> [v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667] >>>>>> conn(0x56010903e400 0x56011a71fc00 unknown :-1 s=BANNER_CONNECTING pgs=0 >>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2: >>>>>> 127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667] is using msgr V1 >>>>>> protocol >>>>>> 2021-03-22 17:01:50.922 7f6c9f889700 -1 --2- 10.1.50.21:0/23673 >> >>>>>> [v2:127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214] >>>>>> conn(0x5600df434000 0x56011718e000 unknown :-1 s=BANNER_CONNECTING pgs=0 >>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2: >>>>>> 127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214] is using msgr V1 >>>>>> protocol >>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> >>>>>> [v2:127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658] >>>>>> conn(0x5600f85ed800 0x560109df2a00 unknown :-1 s=BANNER_CONNECTING pgs=0 >>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2: >>>>>> 127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658] is using msgr V1 >>>>>> protocol >>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> >>>>>> [v2:127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393] >>>>>> conn(0x5600f22ea000 0x560117182300 unknown :-1 s=BANNER_CONNECTING pgs=0 >>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2: >>>>>> 127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393] is using msgr V1 >>>>>> protocol >>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >> >>>>>> 
[v2:127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566] >>>>>> conn(0x5600df435c00 0x560139370300 unknown :-1 s=BANNER_CONNECTING pgs=0 >>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2: >>>>>> 127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566] is using msgr V1 >>>>>> protocol >>>>>> 2021-03-22 17:01:51.377 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:51.377 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:52.370 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:52.370 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:53.377 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:53.377 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:54.385 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:54.385 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:55.385 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:55.385 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:56.362 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:56.362 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> 2021-03-22 17:01:57.324 7f6c9c883700 1 osd.127 253515 is_healthy >>>>>> false -- only 0/10 up peers (less than 33%) >>>>>> 2021-03-22 17:01:57.324 7f6c9c883700 1 osd.127 253515 not healthy; >>>>>> waiting to boot >>>>>> >>>>>> >>>>>> >>>>>> Any suggestions? >>>>>> >>>>>> Sam >>>>>> >>>>>> P.S. an example ceph status as it is now [with everything now on >>>>>> 14.2.18, since we had to restart osds anyway]: >>>>>> >>>>>> cluster: >>>>>> id: a1148af2-6eaf-4486-a27e-a05a78c2b378 >>>>>> health: HEALTH_WARN >>>>>> pauserd,pausewr,noout,nobackfill,norebalance flag(s) set >>>>>> 230 osds down >>>>>> 4 hosts (80 osds) down >>>>>> Reduced data availability: 2048 pgs inactive >>>>>> 8 slow ops, oldest one blocked for 901 sec, mon.cephs01 >>>>>> has slow ops >>>>>> >>>>>> services: >>>>>> mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 2h) >>>>>> mgr: cephs01(active, since 77m) >>>>>> osd: 329 osds: 98 up (since 4s), 328 in (since 4d) >>>>>> flags pauserd,pausewr,noout,nobackfill,norebalance >>>>>> >>>>>> data: >>>>>> pools: 3 pools, 2048 pgs >>>>>> objects: 0 objects, 0 B >>>>>> usage: 0 B used, 0 B / 0 B avail >>>>>> pgs: 100.000% pgs unknown >>>>>> 2048 unknown >>>>>> >>>>>> >>>>>> >>>>>> On Mon, 22 Mar 2021 at 14:57, Dan van der Ster <dan@xxxxxxxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I would unset nodown (hiding osd failures) and norecover (blcoking >>>>>>> PGs >>>>>>> from recovering degraded objects), then start starting osds. >>>>>>> As soon as you have some osd logs reporting some failures, then >>>>>>> share those... >>>>>>> >>>>>>> - Dan >>>>>>> >>>>>>> On Mon, Mar 22, 2021 at 3:49 PM Sam Skipsey <aoanla@xxxxxxxxx> >>>>>>> wrote: >>>>>>> > >>>>>>> > So, we started the mons and mgr up again, and here's the relevant >>>>>>> logs, including also ceph versions. 
We've also turned off all of the >>>>>>> firewalls on all of the nodes so we know that there can't be network issues >>>>>>> [and, indeed, all of our management of the OSDs happens via logins from the >>>>>>> service nodes or to each other] >>>>>>> > >>>>>>> > > ceph status >>>>>>> > >>>>>>> > >>>>>>> > cluster: >>>>>>> > id: a1148af2-6eaf-4486-a27e-a05a78c2b378 >>>>>>> > health: HEALTH_WARN >>>>>>> > >>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set >>>>>>> > 1 nearfull osd(s) >>>>>>> > 3 pool(s) nearfull >>>>>>> > Reduced data availability: 2048 pgs inactive >>>>>>> > mons cephs01,cephs02,cephs03 are using a lot of disk >>>>>>> space >>>>>>> > >>>>>>> > services: >>>>>>> > mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 61s) >>>>>>> > mgr: cephs01(active, since 76s) >>>>>>> > osd: 329 osds: 329 up (since 63s), 328 in (since 4d); 466 >>>>>>> remapped pgs >>>>>>> > flags >>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover >>>>>>> > >>>>>>> > data: >>>>>>> > pools: 3 pools, 2048 pgs >>>>>>> > objects: 0 objects, 0 B >>>>>>> > usage: 0 B used, 0 B / 0 B avail >>>>>>> > pgs: 100.000% pgs unknown >>>>>>> > 2048 unknown >>>>>>> > >>>>>>> > >>>>>>> > > ceph health detail >>>>>>> > >>>>>>> > HEALTH_WARN >>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set; >>>>>>> 1 nearfull osd(s); 3 pool(s) nearfull; Reduced data availability: 2048 pgs >>>>>>> inactive; mons cephs01,cephs02,cephs03 are using a lot of disk space >>>>>>> > OSDMAP_FLAGS >>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set >>>>>>> > OSD_NEARFULL 1 nearfull osd(s) >>>>>>> > osd.63 is near full >>>>>>> > POOL_NEARFULL 3 pool(s) nearfull >>>>>>> > pool 'dteam' is nearfull >>>>>>> > pool 'atlas' is nearfull >>>>>>> > pool 'atlas-localgroup' is nearfull >>>>>>> > PG_AVAILABILITY Reduced data availability: 2048 pgs inactive >>>>>>> > pg 13.1ef is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f0 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f1 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f2 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f3 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f4 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f5 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f6 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f7 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f8 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1f9 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1fa is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1fb is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1fc is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1fd is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1fe is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 13.1ff is stuck inactive for 
89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1ec is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f0 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f1 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f2 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f3 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f4 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f5 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f6 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f7 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f8 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1f9 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1fa is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1fb is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1fc is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1fd is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1fe is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 14.1ff is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1ed is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f0 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f1 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f2 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f3 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f4 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f5 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f6 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f7 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f8 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1f9 is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1fa is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1fb is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1fc is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1fd is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1fe is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > pg 15.1ff is stuck inactive for 89.322981, current state >>>>>>> unknown, last acting [] >>>>>>> > MON_DISK_BIG mons cephs01,cephs02,cephs03 are using a lot of disk >>>>>>> space >>>>>>> > mon.cephs01 is 96 GiB >= 
mon_data_size_warn (15 GiB) >>>>>>> > mon.cephs02 is 96 GiB >= mon_data_size_warn (15 GiB) >>>>>>> > mon.cephs03 is 96 GiB >= mon_data_size_warn (15 GiB) >>>>>>> > >>>>>>> > >>>>>>> > > ceph versions >>>>>>> > >>>>>>> > { >>>>>>> > "mon": { >>>>>>> > "ceph version 14.2.18 >>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3 >>>>>>> > }, >>>>>>> > "mgr": { >>>>>>> > "ceph version 14.2.18 >>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1 >>>>>>> > }, >>>>>>> > "osd": { >>>>>>> > "ceph version 14.2.10 >>>>>>> (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)": 1, >>>>>>> > "ceph version 14.2.15 >>>>>>> (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 188, >>>>>>> > "ceph version 14.2.16 >>>>>>> (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)": 18, >>>>>>> > "ceph version 14.2.18 >>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 122 >>>>>>> > }, >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> >>>>>>> > >>>>>>> > As a note, the log where the mgr explodes (which precipitated all >>>>>>> of this) definitely shows the problem occurring on the 12th [when 14.2.17 >>>>>>> dropped], but things didn't "break" until we tried upgrading OSDs to >>>>>>> 14.2.18... >>>>>>> > >>>>>>> > >>>>>>> > Sam >>>>>>> > >>>>>>> > >>>>>>> > On Mon, 22 Mar 2021 at 12:20, Sam Skipsey <aoanla@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >> >>>>>>> >> Hi Dan: >>>>>>> >> >>>>>>> >> Thanks for the reply - at present, our mons and mgrs are off >>>>>>> [because of the unsustainable nature of the filesystem usage]. We'll try >>>>>>> putting them on again for long enough to get "ceph status" out of them, but >>>>>>> because the mgr was unable to actually talk to anything, and reply at that >>>>>>> point. >>>>>>> >> >>>>>>> >> (And thanks for the link to the bug tracker - I guess this >>>>>>> mismatch of expectations is why the devs are so keen to move to >>>>>>> containerised deployments where there is no co-location of different types >>>>>>> of server, as it means they don't need to worry as much about the >>>>>>> assumptions about when it's okay to restart a service on package update. >>>>>>> Disappointing that it seems stale after 2 years...) >>>>>>> >> >>>>>>> >> Sam >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> On Mon, 22 Mar 2021 at 12:11, Dan van der Ster < >>>>>>> dan@xxxxxxxxxxxxxx> wrote: >>>>>>> >>> >>>>>>> >>> Hi Sam, >>>>>>> >>> >>>>>>> >>> The daemons restart (for *some* releases) because of this: >>>>>>> >>> https://tracker.ceph.com/issues/21672 >>>>>>> >>> In short, if the selinux module changes, and if you have selinux >>>>>>> >>> enabled, then midway through yum update, there will be a >>>>>>> systemctl >>>>>>> >>> restart ceph.target issued. >>>>>>> >>> >>>>>>> >>> For the rest -- I think you should focus on getting the PGs all >>>>>>> >>> active+clean as soon as possible, because the degraded and >>>>>>> remapped >>>>>>> >>> states are what leads to mon / osdmap growth. >>>>>>> >>> This kind of scenario is why we wrote this tool: >>>>>>> >>> >>>>>>> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py >>>>>>> >>> It will use pg-upmap-items to force the PGs to the OSDs where >>>>>>> they are >>>>>>> >>> currently residing. >>>>>>> >>> >>>>>>> >>> But there is some clarification needed before you go ahead with >>>>>>> that. >>>>>>> >>> Could you share `ceph status`, `ceph health detail`? 
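For anyone reading the thread later: the upmap-remapped approach mentioned
above boils down to generating commands of this shape. The pg id is one from
the health detail output earlier, but the OSD ids here are made up purely for
illustration:

    # map pg 13.1ef back onto the OSDs it currently resides on, so it can go
    # active+clean without waiting for backfill; each pair reads "from to",
    # and upmap-remapped.py generates these in bulk:
    ceph osd pg-upmap-items 13.1ef 121 45 207 66
    # upmap requires luminous or newer clients, i.e.:
    # ceph osd set-require-min-compat-client luminous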
>>>>>>> >>> >>>>>>> >>> Cheers, Dan >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> On Mon, Mar 22, 2021 at 12:05 PM Sam Skipsey <aoanla@xxxxxxxxx> >>>>>>> wrote: >>>>>>> >>> > >>>>>>> >>> > Hi everyone: >>>>>>> >>> > >>>>>>> >>> > I posted to the list on Friday morning (UK time), but >>>>>>> apparently my email >>>>>>> >>> > is still in moderation (I have an email from the list bot >>>>>>> telling me that >>>>>>> >>> > it's held for moderation but no updates). >>>>>>> >>> > >>>>>>> >>> > Since this is a bit urgent - we have ~3PB of storage offline - >>>>>>> I'm posting >>>>>>> >>> > again. >>>>>>> >>> > >>>>>>> >>> > To save retyping the whole thing, I will direct you to a copy >>>>>>> of the email >>>>>>> >>> > I wrote on Friday: >>>>>>> >>> > >>>>>>> >>> > http://aoanla.pythonanywhere.com/Logs/EmailToCephUsers.txt >>>>>>> >>> > >>>>>>> >>> > (Since that was sent, we did successfully add big SSDs to the >>>>>>> MON hosts so >>>>>>> >>> > they don't fill up their disks with store.db s). >>>>>>> >>> > >>>>>>> >>> > I would appreciate any advice - assuming this also doesn't get >>>>>>> stuck in >>>>>>> >>> > moderation queues. >>>>>>> >>> > >>>>>>> >>> > -- >>>>>>> >>> > Sam Skipsey (he/him, they/them) >>>>>>> >>> > _______________________________________________ >>>>>>> >>> > ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>> >>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> -- >>>>>>> >> Sam Skipsey (he/him, they/them) >>>>>>> >> >>>>>>> >> >>>>>>> > >>>>>>> > >>>>>>> > -- >>>>>>> > Sam Skipsey (he/him, they/them) >>>>>>> > >>>>>>> > >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sam Skipsey (he/him, they/them) >>>>>> >>>>>> >>>>>> >>>> >>>> -- >>>> Sam Skipsey (he/him, they/them) >>>> >>>> >>>> >> >> -- >> Sam Skipsey (he/him, they/them) >> >> >> -- Sam Skipsey (he/him, they/them) _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx