Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

Hi Dan:

Aha - I think the first commit is probably it: before that commit, the fact
that lo comes first in the interface enumeration didn't matter for us [since
it would always be skipped].
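(In case it helps anyone searching the archive later: the quick check we've
been using to confirm this is to compare the address the cluster has
registered for an OSD against what the daemon is actually listening on -
something along these lines, with osd.127 just as an arbitrary example:

    ceph osd dump | grep '^osd.127 '   # address recorded in the osdmap
    ss -tlnp | grep ceph-osd           # addresses the OSD daemons are bound to (run as root)

If either of those shows 127.0.0.1 rather than the host's real address, you're
probably hitting the same thing we were.)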

This is almost certainly also what's behind the other site with a similar
problem (OSDs drop out until the network interface is restarted), since I
imagine restarting it would reorder the interface list.

Setting our public and cluster bind addresses explicitly does seem to help,
so we'll iterate on that until we land on a suitable ceph.conf.
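For the archive, the kind of thing we're trying looks roughly like this - a
sketch rather than a final config, and the addresses are obviously per-host
(this host's p2p2):

    [global]
        public network  = 10.0.0.0/8
        cluster network = 10.0.0.0/8

    # where an OSD still insists on lo, pinning the address outright:
    [osd]
        public addr  = 10.1.50.21
        cluster addr = 10.1.50.21

We'll report back with whichever combination actually sticks.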

Thanks for the help [and it was the network all along]!


Sam

On Mon, 22 Mar 2021 at 19:12, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> There are two commits between 14.2.16 and 14.2.18 related to loopback
> network. Perhaps one of these is responsible for your issue [1].
>
> I'd try playing with the options like cluster/public bind addr and
> cluster/public bind interface until you can convince the osd to bind to the
> correct listening IP.
>
> (That said, I don't know which version you were running in the logs shared
> earlier. But I think you should try to get 14.2.18 working anyway).
>
> .. dan
>
> [1]
>
> > git log v14.2.18...v14.2.16 ipaddr.cc
> commit 89321762ad4cfdd1a68cae467181bdd1a501f14d
> Author: Thomas Goirand <zigo@xxxxxxxxxx>
> Date:   Fri Jan 15 10:50:05 2021 +0100
>
>     common/ipaddr: Allow binding on lo
>
>     Commmit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b, solves the issue that
>     the osd can't restart after seting a virtual local loopback IP. However,
>     this commit also prevents a bgp-to-the-host over unumbered Ipv6
>     local-link is setup, where OSD typically are bound to the lo interface.
>
>     To solve this, this single char patch simply checks against "lo:" to
>     match only virtual interfaces instead of anything that starts with "lo".
>
>     Fixes: https://tracker.ceph.com/issues/48893
>     Signed-off-by: Thomas Goirand <zigo@xxxxxxxxxx>
>     (cherry picked from commit 201b59204374ebdab91bb554b986577a97b19c36)
>
> commit b52cae90d67eb878b3ddfe547b8bf16e0d4d1a45
> Author: lijaiwei1 <lijiawei1@xxxxxxxxxxxxxxx>
> Date:   Tue Dec 24 22:34:46 2019 +0800
>
>     common: skip interfaces starting with "lo" in find_ipv{4,6}_in_subnet()
>
>     This will solve the issue that the osd can't restart after seting a
>     virtual local loopback IP.
>     In find_ipv4_in_subnet() and find_ipv6_in_subnet(), I use
>     boost::starts_with(addrs->ifa_name, "lo") to ship the interfaces
>     starting with "lo".
>
>     Fixes: https://tracker.ceph.com/issues/43417
>     Signed-off-by: Jiawei Li <lijiawei1@xxxxxxxxxxxxxxx>
>     (cherry picked from commit 5cf0fa872231f4eaf8ce6565a04ed675ba5b689b)
>
>
>
>
>
> On Mon, Mar 22, 2021, 7:42 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>
>> I don't think we explicitly set any ms settings in the OSD host ceph.conf
>> [all the OSDs' ceph.confs are identical across the entire cluster].
>>
>> ip a gives:
>>
>>  ip a
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
>> default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: em1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>> group default qlen 1000
>>     link/ether 4c:d9:8f:55:92:f6 brd ff:ff:ff:ff:ff:ff
>> 3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>> group default qlen 1000
>>     link/ether 4c:d9:8f:55:92:f7 brd ff:ff:ff:ff:ff:ff
>> 4: p2p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>> group default qlen 1000
>>     link/ether b4:96:91:3f:62:20 brd ff:ff:ff:ff:ff:ff
>> 5: p2p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>> group default qlen 1000
>>     link/ether b4:96:91:3f:62:22 brd ff:ff:ff:ff:ff:ff
>>     inet 10.1.50.21/8 brd 10.255.255.255 scope global noprefixroute p2p2
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::b696:91ff:fe3f:6222/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> (where p2p2 is the only active network link, and also carries both the
>> private and public networks for the ceph cluster)
>>
>> The output is similar on other hosts - with p2p2 either at position 3 or
>> 5 depending on the order the interfaces were enumerated.
>>
>> Sam
>>
>> On Mon, 22 Mar 2021 at 17:34, Dan van der Ster <dan@xxxxxxxxxxxxxx>
>> wrote:
>>
>>> Which `ms` settings do you have in the OSD host's ceph.conf or the ceph
>>> config dump?
>>>
>>> And how does `ip a` look on one of these hosts where the osd is
>>> registering itself as 127.0.0.1?
>>>
>>>
>>> You might as well set nodown again now. This will make ops pile up, but
>>> that's the least of your concerns at the moment.
>>> (With osds flapping the osdmaps churn and that inflates the mon store)
>>>
>>> .. Dan
>>>
>>> On Mon, Mar 22, 2021, 6:28 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>>>
>>>> Hm, yes it does [and I was wondering why loopbacks were suddenly showing up
>>>> in the logs]. This wasn't happening with 14.2.16, so what's changed about
>>>> how we specify things?
>>>>
>>>> This might correlate with the other person on the IRC list who has
>>>> problems with 14.2.18 and their OSDs deciding they don't work sometimes
>>>> until they forcibly restart their network links...
>>>>
>>>>
>>>> Sam
>>>>
>>>> On Mon, 22 Mar 2021 at 17:20, Dan van der Ster <dan@xxxxxxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> What's with the OSDs having loopback addresses? E.g. v2:
>>>>> 127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667
>>>>>
>>>>> Does `ceph osd dump` show those same loopback addresses for each OSD?
>>>>>
>>>>> This sounds familiar... I'm trying to find the recent ticket.
>>>>>
>>>>> .. dan
>>>>>
>>>>>
>>>>> On Mon, Mar 22, 2021, 6:07 PM Sam Skipsey <aoanla@xxxxxxxxx> wrote:
>>>>>
>>>>>> hi Dan:
>>>>>>
>>>>>> So, unsetting nodown results in... almost all of the OSDs being
>>>>>> marked down. (231 down out of 328).
>>>>>> Checking the actual OSD services, most of them were actually up and
>>>>>> active on the nodes, even when the mons had marked them down.
>>>>>> (On a few nodes, the down services corresponded to OSDs that had been
>>>>>> flapping - but increasing osd_max_markdown locally to keep them up despite
>>>>>> the previous flapping, and restarting the services... didn't help.)
>>>>>>
>>>>>> In fact, starting up the few OSD services which had actually stopped,
>>>>>> resulted in a different set of OSDs being marked down, and some others
>>>>>> coming up.
>>>>>> We currently have a sort of "rolling OSD outness" passing through the
>>>>>> cluster - there are always ~230 OSDs marked down now, but which ones they
>>>>>> are keeps changing (we've had everything from 1 host down to 4 hosts down
>>>>>> over the past 14 minutes as things fluctuate).
>>>>>>
>>>>>> A log from one of the "down" OSDs [which is actually running, and on
>>>>>> the same host as OSDs which are marked up] shows this worrying snippet:
>>>>>>
>>>>>> 2021-03-22 17:01:45.298 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:45.298 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:46.340 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:46.340 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:47.376 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:47.376 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:48.395 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:48.395 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:49.407 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:49.407 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:50.400 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:50.400 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:50.922 7f6c9f088700 -1 --2- 10.1.50.21:0/23673 >>
>>>>>> [v2:127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667]
>>>>>> conn(0x56010903e400 0x56011a71fc00 unknown :-1 s=BANNER_CONNECTING pgs=0
>>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:
>>>>>> 127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667] is using msgr V1
>>>>>> protocol
>>>>>> 2021-03-22 17:01:50.922 7f6c9f889700 -1 --2- 10.1.50.21:0/23673 >>
>>>>>> [v2:127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214]
>>>>>> conn(0x5600df434000 0x56011718e000 unknown :-1 s=BANNER_CONNECTING pgs=0
>>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:
>>>>>> 127.0.0.1:6821/13015214,v1:127.0.0.1:6831/13015214] is using msgr V1
>>>>>> protocol
>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >>
>>>>>> [v2:127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658]
>>>>>> conn(0x5600f85ed800 0x560109df2a00 unknown :-1 s=BANNER_CONNECTING pgs=0
>>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:
>>>>>> 127.0.0.1:6826/11091658,v1:127.0.0.1:6828/11091658] is using msgr V1
>>>>>> protocol
>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >>
>>>>>> [v2:127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393]
>>>>>> conn(0x5600f22ea000 0x560117182300 unknown :-1 s=BANNER_CONNECTING pgs=0
>>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:
>>>>>> 127.0.0.1:6859/2683393,v1:127.0.0.1:6862/2683393] is using msgr V1
>>>>>> protocol
>>>>>> 2021-03-22 17:01:50.922 7f6ca008a700 -1 --2- 10.1.50.21:0/23673 >>
>>>>>> [v2:127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566]
>>>>>> conn(0x5600df435c00 0x560139370300 unknown :-1 s=BANNER_CONNECTING pgs=0
>>>>>> cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:
>>>>>> 127.0.0.1:6901/15090566,v1:127.0.0.1:6907/15090566] is using msgr V1
>>>>>> protocol
>>>>>> 2021-03-22 17:01:51.377 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:51.377 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:52.370 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:52.370 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:53.377 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:53.377 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:54.385 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:54.385 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:55.385 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:55.385 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:56.362 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:56.362 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>> 2021-03-22 17:01:57.324 7f6c9c883700  1 osd.127 253515 is_healthy
>>>>>> false -- only 0/10 up peers (less than 33%)
>>>>>> 2021-03-22 17:01:57.324 7f6c9c883700  1 osd.127 253515 not healthy;
>>>>>> waiting to boot
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>> P.S. an example ceph status as it is now [with everything now on
>>>>>> 14.2.18, since we had to restart osds anyway]:
>>>>>>
>>>>>>  cluster:
>>>>>>     id:     a1148af2-6eaf-4486-a27e-a05a78c2b378
>>>>>>     health: HEALTH_WARN
>>>>>>             pauserd,pausewr,noout,nobackfill,norebalance flag(s) set
>>>>>>             230 osds down
>>>>>>             4 hosts (80 osds) down
>>>>>>             Reduced data availability: 2048 pgs inactive
>>>>>>             8 slow ops, oldest one blocked for 901 sec, mon.cephs01
>>>>>> has slow ops
>>>>>>
>>>>>>   services:
>>>>>>     mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 2h)
>>>>>>     mgr: cephs01(active, since 77m)
>>>>>>     osd: 329 osds: 98 up (since 4s), 328 in (since 4d)
>>>>>>          flags pauserd,pausewr,noout,nobackfill,norebalance
>>>>>>
>>>>>>   data:
>>>>>>     pools:   3 pools, 2048 pgs
>>>>>>     objects: 0 objects, 0 B
>>>>>>     usage:   0 B used, 0 B / 0 B avail
>>>>>>     pgs:     100.000% pgs unknown
>>>>>>              2048 unknown
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 22 Mar 2021 at 14:57, Dan van der Ster <dan@xxxxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I would unset nodown (hiding osd failures) and norecover (blocking PGs
>>>>>>> from recovering degraded objects), then start starting osds.
>>>>>>> As soon as you have some osd logs reporting some failures, then
>>>>>>> share those...
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> On Mon, Mar 22, 2021 at 3:49 PM Sam Skipsey <aoanla@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > So, we started the mons and mgr up again, and here are the relevant
>>>>>>> logs, including ceph versions. We've also turned off all of the firewalls
>>>>>>> on all of the nodes, so we know there can't be network issues [and, indeed,
>>>>>>> all of our management of the OSDs happens via logins from the service nodes
>>>>>>> or to each other].
>>>>>>> >
>>>>>>> > > ceph status
>>>>>>> >
>>>>>>> >
>>>>>>> >   cluster:
>>>>>>> >     id:     a1148af2-6eaf-4486-a27e-a05a78c2b378
>>>>>>> >     health: HEALTH_WARN
>>>>>>> >
>>>>>>>  pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
>>>>>>> >             1 nearfull osd(s)
>>>>>>> >             3 pool(s) nearfull
>>>>>>> >             Reduced data availability: 2048 pgs inactive
>>>>>>> >             mons cephs01,cephs02,cephs03 are using a lot of disk
>>>>>>> space
>>>>>>> >
>>>>>>> >   services:
>>>>>>> >     mon: 3 daemons, quorum cephs01,cephs02,cephs03 (age 61s)
>>>>>>> >     mgr: cephs01(active, since 76s)
>>>>>>> >     osd: 329 osds: 329 up (since 63s), 328 in (since 4d); 466
>>>>>>> remapped pgs
>>>>>>> >          flags
>>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
>>>>>>> >
>>>>>>> >   data:
>>>>>>> >     pools:   3 pools, 2048 pgs
>>>>>>> >     objects: 0 objects, 0 B
>>>>>>> >     usage:   0 B used, 0 B / 0 B avail
>>>>>>> >     pgs:     100.000% pgs unknown
>>>>>>> >              2048 unknown
>>>>>>> >
>>>>>>> >
>>>>>>> > > ceph health detail
>>>>>>> >
>>>>>>> > HEALTH_WARN
>>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set;
>>>>>>> 1 nearfull osd(s); 3 pool(s) nearfull; Reduced data availability: 2048 pgs
>>>>>>> inactive; mons cephs01,cephs02,cephs03 are using a lot of disk space
>>>>>>> > OSDMAP_FLAGS
>>>>>>> pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
>>>>>>> > OSD_NEARFULL 1 nearfull osd(s)
>>>>>>> >     osd.63 is near full
>>>>>>> > POOL_NEARFULL 3 pool(s) nearfull
>>>>>>> >     pool 'dteam' is nearfull
>>>>>>> >     pool 'atlas' is nearfull
>>>>>>> >     pool 'atlas-localgroup' is nearfull
>>>>>>> > PG_AVAILABILITY Reduced data availability: 2048 pgs inactive
>>>>>>> >     pg 13.1ef is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f0 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f1 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f2 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f3 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f4 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f5 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f6 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f7 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f8 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1f9 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1fa is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1fb is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1fc is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1fd is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1fe is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 13.1ff is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1ec is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f0 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f1 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f2 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f3 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f4 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f5 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f6 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f7 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f8 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1f9 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1fa is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1fb is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1fc is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1fd is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1fe is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 14.1ff is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1ed is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f0 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f1 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f2 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f3 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f4 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f5 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f6 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f7 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f8 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1f9 is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1fa is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1fb is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1fc is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1fd is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1fe is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> >     pg 15.1ff is stuck inactive for 89.322981, current state
>>>>>>> unknown, last acting []
>>>>>>> > MON_DISK_BIG mons cephs01,cephs02,cephs03 are using a lot of disk
>>>>>>> space
>>>>>>> >     mon.cephs01 is 96 GiB >= mon_data_size_warn (15 GiB)
>>>>>>> >     mon.cephs02 is 96 GiB >= mon_data_size_warn (15 GiB)
>>>>>>> >     mon.cephs03 is 96 GiB >= mon_data_size_warn (15 GiB)
>>>>>>> >
>>>>>>> >
>>>>>>> > > ceph versions
>>>>>>> >
>>>>>>> > {
>>>>>>> >     "mon": {
>>>>>>> >         "ceph version 14.2.18
>>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
>>>>>>> >     },
>>>>>>> >     "mgr": {
>>>>>>> >         "ceph version 14.2.18
>>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 1
>>>>>>> >     },
>>>>>>> >     "osd": {
>>>>>>> >         "ceph version 14.2.10
>>>>>>> (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)": 1,
>>>>>>> >         "ceph version 14.2.15
>>>>>>> (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 188,
>>>>>>> >         "ceph version 14.2.16
>>>>>>> (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)": 18,
>>>>>>> >         "ceph version 14.2.18
>>>>>>> (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 122
>>>>>>> >     },
>>>>>>> >
>>>>>>> >
>>>>>>> > >>>>>>
>>>>>>> >
>>>>>>> > As a note, the log where the mgr explodes (which precipitated all
>>>>>>> of this) definitely shows the problem occurring on the 12th [when 14.2.17
>>>>>>> dropped], but things didn't "break" until we tried upgrading OSDs to
>>>>>>> 14.2.18...
>>>>>>> >
>>>>>>> >
>>>>>>> > Sam
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, 22 Mar 2021 at 12:20, Sam Skipsey <aoanla@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Hi Dan:
>>>>>>> >>
>>>>>>> >> Thanks for the reply - at present, our mons and mgrs are off
>>>>>>> [because of the unsustainable nature of the filesystem usage]. We'll try
>>>>>>> putting them on again for long enough to get "ceph status" out of them
>>>>>>> (though the mgr was unable to actually talk to anything at that point), and
>>>>>>> reply then.
>>>>>>> >>
>>>>>>> >> (And thanks for the link to the bug tracker - I guess this
>>>>>>> mismatch of expectations is why the devs are so keen to move to
>>>>>>> containerised deployments where there is no co-location of different types
>>>>>>> of server, as it means they don't need to worry as much about the
>>>>>>> assumptions about when it's okay to restart a service on package update.
>>>>>>> Disappointing that it seems stale after 2 years...)
>>>>>>> >>
>>>>>>> >> Sam
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, 22 Mar 2021 at 12:11, Dan van der Ster <
>>>>>>> dan@xxxxxxxxxxxxxx> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi Sam,
>>>>>>> >>>
>>>>>>> >>> The daemons restart (for *some* releases) because of this:
>>>>>>> >>> https://tracker.ceph.com/issues/21672
>>>>>>> >>> In short, if the selinux module changes, and if you have selinux
>>>>>>> >>> enabled, then midway through yum update, there will be a
>>>>>>> systemctl
>>>>>>> >>> restart ceph.target issued.
>>>>>>> >>>
>>>>>>> >>> For the rest -- I think you should focus on getting the PGs all
>>>>>>> >>> active+clean as soon as possible, because the degraded and
>>>>>>> remapped
>>>>>>> >>> states are what leads to mon / osdmap growth.
>>>>>>> >>> This kind of scenario is why we wrote this tool:
>>>>>>> >>>
>>>>>>> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>>>>>>> >>> It will use pg-upmap-items to force the PGs to the OSDs where
>>>>>>> they are
>>>>>>> >>> currently residing.
>>>>>>> >>>
>>>>>>> >>> But there is some clarification needed before you go ahead with
>>>>>>> that.
>>>>>>> >>> Could you share `ceph status`, `ceph health detail`?
>>>>>>> >>>
>>>>>>> >>> Cheers, Dan
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Mon, Mar 22, 2021 at 12:05 PM Sam Skipsey <aoanla@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>> >>> >
>>>>>>> >>> > Hi everyone:
>>>>>>> >>> >
>>>>>>> >>> > I posted to the list on Friday morning (UK time), but
>>>>>>> apparently my email
>>>>>>> >>> > is still in moderation (I have an email from the list bot
>>>>>>> telling me that
>>>>>>> >>> > it's held for moderation but no updates).
>>>>>>> >>> >
>>>>>>> >>> > Since this is a bit urgent - we have ~3PB of storage offline -
>>>>>>> I'm posting
>>>>>>> >>> > again.
>>>>>>> >>> >
>>>>>>> >>> > To save retyping the whole thing, I will direct you to a copy
>>>>>>> of the email
>>>>>>> >>> > I wrote on Friday:
>>>>>>> >>> >
>>>>>>> >>> > http://aoanla.pythonanywhere.com/Logs/EmailToCephUsers.txt
>>>>>>> >>> >
>>>>>>> >>> > (Since that was sent, we did successfully add big SSDs to the
>>>>>>> MON hosts so
>>>>>>> >>> > they don't fill up their disks with store.db s).
>>>>>>> >>> >
>>>>>>> >>> > I would appreciate any advice - assuming this also doesn't get
>>>>>>> stuck in
>>>>>>> >>> > moderation queues.
>>>>>>> >>> >
>>>>>>> >>> > --
>>>>>>> >>> > Sam Skipsey (he/him, they/them)
>>>>>>> >>> > _______________________________________________
>>>>>>> >>> > ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>>> >>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Sam Skipsey (he/him, they/them)
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Sam Skipsey (he/him, they/them)
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sam Skipsey (he/him, they/them)
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Sam Skipsey (he/him, they/them)
>>>>
>>>>
>>>>
>>
>> --
>> Sam Skipsey (he/him, they/them)
>>
>>
>>

-- 
Sam Skipsey (he/him, they/them)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


