Nautilus - osdmap not trimming

Hi

We have a Ceph cluster running on Nautilus, recently upgraded from Mimic.

While still on Mimic we noticed an issue with the osdmap not trimming, which caused part of our cluster to crash due to osdmap cache misses. We worked around it by adding "osd_map_cache_size = 5000" to our ceph.conf. Because we had mixed OSD versions from both Mimic and Nautilus at the time, we decided to finish the upgrade first, but it didn't solve our problem. At the moment we have:

"oldest_map": 67114,
"newest_map": 72588,

and the difference is not shrinking even though the cluster is in active+clean state. Restarting all mons didn't help. The bug looks similar to https://tracker.ceph.com/issues/44184, but there's no solution there.

What else can I check or do?
I don't want to do dangerous things like mon_osd_force_trim_to or something similar without finding the cause.
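(For reference, the oldest/newest figures above can be re-checked at any time with something along these lines:)

  # newest osdmap epoch currently in use
  ceph osd dump | head -1
  # oldest osdmap epoch still committed on the mons
  ceph report 2>/dev/null | jq .osdmap_first_committed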

I noticed in the MON debug log:

2020-11-10 17:11:14.612 7f9592d5b700 10 mon.monb01@0(leader).osd e72571 should_prune could only prune 4957 epochs (67114..72071), which is less than the required minimum (10000)
2020-11-10 17:11:19.612 7f9592d5b700 10 mon.monb01@0(leader).osd e72571 should_prune could only prune 4957 epochs (67114..72071), which is less than the required minimum (10000)
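(That required minimum of 10000 is presumably the mon_osdmap_full_prune_min default; the effective value can be confirmed on a mon through the admin socket, e.g.:)

  ceph daemon mon.monb01 config get mon_osdmap_full_prune_min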

So I added config options to reduce those values:

  mon       dev      mon_debug_block_osdmap_trim       false
  mon       advanced mon_min_osdmap_epochs             100
  mon       advanced mon_osdmap_full_prune_min         500
  mon       advanced paxos_service_trim_min            10
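(For completeness, setting those through the centralized config looks roughly like this, assuming "ceph config set" is used rather than ceph.conf edits:)

  ceph config set mon mon_debug_block_osdmap_trim false
  ceph config set mon mon_min_osdmap_epochs 100
  ceph config set mon mon_osdmap_full_prune_min 500
  ceph config set mon paxos_service_trim_min 10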

But it didn't help:

2020-11-10 18:28:26.165 7f1b700ab700 20 mon.monb01@0(leader).osd e72588 load_osdmap_manifest osdmap manifest detected in store; reload.
2020-11-10 18:28:26.169 7f1b700ab700 10 mon.monb01@0(leader).osd e72588 load_osdmap_manifest store osdmap manifest pinned (67114 .. 72484)
2020-11-10 18:28:26.169 7f1b700ab700 10 mon.monb01@0(leader).osd e72588 should_prune not enough epochs to form an interval (last pinned: 72484, last to pin: 72488, interval: 10)

Command "ceph report | jq '.osdmap_manifest' |jq '.pinned_maps[]'" shows 67114 on the top, but i'm unable to determine why.

Same with 'ceph report | jq .osdmap_first_committed':

root@monb01:/var/log/ceph# ceph report | jq .osdmap_first_committed
report 4073203295
67114
root@monb01:/var/log/ceph#

When I try to determine whether a certain PG or OSD is keeping it so low, I don't get anything.
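(For example, looking for a PG holding an old last_epoch_clean, along these lines; the jq path is a guess, on some Nautilus builds the stats are nested under .pg_map:)

  # PG with the lowest reported last_epoch_clean
  # (if this errors or returns null, try .pg_map.pg_stats instead of .pg_stats)
  ceph pg dump -f json 2>/dev/null | jq '[.pg_stats[] | {pgid, last_epoch_clean}] | min_by(.last_epoch_clean)'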

And in the MON debug log I get:

2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader) e6 refresh_from_paxos
2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader).paxosservice(mdsmap 1..1) refresh
2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader).paxosservice(osdmap 67114..72588) refresh
2020-11-10 18:42:41.767 7f1b74721700 20 mon.monb01@0(leader).osd e72588 load_osdmap_manifest osdmap manifest detected in store; reload.
2020-11-10 18:42:41.767 7f1b74721700 10 mon.monb01@0(leader).osd e72588 load_osdmap_manifest store osdmap manifest pinned (67114 .. 72484)

I also get:

root@monb01:/var/log/ceph#  ceph report |grep "min_last_epoch_clean"
report 2716976759
        "min_last_epoch_clean": 0,
root@monb01:/var/log/ceph#


Additional info:
root@monb01:/var/log/ceph# ceph versions
{
    "mon": {
        "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)": 120,
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 164
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)": 126,
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 164
    }
}


root@monb01:/var/log/ceph# ceph mon feature ls

all features
        supported: [kraken,luminous,mimic,osdmap-prune,nautilus]
        persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
on current monmap (epoch 6)
        persistent: [kraken,luminous,mimic,osdmap-prune,nautilus]
        required: [kraken,luminous,mimic,osdmap-prune,nautilus]


root@monb01:/var/log/ceph# ceph osd dump | grep require
require_min_compat_client luminous
require_osd_release nautilus


root@monb01:/var/log/ceph# ceph report | jq '.osdmap_manifest.pinned_maps | length'
report 1777129876
538

root@monb01:/var/log/ceph# ceph pg dump -f json | jq .osd_epochs
dumped all
null

--
Best regards
Marcin


