Re: BADAUTHORIZER in Nautilus, unknown PGs, slow peering, very slow client I/O

Following up on my own post with some notes on what helped:

- Finding OSDs with excessive bad authorizer logs, killing and restarting them

In many cases this cleared the unknown PGs and restored more normal
I/O. However, some OSDs kept logging a high number of these messages
for several more hours, even after being restarted.

The message seems to indicate osd<->osd communication errors, so it
was not always clear which side was failing; in the end we restarted
both OSDs listed in the log message.
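
For reference, this is roughly how we located the noisiest OSDs (a
sketch, assuming logs under /var/log/ceph and systemd-managed,
non-containerized OSDs; adjust paths and unit names for your setup):

    # count BADAUTHORIZER occurrences per OSD log, worst offenders last
    grep -c BADAUTHORIZER /var/log/ceph/ceph-osd.*.log | sort -t: -k2 -n

    # restart a suspicious OSD (N = OSD id)
    systemctl restart ceph-osd@N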

- It might be related to a specific monitor

At some point the monitor on server9 crashed and shortly afterwards
client I/O began to behave normally again. However, this was also
close in time to the restart of all OSDs with a high number of bad
authorizer messages (which we did before the monitor crashed).

At the moment recovery is running normally (~150-300 MB/s) and client
I/O is around 500-700 MB/s. During the problematic period, however,
client I/O was around 0 B/s to a few KB/s, while recovery speed was
high (1-1.5 GB/s), even though osd-max-backfills and
osd-recovery-max-active are set to 1 by default.

So the cluster was able to recover at high speed, but it was not able
to deliver client I/O at any acceptable rate. Are there any other
tunables we should set to avoid this problem? We also set osd recovery
op priority = 2 on all nodes by default.
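
One knob we have not tried yet is the recovery sleep settings; a
sketch of how they could be used to trade recovery speed for client
I/O (the values below are examples only, not tested here):

    ceph config set osd osd_recovery_sleep_hdd 0.2
    ceph config set osd osd_recovery_sleep_ssd 0.05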

- Pool (in-)dependence?

One of the strangest things is that I/O on the SSD pool was also
affected, even though only OSDs in the HDD pool were added.

Previously I reported on this mailing list that OSDs peer with OSDs
of a different device class, even though the crush rules limit
replication to the same class.
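
The class restriction can be double checked in the "take" step of the
rule dump, where a class-bound rule carries the class suffix in
item_name (a sketch; the rule name 'ssd' is just an example):

    ceph osd crush rule dump ssd | grep -A2 '"op": "take"'
    # expect something like: "item_name": "default~ssd"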

- Log sizes

With standard OSD logging (no custom settings) the cluster produced
around 50 GB of logs within 8 hours, with ~10 OSDs/node, which had the
side effect of quickly filling up the log partition. The main log
messages were the bad authorizer message plus "no secret key found".
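
In case it helps others, a quick way to see which logs are eating the
partition and to force a rotation (assuming the stock logrotate config
shipped with the Ceph packages):

    du -sh /var/log/ceph/*.log | sort -h
    logrotate -f /etc/logrotate.d/ceph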

- Octopus to the rescue?

Privately I got the hint that some of these problems might be solved
in Octopus, and I was wondering if anyone can confirm or deny that?

Best regards,

Nico


Nico Schottelius <nico.schottelius@xxxxxxxxxxx> writes:

> Update, consolidating information from my earlier posts:
>
> [08:09:40] server3.place6:~# ceph config-key dump | grep config/
>     "config/global/auth_client_required": "cephx",
>     "config/global/auth_cluster_required": "cephx",
>     "config/global/auth_service_required": "cephx",
>     "config/global/cluster_network": "2a0a:e5c0:2:1::/64",
>     "config/global/ms_bind_ipv4": "false",
>     "config/global/ms_bind_ipv6": "true",
>     "config/global/osd_class_update_on_start": "false",
>     "config/global/osd_pool_default_size": "3",
>     "config/global/public_network": "2a0a:e5c0:2:1::/64",
>     "config/mgr/mgr/balancer/active": "1",
>     "config/mgr/mgr/balancer/max_misplaced": ".01",
>     "config/mgr/mgr/balancer/mode": "upmap",
>     "config/mgr/mgr/prometheus/rbd_stats_pools": "hdd,ssd,xruk-ssd-pool",
>     "config/osd/osd_max_backfills": "1",
>     "config/osd/osd_recovery_max_active": "1",
>     "config/osd/osd_recovery_op_priority": "1",
> [08:09:56] server3.place6:~#
>
> To some degree, finding OSDs with huge log sizes that usually contain
>
> 2021-04-13 08:31:25.388 7fa4896a4700  0 --1- [2a0a:e5c0:2:1:21b:21ff:febb:68d8]:0/19492 >> v1:[2a0a:e5c0:2:1:21b:21ff:fe85:a3a2]:6881/29606 conn(0x55a04deadf80 0x55a0273ad000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
> 2021-04-13 08:31:25.388 7fa48a6b7700  0 --1- [2a0a:e5c0:2:1:21b:21ff:febb:68d8]:0/19492 >> v1:[2a0a:e5c0:2:1:21b:21ff:fe85:a3a2]:6879/29606 conn(0x55a04eb16d80 0x55a02742c000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>
> (with varying OSD IPs) and restarting them seems to improve the
> situation for a bit, but even freshly restarted OSDs go back into the
> BADAUTHORIZER state.
>
> Nico Schottelius <nico.schottelius@xxxxxxxxxxx> writes:
>
>> Good morning,
>>
>> I've looked somewhat intensively through the list and it seems we
>> are rather hard hit by this. The problem started yesterday on a
>> cluster with mixed 14.2.9 and 14.2.16 OSDs (mons were all 14.2.16).
>>
>> We started phasing in 7 new osds, 6 of them throttled by reweighting to
>> 0.1.
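
(For completeness: the reweighting was done per OSD, roughly like the
following; a sketch, assuming crush reweight was the mechanism, with N
being the OSD id:

    ceph osd crush reweight osd.N 0.1
)
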
>> Symptoms are many unknown PGs, PGs stuck in peering for hours, slow
>> activation, and the infamous BADAUTHORIZER message. Client I/O is
>> almost 0, not only on the pool with the new OSDs, but also on other
>> pools.
>>
>> We tried restarting all OSDs one by one, which seemed to clear the
>> unknown PGs; however, after around 1h they went back to the unknown
>> state.
>>
>> Tuning --osd-max-backfills=.. and --osd-recovery-max-active=... to 1
>> does not improve the situation, so we are currently running at 7 for
>> both of them.
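
(For reference, such a runtime change can be applied with injectargs;
a sketch:

    ceph tell 'osd.*' injectargs '--osd-max-backfills 7 --osd-recovery-max-active 7'
)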
>>
>> An excerpt from one of the OSDs that heavily logs this issue is below.
>>
>> Any pointer on how this problem was solved on Nautilus before would
>> be much appreciated, as the issue started late last night.
>>
>> Best regards,
>>
>> Nico
>>
>>
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: (Original Log Time 2021/04/13-07:38:37.275255) EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "compaction_finished", "compaction_time_micros": 5288403, "compaction_time_cpu_micros": 2940370, "output_level": 2, "num_output_files": 6, "total_output_size": 370673432, "num_input_records": 983055, "num_output_records": 641964, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 5, 35, 0, 0, 0, 0]}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 421257}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401289}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401288}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401278}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401277}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401258}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401257}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401256}
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: [db/compaction_job.cc:1645] [default] [JOB 6] Compacting 1@1 + 5@2 files to L2, score 1.17
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: [db/compaction_job.cc:1649] [default] Compaction start summary: Base version 5 Base level 1, inputs: [421255(65MB)], [401318(65MB) 401319(65MB) 401320(65MB) 401321(65MB) 401322(65MB)]
>>
>> 2021-04-13 07:38:37.275 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 6, "event": "compaction_started", "compaction_reason": "LevelMaxLevelSize", "files_L1": [421255], "files_L2": [401318, 401319, 401320, 401321, 401322], "score": 1.17224, "input_data_size": 413630461}
>> 2021-04-13 07:38:37.307 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:37.315 7f15a372c700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:37.707 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:37.715 7f15a372c700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.119 7f159a469700  4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 6] Generated table #421271: 130853 keys, 68923910 bytes
>> 2021-04-13 07:38:38.119 7f159a469700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292318119320, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 421271, "file_size": 68923910, "table_properties": {"data_size": 67111841, "index_size": 1483972, "filter_size": 327237, "raw_key_size": 10909111, "raw_average_key_size": 83, "raw_value_size": 62457370, "raw_average_value_size": 477, "num_data_blocks": 16807, "num_entries": 130853, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}
>> 2021-04-13 07:38:38.331 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6884/29245 conn(0x55d521b1a000 0x55d521b16800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.331 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6819/17174 conn(0x55d521b1c400 0x55d521b38000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.507 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.515 7f15a372c700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.731 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6884/29245 conn(0x55d521b1a000 0x55d521b16800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2021-04-13 07:38:38.731 7f15a0704700  0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6819/17174 conn(0x55d521b1c400 0x55d521b38000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER


--
Sustainable and modern Infrastructures by ungleich.ch