Re: octopus (15.2.16) OSDs crash or don't answer heartbeats (and get marked as down)


 



What should go to the SSDs? We don't have enough slots for a 3:1 ratio
for block.db. Most of the block.db SSDs served 10 OSDs each and were mostly
idle, so we are now removing them, as we haven't seen any benefit. (But
maybe I am just blind and ignorant and simply not seeing it.)
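One way to judge whether the block.db devices are actually helping is to check for RocksDB spillover and look at the BlueFS usage counters. A sketch (the OSD id is an example taken from the logs below, the `jq` filter is my own, and the admin socket assumes a default package install on the OSD host):

```shell
# Ceph raises a BLUEFS_SPILLOVER health warning when an OSD's RocksDB has
# overflowed its block.db device onto the slow (HDD) device:
ceph health detail | grep -i spillover

# Inspect BlueFS usage for one OSD via its admin socket; comparing
# db_used_bytes with slow_used_bytes shows how much of the DB actually
# lives on the fast device:
ceph daemon osd.161 perf dump bluefs | \
  jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
```

If the DB fits on the fast device with room to spare and the SSD is still idle, the sizing may simply be generous; for an RGW-heavy cluster, though, omap traffic usually does land on the DB device.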

On Wed, Mar 23, 2022 at 15:17 Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Unfortunately there is no silver bullet here so far. Just one note after
> looking at your configuration - I would strongly encourage you to add SSD
> disks for spinner-only OSDs.
>
> Particularly when they are used for s3 payload which is pretty DB
> intensive.
>
>
> Thanks,
>
> Igor
> On 3/23/2022 5:03 PM, Boris Behrens wrote:
>
> Hi Igor,
> yes, I've compacted them all.
>
> So is there a solution for the problem? I can imagine this happens
> when we remove large files from s3 (we use it as backup storage for
> lz4-compressed rbd exports).
> Maybe I missed it.
>
> Cheers
>  Boris
>
> On Wed, Mar 23, 2022 at 13:43 Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hi Boris,
>>
>> Curious if you tried to compact RocksDB for all your OSDs? Sorry if this
>> has already been discussed; I haven't read through the whole thread...
>>
>> From my experience, the symptoms you're facing are pretty common for DB
>> performance degradation caused by bulk data removal. In that case OSDs
>> start to flap due to the suicide timeout, as some regular user ops take
>> ages to complete.
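For reference, the compaction discussed here can be run per OSD. A sketch, assuming default data paths and that the OSD can safely be taken down (the id is an example):

```shell
# Offline compaction: stop the OSD, compact its RocksDB, start it again.
ID=161
systemctl stop ceph-osd@$ID
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$ID compact
systemctl start ceph-osd@$ID

# Online alternative (slower; runs while the OSD keeps serving IO):
ceph tell osd.$ID compact
```

Offline compaction is generally the more thorough option after bulk deletes, but it takes the OSD out of service for the duration.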
>>
>> The issue has been discussed in this list multiple times.
>>
>> Thanks,
>>
>> Igor
>>
>> On 3/8/2022 12:36 AM, Boris Behrens wrote:
>> > Hi,
>> >
>> > we've had problems with OSDs marked as offline since we updated to
>> > octopus and hoped the problem would be fixed with the latest patch. We
>> > have this kind of problem only with octopus, and there only on the big
>> > s3 cluster.
>> > * Hosts are all Ubuntu 20.04 and we've set the txqueuelen to 10k
>> > * Network interfaces are 20gbit (2x10 in an 802.3ad encap3+4 bond)
>> > * We only use the frontend network.
>> > * All disks are spinning; some have block.db devices.
>> > * All disks are bluestore
>> > * configs are mostly defaults
>> > * we've set the OSDs to restart=always without a limit, because we had
>> > problems with unavailable PGs when two OSDs that share PGs are marked
>> > as offline.
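The unlimited-restart setting described in the last bullet can be expressed as a systemd drop-in. A sketch of what I assume is meant (the drop-in path and filename are illustrative):

```shell
# Override the packaged unit so OSDs always restart, with the start-rate
# limit disabled (this mirrors the restart=always-without-a-limit setup):
mkdir -p /etc/systemd/system/ceph-osd@.service.d
cat > /etc/systemd/system/ceph-osd@.service.d/restart.conf <<'EOF'
[Unit]
StartLimitIntervalSec=0

[Service]
Restart=always
EOF
systemctl daemon-reload
```

Note this keeps crashing OSDs cycling indefinitely, which trades PG availability against masking an underlying fault.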
>> >
>> > But since we installed the latest patch we are experiencing more OSD
>> downs
>> > and even crashes.
>> > I tried to remove as much duplicated lines as possible.
>> >
>> > Is the numa error a problem?
>> > Why do OSD daemons not respond to heartbeats? I mean, even when the
>> > disk is totally loaded with IO, the system itself should answer
>> > heartbeats, or am I missing something?
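For what it's worth, OSD heartbeats are answered by the daemon's own threads rather than by the host, so a worker thread stuck on a slow RocksDB operation can miss heartbeats (and eventually hit the suicide timeout) even while the box is otherwise responsive. A sketch of how to see what a flapping OSD is stuck on (osd.97 is an example id from the logs below):

```shell
# Show the ops the OSD is currently working on, and recent slow ones:
ceph daemon osd.97 dump_ops_in_flight
ceph daemon osd.97 dump_historic_slow_ops

# The timeouts to compare op durations against:
ceph config get osd.97 osd_heartbeat_grace
ceph config get osd.97 osd_op_thread_suicide_timeout
```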
>> >
>> > I really hope some of you could send me on the correct way to solve this
>> > nasty problem.
>> >
>> > This is what the latest crash looks like:
>> > Mar 07 17:44:15 s3db18 ceph-osd[4530]: 2022-03-07T17:44:15.099+0000
>> > 7f5f05d2a700 -1 osd.161 489755 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > ...
>> > Mar 07 17:49:07 s3db18 ceph-osd[4530]: 2022-03-07T17:49:07.678+0000
>> > 7f5f05d2a700 -1 osd.161 489774 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]: *** Caught signal (Aborted) **
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  in thread 7f5ef1501700
>> > thread_name:tp_osd_tp
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  ceph version 15.2.16
>> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  1: (()+0x143c0) [0x7f5f0d4623c0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  2: (pthread_kill()+0x38)
>> > [0x7f5f0d45ef08]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  3:
>> > (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>> const*,
>> > unsigned long)+0x471) [0x55a699a01201]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  4:
>> > (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned
>> > long, unsigned long)+0x8e) [0x55a699a0199e]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  5:
>> > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3f0)
>> > [0x55a699a224b0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  6:
>> > (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55a699a252c4]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  7: (()+0x8609) [0x7f5f0d456609]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  8: (clone()+0x43)
>> [0x7f5f0cfc0163]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]: 2022-03-07T17:53:07.387+0000
>> > 7f5ef1501700 -1 *** Caught signal (Aborted) **
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  in thread 7f5ef1501700
>> > thread_name:tp_osd_tp
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  ceph version 15.2.16
>> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  1: (()+0x143c0) [0x7f5f0d4623c0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  2: (pthread_kill()+0x38)
>> > [0x7f5f0d45ef08]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  3:
>> > (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>> const*,
>> > unsigned long)+0x471) [0x55a699a01201]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  4:
>> > (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned
>> > long, unsigned long)+0x8e) [0x55a699a0199e]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  5:
>> > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3f0)
>> > [0x55a699a224b0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  6:
>> > (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55a699a252c4]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  7: (()+0x8609) [0x7f5f0d456609]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  8: (clone()+0x43)
>> [0x7f5f0cfc0163]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  NOTE: a copy of the executable,
>> or
>> > `objdump -rdS <executable>` is needed to interpret this.
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  -5246>
>> 2022-03-07T17:49:07.678+0000
>> > 7f5f05d2a700 -1 osd.161 489774 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:      0>
>> 2022-03-07T17:53:07.387+0000
>> > 7f5ef1501700 -1 *** Caught signal (Aborted) **
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  in thread 7f5ef1501700
>> > thread_name:tp_osd_tp
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  ceph version 15.2.16
>> > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  1: (()+0x143c0) [0x7f5f0d4623c0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  2: (pthread_kill()+0x38)
>> > [0x7f5f0d45ef08]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  3:
>> > (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>> const*,
>> > unsigned long)+0x471) [0x55a699a01201]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  4:
>> > (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned
>> > long, unsigned long)+0x8e) [0x55a699a0199e]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  5:
>> > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3f0)
>> > [0x55a699a224b0]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  6:
>> > (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55a699a252c4]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  7: (()+0x8609) [0x7f5f0d456609]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  8: (clone()+0x43)
>> [0x7f5f0cfc0163]
>> > Mar 07 17:53:07 s3db18 ceph-osd[4530]:  NOTE: a copy of the executable,
>> or
>> > `objdump -rdS <executable>` is needed to interpret this.
>> > ...
>> > Mar 07 17:53:09 s3db18 systemd[1]: ceph-osd@161.service: Main process
>> > exited, code=killed, status=6/ABRT
>> > Mar 07 17:53:09 s3db18 systemd[1]: ceph-osd@161.service: Failed with
>> result
>> > 'signal'.
>> > Mar 07 17:53:19 s3db18 systemd[1]: ceph-osd@161.service: Scheduled
>> restart
>> > job, restart counter is at 1.
>> > Mar 07 17:53:19 s3db18 systemd[1]: Stopped Ceph object storage daemon
>> > osd.161.
>> > Mar 07 17:53:19 s3db18 systemd[1]: Starting Ceph object storage daemon
>> > osd.161...
>> > Mar 07 17:53:19 s3db18 systemd[1]: Started Ceph object storage daemon
>> > osd.161.
>> > Mar 07 17:53:20 s3db18 ceph-osd[4009440]: 2022-03-07T17:53:20.498+0000
>> > 7f9617781d80 -1 Falling back to public interface
>> > Mar 07 17:53:33 s3db18 ceph-osd[4009440]: 2022-03-07T17:53:33.906+0000
>> > 7f9617781d80 -1 osd.161 489778 log_to_monitors {default=true}
>> > Mar 07 17:53:34 s3db18 ceph-osd[4009440]: 2022-03-07T17:53:34.206+0000
>> > 7f96106f2700 -1 osd.161 489778 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > ...
>> > Mar 07 18:58:12 s3db18 ceph-osd[4009440]: 2022-03-07T18:58:12.717+0000
>> > 7f96106f2700 -1 osd.161 489880 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> >
>> > And this is what it looks like when OSDs get marked as out:
>> > Mar 03 19:29:04 s3db13 ceph-osd[5792]: 2022-03-03T19:29:04.857+0000
>> > 7f16115e0700 -1 osd.97 485814 heartbeat_check: no reply from
>> > [XX:22::65]:6886 osd.124 since back 2022-03-03T19:28:41.250692+0000
>> front
>> > 2022-03-03T19:28:41.250649+0000 (oldest deadline
>> > 2022-03-03T19:29:04.150352+0000)
>> > ...130 time...
>> > Mar 03 21:55:37 s3db13 ceph-osd[5792]: 2022-03-03T21:55:37.844+0000
>> > 7f16115e0700 -1 osd.97 486383 heartbeat_check: no reply from
>> > [XX:22::65]:6941 osd.124 since back 2022-03-03T21:55:12.514627+0000
>> front
>> > 2022-03-03T21:55:12.514649+0000 (oldest deadline
>> > 2022-03-03T21:55:36.613469+0000)
>> > Mar 04 00:00:05 s3db13 ceph-osd[5792]: 2022-03-04T00:00:05.035+0000
>> > 7f1613080700 -1 received  signal: Hangup from killall -q -1 ceph-mon
>> > ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror  (PID: 1385079)
>> > UID: 0
>> > Mar 04 00:00:05 s3db13 ceph-osd[5792]: 2022-03-04T00:00:05.047+0000
>> > 7f1613080700 -1 received  signal: Hangup from  (PID: 1385080) UID: 0
>> > Mar 04 00:06:00 s3db13 sudo[1389262]:     ceph : TTY=unknown ; PWD=/ ;
>> > USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sde
>> > Mar 04 00:06:00 s3db13 sudo[1389262]: pam_unix(sudo:session): session
>> > opened for user root by (uid=0)
>> > Mar 04 00:06:00 s3db13 sudo[1389262]: pam_unix(sudo:session): session
>> > closed for user root
>> > Mar 04 00:06:01 s3db13 sudo[1389287]:     ceph : TTY=unknown ; PWD=/ ;
>> > USER=root ; COMMAND=/usr/sbin/nvme ata smart-log-add --json /dev/sde
>> > Mar 04 00:06:01 s3db13 sudo[1389287]: pam_unix(sudo:session): session
>> > opened for user root by (uid=0)
>> > Mar 04 00:06:01 s3db13 sudo[1389287]: pam_unix(sudo:session): session
>> > closed for user root
>> > Mar 05 00:00:10 s3db13 ceph-osd[5792]: 2022-03-05T00:00:10.213+0000
>> > 7f1613080700 -1 received  signal: Hangup from killall -q -1 ceph-mon
>> > ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror  (PID: 2406262)
>> > UID: 0
>> > Mar 05 00:00:10 s3db13 ceph-osd[5792]: 2022-03-05T00:00:10.237+0000
>> > 7f1613080700 -1 received  signal: Hangup from  (PID: 2406263) UID: 0
>> > Mar 05 00:08:03 s3db13 sudo[2411721]:     ceph : TTY=unknown ; PWD=/ ;
>> > USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sde
>> > Mar 05 00:08:03 s3db13 sudo[2411721]: pam_unix(sudo:session): session
>> > opened for user root by (uid=0)
>> > Mar 05 00:08:04 s3db13 sudo[2411721]: pam_unix(sudo:session): session
>> > closed for user root
>> > Mar 05 00:08:04 s3db13 sudo[2411725]:     ceph : TTY=unknown ; PWD=/ ;
>> > USER=root ; COMMAND=/usr/sbin/nvme ata smart-log-add --json /dev/sde
>> > Mar 05 00:08:04 s3db13 sudo[2411725]: pam_unix(sudo:session): session
>> > opened for user root by (uid=0)
>> > Mar 05 00:08:04 s3db13 sudo[2411725]: pam_unix(sudo:session): session
>> > closed for user root
>> > Mar 05 19:19:49 s3db13 ceph-osd[5792]: 2022-03-05T19:19:49.189+0000
>> > 7f160fddd700 -1 osd.97 486852 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 05 19:21:18 s3db13 ceph-osd[5792]: 2022-03-05T19:21:18.377+0000
>> > 7f160fddd700 -1 osd.97 486858 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 05 19:21:45 s3db13 ceph-osd[5792]: 2022-03-05T19:21:45.304+0000
>> > 7f16115e0700 -1 osd.97 486863 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 since back 2022-03-05T19:21:21.762744+0000
>> front
>> > 2022-03-05T19:21:21.762723+0000 (oldest deadline
>> > 2022-03-05T19:21:45.261347+0000)
>> > Mar 05 19:21:46 s3db13 ceph-osd[5792]: 2022-03-05T19:21:46.260+0000
>> > 7f16115e0700 -1 osd.97 486863 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 since back 2022-03-05T19:21:21.762744+0000
>> front
>> > 2022-03-05T19:21:21.762723+0000 (oldest deadline
>> > 2022-03-05T19:21:45.261347+0000)
>> > Mar 05 19:21:47 s3db13 ceph-osd[5792]: 2022-03-05T19:21:47.252+0000
>> > 7f16115e0700 -1 osd.97 486863 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 since back 2022-03-05T19:21:21.762744+0000
>> front
>> > 2022-03-05T19:21:21.762723+0000 (oldest deadline
>> > 2022-03-05T19:21:45.261347+0000)
>> > Mar 05 19:22:59 s3db13 ceph-osd[5792]: 2022-03-05T19:22:59.636+0000
>> > 7f160fddd700 -1 osd.97 486869 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 05 19:23:33 s3db13 ceph-osd[5792]: 2022-03-05T19:23:33.439+0000
>> > 7f16115e0700 -1 osd.97 486872 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:34 s3db13 ceph-osd[5792]: 2022-03-05T19:23:34.458+0000
>> > 7f16115e0700 -1 osd.97 486872 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:35 s3db13 ceph-osd[5792]: 2022-03-05T19:23:35.434+0000
>> > 7f16115e0700 -1 osd.97 486872 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 since back 2022-03-05T19:23:09.928097+0000
>> front
>> > 2022-03-05T19:23:09.928150+0000 (oldest deadline
>> > 2022-03-05T19:23:35.227545+0000)
>> > ...
>> > Mar 05 19:23:48 s3db13 ceph-osd[5792]: 2022-03-05T19:23:48.386+0000
>> > 7f16115e0700 -1 osd.97 486872 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:49 s3db13 ceph-osd[5792]: 2022-03-05T19:23:49.362+0000
>> > 7f16115e0700 -1 osd.97 486872 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 since back 2022-03-05T19:23:09.928097+0000
>> front
>> > 2022-03-05T19:23:09.928150+0000 (oldest deadline
>> > 2022-03-05T19:23:35.227545+0000)
>> > Mar 05 19:23:49 s3db13 ceph-osd[5792]: 2022-03-05T19:23:49.362+0000
>> > 7f16115e0700 -1 osd.97 486872 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:50 s3db13 ceph-osd[5792]: 2022-03-05T19:23:50.358+0000
>> > 7f16115e0700 -1 osd.97 486873 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:51 s3db13 ceph-osd[5792]: 2022-03-05T19:23:51.330+0000
>> > 7f16115e0700 -1 osd.97 486874 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4:b0b12ee9:::gc.22:head
>> > [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=9
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:52 s3db13 ceph-osd[5792]: 2022-03-05T19:23:52.326+0000
>> > 7f16115e0700 -1 osd.97 486874 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4:b0b12ee9:::gc.22:head
>> > [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=9
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:23:53 s3db13 ceph-osd[5792]: 2022-03-05T19:23:53.338+0000
>> > 7f16115e0700 -1 osd.97 486874 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4:b0b12ee9:::gc.22:head
>> > [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=9
>> > ondisk+retry+read+known_if_redirected e486872)
>> > Mar 05 19:25:02 s3db13 ceph-osd[5792]: 2022-03-05T19:25:02.342+0000
>> > 7f160fddd700 -1 osd.97 486878 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 05 19:25:33 s3db13 ceph-osd[5792]: 2022-03-05T19:25:33.569+0000
>> > 7f16115e0700 -1 osd.97 486880 get_health_metrics reporting 2 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > ...
>> > Mar 05 19:25:44 s3db13 ceph-osd[5792]: 2022-03-05T19:25:44.476+0000
>> > 7f16115e0700 -1 osd.97 486880 get_health_metrics reporting 3 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > Mar 05 19:25:45 s3db13 ceph-osd[5792]: 2022-03-05T19:25:45.456+0000
>> > 7f16115e0700 -1 osd.97 486880 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 ever on either front or back, first ping sent
>> > 2022-03-05T19:25:25.281582+0000 (oldest deadline
>> > 2022-03-05T19:25:45.281582+0000)
>> > Mar 05 19:25:45 s3db13 ceph-osd[5792]: 2022-03-05T19:25:45.456+0000
>> > 7f16115e0700 -1 osd.97 486880 get_health_metrics reporting 3 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > ...
>> > Mar 05 19:26:08 s3db13 ceph-osd[5792]: 2022-03-05T19:26:08.363+0000
>> > 7f16115e0700 -1 osd.97 486880 get_health_metrics reporting 3 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > Mar 05 19:26:09 s3db13 ceph-osd[5792]: 2022-03-05T19:26:09.371+0000
>> > 7f16115e0700 -1 osd.97 486880 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 ever on either front or back, first ping sent
>> > 2022-03-05T19:25:25.281582+0000 (oldest deadline
>> > 2022-03-05T19:25:45.281582+0000)
>> > Mar 05 19:26:09 s3db13 ceph-osd[5792]: 2022-03-05T19:26:09.375+0000
>> > 7f16115e0700 -1 osd.97 486880 get_health_metrics reporting 3 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > Mar 05 19:26:10 s3db13 ceph-osd[5792]: 2022-03-05T19:26:10.383+0000
>> > 7f16115e0700 -1 osd.97 486881 get_health_metrics reporting 3 slow ops,
>> > oldest is osd_op(client.2304224857.0:4271104 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486879)
>> > Mar 05 19:26:11 s3db13 ceph-osd[5792]: 2022-03-05T19:26:11.407+0000
>> > 7f16115e0700 -1 osd.97 486882 get_health_metrics reporting 1 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4:b0b12ee9:::gc.22:head
>> > [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=11
>> > ondisk+retry+read+known_if_redirected e486879)
>> > Mar 05 19:26:12 s3db13 ceph-osd[5792]: 2022-03-05T19:26:12.399+0000
>> > 7f16115e0700 -1 osd.97 486882 get_health_metrics reporting 1 slow ops,
>> > oldest is osd_op(client.2304224848.0:3139913 4.d 4:b0b12ee9:::gc.22:head
>> > [call rgw_gc.rgw_gc_queue_list_entries in=46b] snapc 0=[] RETRY=11
>> > ondisk+retry+read+known_if_redirected e486879)
>> > Mar 05 19:27:24 s3db13 ceph-osd[5792]: 2022-03-05T19:27:24.975+0000
>> > 7f160fddd700 -1 osd.97 486887 set_numa_affinity unable to identify
>> public
>> > interface '' numa node: (2) No such file or directory
>> > Mar 05 19:27:58 s3db13 ceph-osd[5792]: 2022-03-05T19:27:58.114+0000
>> > 7f16115e0700 -1 osd.97 486890 get_health_metrics reporting 4 slow ops,
>> > oldest is osd_op(client.2304235452.0:811825 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486889)
>> > ...
>> > Mar 05 19:28:08 s3db13 ceph-osd[5792]: 2022-03-05T19:28:08.137+0000
>> > 7f16115e0700 -1 osd.97 486890 get_health_metrics reporting 4 slow ops,
>> > oldest is osd_op(client.2304235452.0:811825 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486889)
>> > Mar 05 19:28:09 s3db13 ceph-osd[5792]: 2022-03-05T19:28:09.125+0000
>> > 7f16115e0700 -1 osd.97 486890 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 ever on either front or back, first ping sent
>> > 2022-03-05T19:27:48.548094+0000 (oldest deadline
>> > 2022-03-05T19:28:08.548094+0000)
>> > Mar 05 19:28:09 s3db13 ceph-osd[5792]: 2022-03-05T19:28:09.125+0000
>> > 7f16115e0700 -1 osd.97 486890 get_health_metrics reporting 4 slow ops,
>> > oldest is osd_op(client.2304235452.0:811825 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486889)
>> > ...
>> > Mar 05 19:28:29 s3db13 ceph-osd[5792]: 2022-03-05T19:28:29.060+0000
>> > 7f16115e0700 -1 osd.97 486890 get_health_metrics reporting 4 slow ops,
>> > oldest is osd_op(client.2304235452.0:811825 4.d 4.97748d0d (undecoded)
>> > ondisk+retry+write+known_if_redirected e486889)
>> > Mar 05 19:28:30 s3db13 ceph-osd[5792]: 2022-03-05T19:28:30.040+0000
>> > 7f16115e0700 -1 osd.97 486890 heartbeat_check: no reply from
>> > [XX:22::60]:6834 osd.171 ever on either front or back, first ping sent
>> > 2022-03-05T19:27:48.548094+0000 (oldest deadline
>> > 2022-03-05T19:28:08.548094+0000)
>> > Mar 05 19:29:43 s3db13 ceph-osd[5792]: 2022-03-05T19:29:43.696+0000
>> > 7f1605dc9700 -1 osd.97 486896 _committed_osd_maps marked down 6 >
>> > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
>> > Mar 05 19:29:43 s3db13 ceph-osd[5792]: 2022-03-05T19:29:43.700+0000
>> > 7f1613080700 -1 received  signal: Interrupt from Kernel ( Could be
>> > generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> > Mar 05 19:29:43 s3db13 ceph-osd[5792]: 2022-03-05T19:29:43.700+0000
>> > 7f1613080700 -1 osd.97 486896 *** Got signal Interrupt ***
>> > Mar 05 19:29:43 s3db13 ceph-osd[5792]: 2022-03-05T19:29:43.700+0000
>> > 7f1613080700 -1 osd.97 486896 *** Immediate shutdown
>> > (osd_fast_shutdown=true) ***
>> > Mar 05 19:29:44 s3db13 systemd[1]: ceph-osd@97.service: Succeeded.
>> > Mar 05 19:29:54 s3db13 systemd[1]: ceph-osd@97.service: Scheduled
>> restart
>> > job, restart counter is at 1.
>> > Mar 05 19:29:54 s3db13 systemd[1]: Stopped Ceph object storage daemon
>> > osd.97.
>> > Mar 05 19:29:54 s3db13 systemd[1]: Starting Ceph object storage daemon
>> > osd.97...
>> > Mar 05 19:29:54 s3db13 systemd[1]: Started Ceph object storage daemon
>> > osd.97.
>> > Mar 05 19:29:55 s3db13 ceph-osd[3236773]: 2022-03-05T19:29:55.116+0000
>> > 7f5852f74d80 -1 Falling back to public interface
>> > Mar 05 19:30:34 s3db13 ceph-osd[3236773]: 2022-03-05T19:30:34.746+0000
>> > 7f5852f74d80 -1 osd.97 486896 log_to_monitors {default=true}
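The "marked down 6 > osd_max_markdown_count 5" line near the end of that log means the OSD shut itself down after being marked down too often within the markdown period; that threshold is tunable if the flapping needs to be ridden out while debugging. A sketch (this changes a symptom threshold, not the underlying DB problem):

```shell
# Allow more markdowns before an OSD gives up and exits
# (defaults are 5 markdowns within 600 seconds):
ceph config set osd osd_max_markdown_count 10
ceph config set osd osd_max_markdown_period 600
```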
>>
>> --
>> Igor Fedotov
>> Ceph Lead Developer
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>>
>>
>
> --
> The "UTF-8 problems" self-help group will meet in the large hall this
> time, as an exception.
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
>

-- 
The "UTF-8 problems" self-help group will meet in the large hall this
time, as an exception.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



