16.2.6 OSD Heartbeat Issues

Hi Everyone,

For a new build we tested the 5.4 kernel, which wasn't working well for us,
and ultimately moved to Ubuntu 20.04.3 with the HWE 5.11 kernel.
We can now get all OSDs more or less up, but on a clean OS reinstall we are
seeing the behavior below, which causes slow ops even before any pool or
filesystem has been created.

We are using LACP bonds with MTU 9000 for both the front (public) and back
(cluster) networks; both are 100G. I've also tried raising the bind port
range above the default maximum of 7300.
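
In case it's useful, this is roughly how we sanity-checked jumbo frames end
to end (a sketch; `<peer_ip>` is a placeholder for each OSD host on the
front and back networks):

```shell
# ICMP payload for an MTU-9000 path: MTU minus the 20-byte IP header
# and the 8-byte ICMP header.
MTU=9000
PAYLOAD=$((MTU - 28))
echo "payload=${PAYLOAD}"   # prints payload=8972

# From each OSD host, against every peer on both networks
# (-M do forbids fragmentation, so an undersized hop fails loudly):
#   ping -M do -s "$PAYLOAD" -c 3 <peer_ip>
```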

Any ideas?

Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 _get_class not permitted to
load lua
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 <cls>
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic>
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 _get_class not permitted to
load sdk
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 _get_class not permitted to
load kvs
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 <cls>
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic>
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 osd.74 976 crush map has
features 288514051259236352, adjusting msgr requires for clients
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  0 osd.74 976 crush map has
features 288514051259236352 was 8705, adjusting msgr requires for mons
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: 2021-10-18T16:17:45.248+0000
7f5363c2c080  0 osd.74 976 crush map has features 3314933000852226048,
adjusting msgr requires for osds
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.248+0000 7f5363c2c080  1 osd.74 976
check_osdmap_features require_osd_release unknown -> pacific
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.264+0000 7f5363c2c080  0 osd.74 976 load_pgs
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.264+0000 7f5363c2c080  0 osd.74 976 load_pgs opened 1
pgs
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.268+0000 7f5363c2c080 -1 osd.74 976 log_to_monitors
{default=true}
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.688+0000 7f5363c2c080  0 osd.74 976 done with init,
starting boot process
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: 2021-10-18T16:17:45.688+0000
7f5363c2c080  1 osd.74 976 start_boot
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.688+0000 7f53412e0700  1 osd.74 pg_epoch: 976 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=975)
[74,0] r=0 lpr=975 pi=[149,975)/7 crt=0'0 m>
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.692+0000 7f5356b50700 -1 osd.74 976 set_numa_affinity
unable to identify public interface '' numa node: (2) No such file or
directory
Oct 18 16:17:45 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:45.692+0000 7f5356b50700  1 osd.74 976 set_numa_affinity
not setting numa affinity
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.288+0000 7f5358353700  1 osd.74 976 tick checking mon
for new map
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.300+0000 7f53412e0700  1 osd.74 pg_epoch: 977 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=975)
[74,0] r=0 lpr=975 pi=[149,975)/7 crt=0'0 m>
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.300+0000 7f53412e0700  1 osd.74 pg_epoch: 977 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=977) [0]
r=-1 lpr=977 pi=[149,977)/7 crt=0'0 mlc>
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.300+0000 7f53412e0700  1 osd.74 pg_epoch: 979 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=977) [0]
r=-1 lpr=977 pi=[149,977)/7 crt=0'0 mlc>
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f534e130700  1 osd.74 980 state: booting ->
active
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f53412e0700  1 osd.74 pg_epoch: 980 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=980)
[74,0] r=0 lpr=980 pi=[149,980)/7 crt=0'0 m>
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f53412e0700  1 osd.74 pg_epoch: 980 pg[1.0(
empty local-lis/les=0/0 n=0 ec=149/149 lis/c=0/0 les/c/f=0/0/0 sis=980)
[74,0] r=0 lpr=980 pi=[149,980)/7 crt=0'0 m>
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f535cbd9700 -1 --2-
<CLUSTER_IP_NODE1>:0/3582903554 >>
[v2:<CLUSTER_IP_NODE1>:7275/2411802373,v1:<CLUSTER_IP_NODE1>:7279/2411802373]
conn(0x55e9db0dd800 0x55e9db1c4000 unknown :-1 >
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f535d3da700 -1 --2-
<CLUSTER_IP_NODE1>:0/3582903554 >>
[v2:<CLUSTER_IP_NODE1>:6917/4091393816,v1:<CLUSTER_IP_NODE1>:6922/4091393816]
conn(0x55e9db1ca000 0x55e9db1c4a00 unknown :-1 >
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f535d3da700 -1 --2-
<CLUSTER_IP_NODE1>:0/3582903554 >>
[v2:<CLUSTER_IP_NODE1>:6970/4027233516,v1:<CLUSTER_IP_NODE1>:6976/4027233516]
conn(0x55e9db1cb800 0x55e9db1c6d00 unknown :-1 >
Oct 18 16:17:46 <OSD_HOST_02> conmon[3587539]: debug
2021-10-18T16:17:46.724+0000 7f535cbd9700 -1 --2-
<CLUSTER_IP_NODE1>:0/3582903554 >>
[v2:<CLUSTER_IP_NODE1>:7112/1368237159,v1:<CLUSTER_IP_NODE1>:7115/1368237159]
conn(0x55e9db15c800 0x55e9db162500 unknown :-1 >
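
Two details in the log above that might matter (hedged; we're on a cephadm
Pacific deployment): the `set_numa_affinity unable to identify public
interface ''` line suggests `public_network` may not be set for the OSDs,
and the msgr bind range can be inspected and widened cluster-wide. These are
the checks we're running against the cluster:

```shell
# An empty public_network would explain the '' in set_numa_affinity:
ceph config get osd public_network
ceph config get osd cluster_network

# Default msgr bind range is 6800-7300; to widen it:
ceph config set osd ms_bind_port_max 7568
```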

Thanks,
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
