Re: libceph: mds1 IP+PORT wrong peer at address

Hi Frank,

I am afraid there is a bug in the code: it is racy when the mdsmap is updated from the old one to the new one. As far as I remember, we have made several fixes for this.

You can try a newer kernel to see whether you can still reproduce it.

Thanks

- Xiubo

On 13/03/2023 17:10, Frank Schilder wrote:
Hi Xiubo,

It's a really old kernel version: 3.10.0-957.10.1.el7.x86_64. We plan to upgrade soonish, but it's a major operation. For now we just need a workaround to get the client clean again. Do you have information about what triggers this bug? Maybe we can avoid the occurrence.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiubli@xxxxxxxxxx>
Sent: 13 March 2023 01:44:49
To: Frank Schilder; ceph-users@xxxxxxx
Subject: Re:  libceph: mds1 IP+PORT wrong peer at address

Hi Frank,

BTW, which kernel version are you using? It's a bug, and I have
never seen it with newer kernels.

You can try remounting the mountpoints; that should work.

Thanks

- Xiubo

On 09/03/2023 17:49, Frank Schilder wrote:
Hi all,

we seem to have hit a bug in the ceph fs kernel client and I just want to confirm what action to take. We get the error "wrong peer at address" in dmesg and some jobs on that server seem to get stuck in fs access; log extract below. I found these 2 tracker items https://tracker.ceph.com/issues/23883 and https://tracker.ceph.com/issues/41519, which don't seem to have fixes.

My questions:

- Is this harmless or does it indicate invalid/corrupted client cache entries?
- How to resolve: ignore, umount+mount or reboot?

Here is an extract from the dmesg log; the error has already survived a couple of MDS restarts:

[Mon Mar  6 12:56:46 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar  6 13:05:18 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar  6 13:05:18 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar  6 13:13:50 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386
[Mon Mar  6 13:13:50 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar  6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar  6 13:16:41 2023] libceph: mds1 192.168.32.87:6801 socket closed (con state OPEN)
[Mon Mar  6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar  6 13:16:45 2023] ceph: mds1 reconnect start
[Mon Mar  6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar  6 13:16:48 2023] ceph: mds1 reconnect success
[Mon Mar  6 13:18:13 2023] ceph: update_snap_trace error -22
[Mon Mar  6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar  6 13:18:17 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
[Mon Mar  6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar  6 13:18:23 2023] ceph: mds1 recovery completed
[Mon Mar  6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar  6 13:18:28 2023] ceph: mds7 reconnect start
[Mon Mar  6 13:18:28 2023] ceph: mds7 reconnect success
[Mon Mar  6 13:18:29 2023] ceph: mds7 reconnect success
[Mon Mar  6 13:18:35 2023] ceph: update_snap_trace error -22
[Mon Mar  6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar  6 13:18:35 2023] ceph: mds7 recovery completed
[Mon Mar  6 13:22:22 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Mon Mar  6 13:22:22 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Mon Mar  6 13:30:54 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[...]
[Thu Mar  9 09:37:24 2023] slurm.epilog.cl (31457): drop_caches: 3
[Thu Mar  9 09:38:26 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar  9 09:38:26 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar  9 09:46:58 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar  9 09:46:58 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar  9 09:55:30 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar  9 09:55:30 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
[Thu Mar  9 10:04:02 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347
[Thu Mar  9 10:04:02 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address
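
To make the pattern in the extract above easier to see, here is a minimal Python sketch that tallies the "wrong peer" lines by wanted and received value. It assumes the number after the slash is the nonce part of the peer address and that the lines have exactly the format shown above; the function name summarize_wrong_peer and the sample list are purely illustrative, not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches lines of the form seen in the extract, e.g.
# "libceph: wrong peer, want IP:PORT/NONCE, got IP:PORT/NONCE".
# Assumption: the value after the slash is the address nonce.
WRONG_PEER = re.compile(
    r"libceph: wrong peer, want (?P<want_addr>[\d.]+:\d+)/(?P<want_nonce>-?\d+), "
    r"got (?P<got_addr>[\d.]+:\d+)/(?P<got_nonce>-?\d+)"
)


def summarize_wrong_peer(lines):
    """Count (address, wanted nonce, received nonce) triples in dmesg lines."""
    counts = Counter()
    for line in lines:
        m = WRONG_PEER.search(line)
        if m:
            counts[(m["want_addr"], m["want_nonce"], m["got_nonce"])] += 1
    return counts


# A few lines copied from the extract above.
sample = [
    "[Mon Mar  6 13:05:18 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-1572619386",
    "[Mon Mar  6 13:05:18 2023] libceph: mds1 192.168.32.87:6801 wrong peer at address",
    "[Mon Mar  6 13:22:22 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347",
    "[Mon Mar  6 13:30:54 2023] libceph: wrong peer, want 192.168.32.87:6801/-223958753, got 192.168.32.87:6801/-453143347",
]

for (addr, want, got), n in summarize_wrong_peer(sample).items():
    print(f"{addr}: client expects {want}, peer reports {got} ({n}x)")
```

Run against the full dmesg output, this shows that the wanted value stays the same while the received value changes across the MDS restarts, which is the mismatch the client keeps complaining about.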

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Best Regards,

Xiubo Li (李秀波)

Email: xiubli@xxxxxxxxxx/xiubli@xxxxxxx
Slack: @Xiubo Li
