Ceph version 18.2.1.
We have a nightly backup job that snapshots and exports all RBDs used
for libvirt VMs. Over the past couple of weeks, one or more of these
jobs has gotten stuck like this on 3 occasions, so it is intermittent:
"
zoperator@yggdrasil:~/backup$ cat /mnt/scratch/personal/zoperator/slurm-2888600.out
Creating snap: 10% complete...2024-08-07T10:35:24.687+0200 7f5a2a44b640 0 --2- 172.21.14.135:0/3311921296 >> [v2:172.21.15.135:3300/0,v1:172.21.15.135:6789/0] conn(0x7f59fc002060 0x7f59fc0094b0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
2024-08-07T10:36:02.064+0200 7f5a2a44b640 0 --2- 172.21.14.135:0/3311921296 >> [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] conn(0x7f59fc00a2b0 0x7f59fc015010 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
2024-08-07T10:37:38.191+0200 7f5a23fff640 0 --2- 172.21.14.135:0/3311921296 >> [v2:172.21.15.149:3300/0,v1:172.21.15.149:6789/0] conn(0x7f5a0c063ea0 0x7f5a0c066320 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::ImageWatcher: 0x7f5a100076d0 image watch failed: 140024590898336, (107) Transport endpoint is not connected
2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::Watcher: 0x7f5a100076d0 handle_error: handle=140024590898336: (107) Transport endpoint is not connected
"
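In case it helps with comparing the three occurrences, here is a rough sketch of how the failed auth requests in the job output above could be pulled out per monitor. The regular expression and the function name auth_failures are my own assumptions, written against the one log format shown here, not anything shipped with Ceph. Whitespace is collapsed before matching, so line wrapping does not matter.

```python
import re

# One entry copied from the job output above.
SAMPLE = """\
2024-08-07T10:36:02.064+0200 7f5a2a44b640 0 --2-
172.21.14.135:0/3311921296 >>
[v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0] conn(0x7f59fc00a2b0
0x7f59fc015010 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1
crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request
get_initial_auth_request returned -2
"""

# Assumed pattern: timestamp, thread id, client address, first (v2) monitor
# address, and the return code of get_initial_auth_request.
AUTH_FAIL_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4}) \S+ 0 --2- "
    r"(?P<client>\S+) >> \[v2:(?P<mon>[\d.]+):(?P<port>\d+)/\d+,"
    r".*?get_initial_auth_request returned (?P<rc>-?\d+)"
)


def auth_failures(text):
    """Return (timestamp, client, mon_ip, return_code) per failed auth request."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    return [
        (m["ts"], m["client"], m["mon"], int(m["rc"]))
        for m in AUTH_FAIL_RE.finditer(flat)
    ]


for ts, client, mon, rc in auth_failures(SAMPLE):
    print(ts, client, "->", mon, "rc", rc)
```

Run over the full output, this should list one row per monitor the client tried, which makes it easier to see whether every monitor is affected at the same time.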
The affected VM is also throwing stack traces from stuck I/O.
Every VM we have seen affected by this maps its RBD through multiple
firewalls, so that is likely a factor.
Hypervisor <-> Palo Alto firewall <-> OpenBSD firewall <-> Ceph
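A basic check that could be run from the hypervisor side of that path is whether new TCP connections to the monitor ports get through at all. The sketch below is only that, a sketch: the monitor addresses and ports are taken from the log above, the helper names can_connect and probe_all are made up for illustration, and a successful connect says nothing about what the firewalls do to long-lived or idle sessions.

```python
import socket

# Monitor addresses and ports as they appear in the log above
# (3300 = msgr v2, 6789 = msgr v1).
MONS = ["172.21.15.135", "172.21.15.149", "172.21.15.150"]
PORTS = [3300, 6789]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_all(mons, ports, timeout=3.0):
    """Map each (mon, port) pair to whether a TCP connect succeeded."""
    return {(mon, port): can_connect(mon, port, timeout)
            for mon in mons for port in ports}
```

Calling probe_all(MONS, PORTS) while a backup job is stuck, and again while things are healthy, would at least show whether the two cases differ at the TCP level.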
Any ideas? I haven't found anything in the ceph logs yet.
Best regards,
Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil@xxxxxxxx