Hi Torkil,

I would check the firewall logs, starting with the Palo Alto firewall.

Joachim

On Tue, 13 Aug 2024 at 14:36, Eugen Block <eblock@xxxxxx> wrote:
> Hi Torkil,
>
> did anything change in the network setup? If those errors haven't
> popped up before, what changed? I'm not sure I've seen this one
> before...
>
>
> Quoting Torkil Svensgaard <torkil@xxxxxxxx>:
>
> > Ceph version 18.2.1.
> >
> > We have a nightly backup job that snapshots and exports all RBDs
> > used for libvirt VMs. Over the past couple of weeks, we've seen one
> > or more of them get stuck like this on 3 occasions, so the problem
> > is intermittent:
> >
> > "
> > zoperator@yggdrasil:~/backup$ cat
> > /mnt/scratch/personal/zoperator/slurm-2888600.out
> > Creating snap: 10% complete...2024-08-07T10:35:24.687+0200
> > 7f5a2a44b640 0 --2- 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.135:3300/0,v1:172.21.15.135:6789/0]
> > conn(0x7f59fc002060 0x7f59fc0094b0 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:36:02.064+0200 7f5a2a44b640 0 --2-
> > 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0]
> > conn(0x7f59fc00a2b0 0x7f59fc015010 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:37:38.191+0200 7f5a23fff640 0 --2-
> > 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.149:3300/0,v1:172.21.15.149:6789/0]
> > conn(0x7f5a0c063ea0 0x7f5a0c066320 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::ImageWatcher:
> > 0x7f5a100076d0 image watch failed: 140024590898336, (107) Transport
> > endpoint is not connected
> >
> > 2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::Watcher:
> > 0x7f5a100076d0 handle_error: handle=140024590898336: (107) Transport
> > endpoint is not connected
> > "
> >
> > The affected VM also throws stack traces from stuck I/O.
> >
> > Every VM we have seen affected by this maps its RBD through multiple
> > firewalls, so that is likely a factor:
> >
> > Hypervisor <-> Palo Alto firewall <-> OpenBSD firewall <-> Ceph
> >
> > Any ideas? I haven't found anything in the Ceph logs yet.
> >
> > Best regards,
> >
> > Torkil
> >
> > --
> > Torkil Svensgaard
> > Sysadmin
> > MR-Forskningssektionen, afs. 714
> > DRCMR, Danish Research Centre for Magnetic Resonance
> > Hvidovre Hospital
> > Kettegård Allé 30
> > DK-2650 Hvidovre
> > Denmark
> > Tel: +45 386 22828
> > E-mail: torkil@xxxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
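To line the client-side errors up against the Palo Alto and OpenBSD firewall logs, it can help to first pull out the timestamp, client address and monitor endpoint of each failed auth attempt. The Python sketch below is one way to do that. It assumes the client log has been unwrapped to one entry per line, and its regular expression is fitted to the messenger lines quoted above rather than to any documented Ceph log format. The `-2` return code is reported by its errno name (`ENOENT` on Linux); what that code means inside Ceph's auth path is not established in this thread. The names `AuthFailure`, `auth_failures` and `summary` are hypothetical.

```python
import errno
import re
from datetime import datetime, timezone
from typing import Iterable, NamedTuple


class AuthFailure(NamedTuple):
    when: datetime    # client-side timestamp, timezone-aware
    client: str       # client IP address
    monitor: str      # monitor IP address taken from the v2 endpoint
    v2_port: int      # monitor msgr2 port (3300 in the logs above)
    errno_name: str   # symbolic name of the negated return code


# Fitted to the messenger lines quoted in the thread; this is an assumption,
# not a stable or documented Ceph log format.
_AUTH_FAIL = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+-?\d+\s+--2-\s+"
    r"(?P<client>[\d.]+):\d+/\d+\s+>>\s+"
    r"\[v2:(?P<mon>[\d.]+):(?P<port>\d+)/\d+[^\]]*\]"  # v1 address is skipped
    r".*?get_initial_auth_request returned (?P<rc>-?\d+)"
)


def auth_failures(lines: Iterable[str]) -> list[AuthFailure]:
    """Return one AuthFailure per matching log line; other lines are ignored."""
    failures = []
    for line in lines:
        m = _AUTH_FAIL.search(line)
        if m is None:
            continue
        rc = int(m["rc"])
        failures.append(AuthFailure(
            when=datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z"),
            client=m["client"],
            monitor=m["mon"],
            v2_port=int(m["port"]),
            # Ceph conventionally returns negated errno values: -2 -> "ENOENT".
            errno_name=errno.errorcode.get(-rc, str(rc)),
        ))
    return failures


def summary(f: AuthFailure) -> str:
    """One line per failure, converted to UTC for matching firewall logs."""
    ts = f.when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{ts} {f.client} -> {f.monitor}:{f.v2_port} {f.errno_name}"
```

For example, `print("\n".join(summary(f) for f in auth_failures(open(path))))`, with `path` pointing at the unwrapped job output, lists each failed client/monitor pair with a UTC timestamp. Normalising to UTC avoids mismatches if the firewalls log in a different time zone than the hypervisor.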