Re: Snapshot getting stuck

Hi Torkil,

I would check the logs of the firewalls; I'd start with the Palo Alto
firewall logs.

Joachim
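The `get_initial_auth_request returned -2` lines in the quoted log below mean the client never completed the authentication handshake with those monitors, which fits a connection being dropped somewhere along the firewall path. Before digging into firewall logs, a quick sanity check from the affected hypervisor is plain TCP reachability to each monitor on the msgr2 (3300) and legacy (6789) ports. A minimal sketch, not part of any Ceph tooling; the monitor addresses are taken from Torkil's log:

```python
import socket

def mon_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Monitor endpoints seen in the log output below (msgr2 and legacy ports).
    mons = ["172.21.15.135", "172.21.15.150", "172.21.15.149"]
    for mon in mons:
        for port in (3300, 6789):
            state = "open" if mon_reachable(mon, port) else "unreachable"
            print(f"{mon}:{port} -> {state}")
```

Note that a successful TCP connect only rules out a hard block; it won't catch a firewall that silently ages out long-lived sessions, which would match the intermittent failures mid-job.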


On Tue, 13 Aug 2024 at 14:36, Eugen Block <eblock@xxxxxx> wrote:

> Hi Torkil,
>
> did anything change in the network setup? If those errors haven't
> popped up before, what changed? I'm not sure if I have seen this one
> yet...
>
>
> Quoting Torkil Svensgaard <torkil@xxxxxxxx>:
>
> > Ceph version 18.2.1.
> >
> > We have a nightly backup job snapshotting and exporting all RBDs
> > used for libvirt VMs. Over the past couple of weeks we've seen one
> > or more of them get stuck like this on three occasions, so it's intermittent:
> >
> > "
> > zoperator@yggdrasil:~/backup$ cat
> > /mnt/scratch/personal/zoperator/slurm-2888600.out
> > Creating snap: 10% complete...2024-08-07T10:35:24.687+0200
> > 7f5a2a44b640 0 --2- 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.135:3300/0,v1:172.21.15.135:6789/0]
> > conn(0x7f59fc002060 0x7f59fc0094b0 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:36:02.064+0200 7f5a2a44b640  0 --2-
> > 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.150:3300/0,v1:172.21.15.150:6789/0]
> > conn(0x7f59fc00a2b0 0x7f59fc015010 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:37:38.191+0200 7f5a23fff640  0 --2-
> > 172.21.14.135:0/3311921296 >>
> > [v2:172.21.15.149:3300/0,v1:172.21.15.149:6789/0]
> > conn(0x7f5a0c063ea0 0x7f5a0c066320 unknown :-1 s=AUTH_CONNECTING
> > pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0
> > tx=0).send_auth_request get_initial_auth_request returned -2
> >
> > 2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::ImageWatcher:
> > 0x7f5a100076d0 image watch failed: 140024590898336, (107) Transport
> > endpoint is not connected
> >
> > 2024-08-07T10:38:16.677+0200 7f5a28c48640 -1 librbd::Watcher:
> > 0x7f5a100076d0 handle_error: handle=140024590898336: (107) Transport
> > endpoint is not connected
> > "
> >
> > The VM is also throwing stack traces from stuck I/O.
> >
> > Every VM affected so far maps its RBD through multiple
> > firewalls, so that is likely a factor.
> >
> > Hypervisor <-> Palo Alto firewall <-> OpenBSD firewall <-> Ceph
> >
> > Any ideas? I haven't found anything in the ceph logs yet.
> >
> > Mvh.
> >
> > Torkil
> >
> > --
> > Torkil Svensgaard
> > Sysadmin
> > MR-Forskningssektionen, afs. 714
> > DRCMR, Danish Research Centre for Magnetic Resonance
> > Hvidovre Hospital
> > Kettegård Allé 30
> > DK-2650 Hvidovre
> > Denmark
> > Tel: +45 386 22828
> > E-mail: torkil@xxxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>



