Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain

Zhi Zhang <zhang.david2011@xxxxxxxxx> · Wed, 19 May 2021 11:32:04 +0800

On Wed, May 19, 2021 at 11:19 AM Zhi Zhang <zhang.david2011@xxxxxxxxx>
wrote:

>
> On Tue, May 18, 2021 at 10:58 PM Mykola Golub <to.my.trociny@xxxxxxxxx>
> wrote:
> >
> > Could you please provide the full rbd-nbd log? If it is too large for
> > an attachment, then maybe via some public URL?
>
>  ceph.rbd-client.log.bz2
> <https://drive.google.com/file/d/1TuiGOrVAgKIJ3BUmiokG0cU12fnlQ3GR/view?usp=drive_web>
>
> I uploaded it to Google Drive. Please check it out.
>

We found that the reader_entry thread read zero bytes when trying to read the
nbd request header, after which rbd-nbd exited and closed the socket. But we
haven't figured out why the read returned zero bytes.
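
For anyone following along, here is a minimal sketch of how a zero-byte read
could turn into the "(33)" in the log. It assumes the reader uses a
"read exactly N bytes" helper that reports an early EOF as EDOM (errno 33 on
Linux); I believe Ceph's safe_read_exact behaves this way, but treat that as
an assumption. The names read_exact and demo, and the use of a socketpair to
stand in for the kernel side of the nbd socket, are hypothetical and only for
illustration.

```python
import errno
import os
import socket

# Assumed size of the classic NBD request header:
# magic(4) + type(4) + handle(8) + offset(8) + length(4).
NBD_REQUEST_HEADER_SIZE = 28


def read_exact(sock, count):
    """Read exactly `count` bytes from `sock`.

    Returns (0, data) on success, or (-errno, partial_data) on failure.
    An EOF before `count` bytes arrive is reported as -EDOM, mirroring
    the behaviour assumed in the lead-in.
    """
    buf = bytearray()
    while len(buf) < count:
        try:
            chunk = sock.recv(count - len(buf))
        except OSError as e:
            return -e.errno, bytes(buf)
        if not chunk:
            # recv() returning b"" means the peer closed its end.
            return -errno.EDOM, bytes(buf)
        buf += chunk
    return 0, bytes(buf)


def demo():
    """Simulate the peer shutting down the socket before sending a request."""
    kernel_side, daemon_side = socket.socketpair()
    kernel_side.close()  # stand-in for "block nbd0: shutting down sockets"
    rc, data = read_exact(daemon_side, NBD_REQUEST_HEADER_SIZE)
    daemon_side.close()
    return rc, data


if __name__ == "__main__":
    rc, data = demo()
    if rc < 0:
        print(f"failed to read nbd request header: ({-rc}) {os.strerror(-rc)}")
```

If this assumption holds, the EDOM itself is only a symptom: the zero-byte
read says the other end of the socket was already closed, which fits the
"shutting down sockets" line in dmesg appearing before the rbd-nbd error.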

> > --
> > Mykola Golub
> >
> > On Tue, May 18, 2021 at 03:04:51PM +0800, Zhi Zhang wrote:
> > > Hi guys,
> > >
> > > We have recently been testing rbd-nbd with the Ceph N version. After
> > > mapping an rbd image, then running mkfs and mounting the nbd device,
> > > rbd-nbd and dmesg show the following errors during read/write testing.
> > >
> > > rbd-nbd log:
> > >
> > > 2021-05-18 11:35:08.034 7efdb8ff9700 20 []rbd-nbd: reader_entry:
> > > waiting for nbd request
> > > ...
> > > 2021-05-18 11:35:08.066 7efdb8ff9700 -1 []rbd-nbd: failed to read nbd
> > > request header: (33) Numerical argument out of domain
> > > 2021-05-18 11:35:08.066 7efdb3fff700 20 []rbd-nbd: writer_entry: no io
> > > requests, terminating
> > > 2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
> > > 0x564a2be2b3c0 unregister_update_watcher: handle=0
> > > 2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
> > > 0x564a2be2b4b0 ImageUpdateWatchers::unregister_watcher: handle=0
> > > 2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
> > > 0x564a2be2b4b0 ImageUpdateWatchers::unregister_watcher: completing
> > > unregister
> > > 2021-05-18 11:35:08.066 7efdea8d1a00 10 []rbd-nbd: ~NBDServer:
> > > terminating
> > > 2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
> > > 0x564a2be2b3c0 close
> > >
> > > dmesg:
> > >
> > > [Tue May 18 11:35:07 2021] EXT4-fs (nbd0): mounted filesystem with
> > > ordered data mode. Opts: discard
> > > [Tue May 18 11:35:07 2021] block nbd0: shutting down sockets
> > > [Tue May 18 11:35:09 2021] blk_update_request: I/O error, dev nbd0,
> > > sector 75592 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
> > >
> > > client host info:
> > >
> > > centos7.x
> > > kernel 5.4.109
> > >
> > >
> > > It looks like the kernel nbd device shut down its socket for some
> > > reason, but we haven't figured out why. BTW, we have tried turning the
> > > rbd cache on and off, using different filesystems (ext4/xfs), and using
> > > an EC pool or a replicated pool, but the error remains. We reproduce it
> > > more often when we map, mkfs and mount rbd-nbd devices in batches on
> > > different hosts simultaneously.
> > >
> > > Thanks for any suggestions.
> > >
> > > Regards,
> > > Zhi Zhang (David)
> > > Contact: zhang.david2011@xxxxxxxxx
> > >               zhangz.david@xxxxxxxxxxx
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>