Re: rbd-nbd crashes Error: failed to read nbd request header: (33) Numerical argument out of domain


 



On Mon, Aug 30, 2021 at 1:06 PM Yanhu Cao <gmayyyha@xxxxxxxxx> wrote:
>
> Hi Ilya,
>
> Recently, we found these patches (v2):
> http://archive.lwn.net:8080/linux-kernel/YRHa%2FkeJ4pHP3hnL@T590/T/.
> Maybe related?
>
> v3: https://lore.kernel.org/linux-block/20210824141227.808340-2-yukuai3@xxxxxxxxxx/

It doesn't look related at first sight, but who knows...

This is exactly my point about 4.19 being too old -- it is hard to
justify spending time on debugging an issue that reproduces once in
a while on old kernels because it could have been fixed by something
that would appear to be unrelated.

Thanks,

                Ilya

>
> On Mon, Aug 30, 2021 at 6:34 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 24, 2021 at 11:43 AM Yanhu Cao <gmayyyha@xxxxxxxxx> wrote:
> > >
> > > > Any progress on this? We have encountered the same problem while
> > > > using the rbd-nbd option timeout=120.
> > > ceph version: 14.2.13
> > > kernel version: 4.19.118-2+deb10u1
> >
> > Hi Yanhu,
> >
> > No, we still don't know what is causing this.
> >
> > If rbd-nbd is being too slow, perhaps disabling the timeout would help?
> > Starting with kernel 5.4, "--io-timeout 0" should do it.
> >
> > In general, the nbd driver is pretty unstable in older kernels.
> > Timeout handling is just one example, so I would advise upgrading
> > to a recent kernel, e.g. 5.10 LTS.
> >
> > Thanks,
> >
> >                 Ilya
> >
> > >
> > > On Wed, May 19, 2021 at 10:55 PM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 19, 2021 at 11:32:04AM +0800, Zhi Zhang wrote:
> > > > > On Wed, May 19, 2021 at 11:19 AM Zhi Zhang <zhang.david2011@xxxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > > On Tue, May 18, 2021 at 10:58 PM Mykola Golub <to.my.trociny@xxxxxxxxx>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Could you please provide the full rbd-nbd log? If it is too large to
> > > > > > > > attach, then maybe via some public URL?
> > > > > >
> > > > > >  ceph.rbd-client.log.bz2
> > > > > > <https://drive.google.com/file/d/1TuiGOrVAgKIJ3BUmiokG0cU12fnlQ3GR/view?usp=drive_web>
> > > > > >
> > > > > > > I uploaded it to Google Drive. Please check it out.
> > > > >
> > > > > > We found that the reader_entry thread got zero bytes when trying to read
> > > > > > the nbd request header, then rbd-nbd exited and closed the socket. But we
> > > > > > haven't figured out why it read zero bytes.
> > > >
> > > > > Ok. I was hoping to find some hint in the log as to why the read from
> > > > > the kernel could return without data, but I don't see it.
> > > >
> > > > > From experience, it could happen when rbd-nbd got stuck or was too
> > > > > slow, so the kernel failed after the timeout, but it looked different
> > > > > in the logs AFAIR. Anyway, you can try increasing the timeout using the
> > > > > rbd-nbd --timeout (--io-timeout in newer versions) option. The default
> > > > > is 30 sec.
> > > >
> > > > > If that does not help, you will probably find a clue by increasing the
> > > > > kernel debug level for nbd (it seems to be possible).
> > > >
> > > > --
> > > > Mykola Golub
> > > > _______________________________________________
> > > > Dev mailing list -- dev@xxxxxxx
> > > > To unsubscribe send an email to dev-leave@xxxxxxx
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
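
For readers trying to connect the zero-byte read described in the thread
with the "(33) Numerical argument out of domain" text in the subject, here
is a minimal sketch in Python. It is not the rbd-nbd code. It assumes that
the reader uses an "exact read" helper that reports a short read (including
end-of-file) as -EDOM, which is how Ceph's safe_read_exact() behaves as far
as I recall. The names read_exact, HEADER and demo are made up for the
illustration, and a socketpair stands in for the kernel/daemon socket. The
28-byte layout and magic follow struct nbd_request from the kernel's nbd.h.

```python
import errno
import socket
import struct

# Classic NBD request header: magic, type, handle, offset, length (28 bytes).
NBD_REQUEST_MAGIC = 0x25609513
HEADER = struct.Struct(">IIQQI")


def read_exact(sock, count):
    """Read exactly `count` bytes; return (data, 0) or (partial, -EDOM).

    Hypothetical helper: a zero-byte recv() means the peer closed the
    socket, which is reported here as a short read (-EDOM).
    """
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            return bytes(buf), -errno.EDOM
        buf += chunk
    return bytes(buf), 0


def demo():
    # kernel_side plays the nbd driver, daemon_side plays the reader thread.
    kernel_side, daemon_side = socket.socketpair()

    # A complete header arrives: the read succeeds and the magic matches.
    kernel_side.sendall(HEADER.pack(NBD_REQUEST_MAGIC, 0, 1, 4096, 512))
    data, err = read_exact(daemon_side, HEADER.size)
    first = (err, HEADER.unpack(data)[0] == NBD_REQUEST_MAGIC)

    # The other end goes away: the next read returns zero bytes.
    kernel_side.close()
    data, err = read_exact(daemon_side, HEADER.size)
    second = (err, len(data))

    daemon_side.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

Under those assumptions, the error number only says that the read came up
short. It does not say why the kernel side closed the socket, which is the
question the thread leaves open.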


