On Mon, Aug 30, 2021 at 1:06 PM Yanhu Cao <gmayyyha@xxxxxxxxx> wrote:
>
> Hi Ilya,
>
> Recently, we found these patches (v2)
> http://archive.lwn.net:8080/linux-kernel/YRHa%2FkeJ4pHP3hnL@T590/T/.
> Maybe related?
>
> v3: https://lore.kernel.org/linux-block/20210824141227.808340-2-yukuai3@xxxxxxxxxx/

It doesn't look related at first sight, but who knows...  This is
exactly my point about 4.19 being too old -- it is hard to justify
spending time on debugging an issue that reproduces once in a while
on old kernels because it could have been fixed by something that
would appear to be unrelated.

Thanks,

                Ilya

>
> On Mon, Aug 30, 2021 at 6:34 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 24, 2021 at 11:43 AM Yanhu Cao <gmayyyha@xxxxxxxxx> wrote:
> > >
> > > Any progress on this? We have encountered the same problem, using the
> > > rbd-nbd option timeout=120.
> > > ceph version: 14.2.13
> > > kernel version: 4.19.118-2+deb10u1
> >
> > Hi Yanhu,
> >
> > No, we still don't know what is causing this.
> >
> > If rbd-nbd is being too slow, perhaps disabling the timeout would help?
> > Starting with kernel 5.4, "--io-timeout 0" should do it.
> >
> > In general, the nbd driver is pretty unstable in older kernels.
> > Timeout handling is just one example, so I would advise upgrading
> > to a recent kernel, e.g. 5.10 LTS.
> >
> > Thanks,
> >
> >                 Ilya
> >
> > >
> > > On Wed, May 19, 2021 at 10:55 PM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 19, 2021 at 11:32:04AM +0800, Zhi Zhang wrote:
> > > > > On Wed, May 19, 2021 at 11:19 AM Zhi Zhang <zhang.david2011@xxxxxxxxx>
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 18, 2021 at 10:58 PM Mykola Golub <to.my.trociny@xxxxxxxxx>
> > > > > > wrote:
> > > > > > >
> > > > > > > Could you please provide the full rbd-nbd log? If it is too large for
> > > > > > > the attachment then maybe via some public URL?
> > > > > >
> > > > > > ceph.rbd-client.log.bz2
> > > > > > <https://drive.google.com/file/d/1TuiGOrVAgKIJ3BUmiokG0cU12fnlQ3GR/view?usp=drive_web>
> > > > > >
> > > > > > I uploaded it to Google Drive. Please check it out.
> > > > >
> > > > > We found the reader_entry thread got zero bytes when trying to read the nbd
> > > > > request header, then rbd-nbd exited and closed the socket. But we haven't
> > > > > figured out why the read returned zero bytes.
> > > >
> > > > Ok. I was hoping to find some hint in the log, why the read from the
> > > > kernel could return without data, but I don't see it.
> > > >
> > > > From experience it could happen when the rbd-nbd got stuck or was too
> > > > slow so the kernel failed after timeout, but it looked different in
> > > > the logs AFAIR. Anyway you can try increasing the timeout using
> > > > rbd-nbd --timeout (--io-timeout in newer versions) option. The default
> > > > is 30 sec.
> > > >
> > > > If it does not help, probably you will find a clue by increasing the
> > > > kernel debug level for nbd (it seems it is possible to do).
> > > >
> > > > --
> > > > Mykola Golub
> > > > _______________________________________________
> > > > Dev mailing list -- dev@xxxxxxx
> > > > To unsubscribe send an email to dev-leave@xxxxxxx
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
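
For readers following the zero-byte read discussed in the thread, here is a minimal Python sketch of how a reader loop can tell "the peer closed the socket" apart from a short read. It is not rbd-nbd's actual code: the helper names (read_exact, read_request_header, demo) are hypothetical, a local socket pair stands in for the kernel/daemon connection, and the 28-byte header layout and magic value are assumed from the classic NBD request format. On a stream socket, a read that returns zero bytes means the other end shut the connection down, which is one way a timeout on the kernel side might surface in the daemon.

```python
import socket
import struct

# Classic NBD request header (assumed layout): magic, flags, type,
# handle, offset, length, all big-endian, 28 bytes in total.
NBD_REQUEST_MAGIC = 0x25609513
NBD_REQUEST_FMT = ">IHHQQI"
NBD_REQUEST_SIZE = struct.calcsize(NBD_REQUEST_FMT)


def read_exact(sock, n):
    """Read exactly n bytes, raising EOFError if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" is an orderly shutdown by the peer,
            # not "no data yet" (a blocking socket would simply wait).
            raise EOFError(f"peer closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def read_request_header(sock):
    """Parse one request header and validate its magic number."""
    raw = read_exact(sock, NBD_REQUEST_SIZE)
    magic, flags, cmd, handle, offset, length = struct.unpack(NBD_REQUEST_FMT, raw)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError(f"bad request magic: {magic:#x}")
    return {"flags": flags, "type": cmd, "handle": handle,
            "offset": offset, "length": length}


def demo():
    """Send one header, close the sending side, and read twice."""
    kernel_side, daemon_side = socket.socketpair()
    try:
        kernel_side.sendall(struct.pack(
            NBD_REQUEST_FMT, NBD_REQUEST_MAGIC, 0, 0, 1, 4096, 512))
        first = read_request_header(daemon_side)
        kernel_side.close()  # stands in for the other end giving up
        try:
            read_request_header(daemon_side)
        except EOFError as exc:
            return first, str(exc)
        return first, None
    finally:
        kernel_side.close()
        daemon_side.close()


if __name__ == "__main__":
    print(demo())
```

Logging how many bytes had arrived before the close, as read_exact does in its error message, helps separate a connection dropped between requests from one dropped in the middle of a header.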