Re: INFO: task hung in nbd_ioctl

Eric Biggers <ebiggers@xxxxxxxxxx> · Thu, 17 Oct 2019 09:36:34 -0700

On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote:
> On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> > >> Hey Josef and nbd list,
> > >>
> > >> I had a question about if there are any socket family restrictions for nbd?
> > > 
> > > In normal circumstances, in userspace, the NBD protocol would only be
> > > used over AF_UNIX or AF_INET/AF_INET6.
> > > 
> > > There's a bit of confusion because netlink is used by nbd-client to
> > > configure the NBD device, setting things like block size and timeouts
> > > (instead of ioctl which is deprecated).  I think you don't mean this
> > > use of netlink?
> > 
> > I didn't. It looks like it is just a bad test.
> > 
> > For the automated test in this thread the test created a AF_NETLINK
> > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> > for the NBD_DO_IT ioctl.
> > 
> > I was not sure if the test creator picked any old socket and it just
> > happened to pick one nbd never supported, or it was trying to simulate
> > sockets that did not support the shutdown method.
> > 
> > I attached the automated test that got run (test.c).
> 
> I'd say it sounds like a bad test, but I'm not familiar with syzkaller
> nor how / from where it generates these tests.  Did someone report a
> bug and then syzkaller wrote this test?
> 
> Rich.
> 
> > > 
> > >> The bug here is that some socket familys do not support the
> > >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> > >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> > >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> > >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug
> > >> we hang like we see below.
> > >>
> > >> I can just remove the flush_workqueue call in that code path since it's
> > >> not needed there, but it leaves the original bug my patch was hitting
> > >> where we leave the recv_work running which can then result in leaked
> > >> resources, or possible use after free crashes and you still get the hang
> > >> if you remove the module.
> > >>
> > >> It looks like we have used kernel_sock_shutdown for a while so I thought
> > >> we might never have supported sockets that did not support the callout.
> > >> Is that correct? If so then I can just add a check for this in
> > >> nbd_add_socket and fix that bug too.
> > > 
> > > Rich.
> > > 

It's an automatically generated fuzz test.

There's rarely any such thing as a "bad" fuzz test.  If userspace can do
something that causes the kernel to crash or hang, it's a kernel bug, with very
few exceptions (e.g. like writing to /dev/mem).

If there are cases that aren't supported, like sockets that don't support a
certain function or whatever, then the code needs to check for those cases and
return an error, not hang the kernel.

- Eric