On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote: > On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote: > > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote: > > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > > >> Hey Josef and nbd list, > > >> > > >> I had a question about if there are any socket family restrictions for nbd? > > > > > > In normal circumstances, in userspace, the NBD protocol would only be > > > used over AF_UNIX or AF_INET/AF_INET6. > > > > > > There's a bit of confusion because netlink is used by nbd-client to > > > configure the NBD device, setting things like block size and timeouts > > > (instead of ioctl which is deprecated). I think you don't mean this > > > use of netlink? > > > > I didn't. It looks like it is just a bad test. > > > > For the automated test in this thread the test created a AF_NETLINK > > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used > > for the NBD_DO_IT ioctl. > > > > I was not sure if the test creator picked any old socket and it just > > happened to pick one nbd never supported, or it was trying to simulate > > sockets that did not support the shutdown method. > > > > I attached the automated test that got run (test.c). > > I'd say it sounds like a bad test, but I'm not familiar with syzkaller > nor how / from where it generates these tests. Did someone report a > bug and then syzkaller wrote this test? > > Rich. > > > > > > >> The bug here is that some socket familys do not support the > > >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown > > >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in > > >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a > > >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug > > >> we hang like we see below. > > >> > > >> I can just remove the flush_workqueue call in that code path since it's > > >> not needed there, but it leaves the original bug my patch was hitting > > >> where we leave the recv_work running which can then result in leaked > > >> resources, or possible use after free crashes and you still get the hang > > >> if you remove the module. > > >> > > >> It looks like we have used kernel_sock_shutdown for a while so I thought > > >> we might never have supported sockets that did not support the callout. > > >> Is that correct? If so then I can just add a check for this in > > >> nbd_add_socket and fix that bug too. > > > > > > Rich. > > > It's an automatically generated fuzz test. There's rarely any such thing as a "bad" fuzz test. If userspace can do something that causes the kernel to crash or hang, it's a kernel bug, with very few exceptions (e.g. like writing to /dev/mem). If there are cases that aren't supported, like sockets that don't support a certain function or whatever, then the code needs to check for those cases and return an error, not hang the kernel. - Eric