On Sat, 29 Dec 2018 02:45:49 +0000
"Wang, Wei W" <wei.w.wang@xxxxxxxxx> wrote:

> On Friday, December 28, 2018 3:57 PM, Christian Borntraeger wrote:
> > On 28.12.2018 03:26, Wei Wang wrote:
> > > Some vqs don't need to be allocated when the related feature bits are
> > > disabled. Callers notice the vq allocation layer by setting the
> > > related names[i] to be NULL.
> > >
> > > This patch series fixes the find_vqs implementations to handle this case.
> >
> > So the random crashes during boot are gone.
> > What still does not work is actually using the balloon.
> >
> > So in the qemu monitor using lets say "balloon 1000" will hang the guest.
> > Seems to be a deadlock in the virtio-ccw code. We seem to call the config
> > code in the interrupt handler.
>
> Yes. It reads a config register from the interrupt handler. Do you know
> why ccw doesn't support it and has some internal lock that caused the
> deadlock issue?
>
> Best,
> Wei

I guess you are the first one trying to read virtio config from within
interrupt context. AFAICT this never worked.

As for what happens: the apidoc of ccw_device_start() says it needs to be
called with the ccw device lock held, so ccw_io_helper() tries to take it
(since forever, I guess). OTOH do_cio_interrupt() takes the subchannel
lock, and io_subchannel_initialize_dev() makes the ccw device lock be the
subchannel lock. That means that when one tries to get virtio config from
within a cio interrupt context we deadlock, because we try to take a lock
we already hold.

That said, I don't think this limitation is by design (i.e. intended).
Maybe Connie can help us with that question. AFAIK we have nothing
documented regarding this (neither that one can nor that one can't).

Obviously, there are multiple ways around this problem, and at the moment
I can't tell which would be my preferred one.

Regards,
Halil
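
The self-deadlock described above can be illustrated with a minimal user-space sketch. This is not the kernel code: it assumes POSIX threads, and the names subchannel_lock, ccw_device_lock, locks_init(), read_config() and interrupt_handler() are hypothetical stand-ins for the subchannel lock, the ccw device lock that aliases it, ccw_io_helper() and do_cio_interrupt(). An error-checking mutex is used so that the second acquisition reports EDEADLK instead of hanging the way a spinlock would.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the subchannel lock. */
static pthread_mutex_t subchannel_lock;
/* Hypothetical stand-in for the ccw device lock; it only aliases the lock above. */
static pthread_mutex_t *ccw_device_lock;

/* Set up the lock as error-checking, so relocking by the owner fails
 * with EDEADLK rather than blocking forever. Returns 0 on success. */
static int locks_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!rc)
        rc = pthread_mutex_init(&subchannel_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    /* Mirrors the aliasing described in the mail: one lock, two names. */
    ccw_device_lock = &subchannel_lock;
    return rc;
}

/* Models the config read path: it takes the device lock before starting I/O. */
static int read_config(void)
{
    int rc = pthread_mutex_lock(ccw_device_lock);

    if (rc)
        return rc; /* EDEADLK if the caller already holds the lock */
    /* ... the channel program would be started here ... */
    pthread_mutex_unlock(ccw_device_lock);
    return 0;
}

/* Models the interrupt path: it holds the subchannel lock while the
 * handler runs, and the handler tries to read the config. */
static int interrupt_handler(void)
{
    int rc = pthread_mutex_lock(&subchannel_lock);

    if (rc)
        return rc;
    rc = read_config(); /* attempts to take the same lock again */
    pthread_mutex_unlock(&subchannel_lock);
    return rc;
}
```

In this sketch, read_config() succeeds when called on its own after locks_init(), while interrupt_handler() returns EDEADLK because the lock is already held when read_config() runs. With a plain non-recursive lock, the same call sequence would simply never return.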