Re: [PATCH] Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb

On Wed, Jul 02, 2014 at 09:45:44AM +0200, Philipp Hahn wrote:
> Hello Wei Liu,
> 
> On 27.06.2014 20:24, Philipp Hahn wrote:
> > On 27.06.2014 19:48, Philipp Hahn wrote:
> >> I guess we found the problem ourselves: For thus removed skb's the
> >> reference counter on the associated vif was not decremented, as it is
> >> normally done in two locations at the end of the function
> >> xen_netbk_rx_action():
> > ...
> >> The test is currently running again for the weekend and on Monday we
> >> will hopefully know more.
> > 
> > FYI: The test VM survived the first reboot without locking up:
> ...
> > Jun 27 19:49:23 xenmbint05b01 kernel: [ 2055.898349] UniDEBUG
> > vif->mapped is false
> 
> The host survived the weekend with the problematic VM rebooting every 5
> minutes; the log shows the shared ring being accessed unmapped, where
> the kernel crashed previously.
> 
> So the attached patch fixes the bug (or at least prevents the OOPS).
> 
> @Wei Liu: You said that the patch is only a quick hack to detect, if my
> analysis is correct and a proper fix would be needed. For us the
> attached patch works, as the problem does not happen that often and is
> hard to reproduce anyway, so spending more time on that issue is
> probably not worth it. And that flag doesn't look that ugly.
> 

Sorry for the late reply. I was away for two weeks.

I agree that we would like to avoid spending too much time on this
issue.

Since the problem is confirmed, I think a proper fix will be to
reference count vif and prevent it from unmapping the ring before all
queued SKBs are consumed. But it might require much more work than that
quick hack. Would you be up for writing a patch? I won't be able to write
one in the near future. Furthermore, you're currently the only party who
can verify a fix.
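
To illustrate the reference-counting idea, here is a rough user-space
toy model. It is only a sketch, not netback code: struct toy_vif and all
toy_* helpers are made-up names, and it ignores locking, atomics and the
real ring handling. The point is just that the ring is unmapped by
whoever drops the last reference, so a queued SKB never sees an unmapped
ring.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vif; field names are illustrative only. */
struct toy_vif {
	int refcnt;          /* 1 for the owner plus 1 per queued skb */
	bool ring_mapped;    /* is the shared ring currently mapped? */
	bool disconnecting;  /* set once teardown has started */
};

static void toy_vif_init(struct toy_vif *vif)
{
	vif->refcnt = 1;
	vif->ring_mapped = true;
	vif->disconnecting = false;
}

/* Drop one reference; the last reference unmaps the ring. */
static void toy_vif_put(struct toy_vif *vif)
{
	if (--vif->refcnt == 0)
		vif->ring_mapped = false;
}

/* Queue an skb for this vif: take a reference, refuse during teardown. */
static bool toy_queue_skb(struct toy_vif *vif)
{
	if (vif->disconnecting)
		return false;
	vif->refcnt++;
	return true;
}

/*
 * Consume one queued skb. Returns whether the ring was still mapped
 * when it was accessed, then drops the reference taken at queue time.
 */
static bool toy_consume_skb(struct toy_vif *vif)
{
	bool ring_ok = vif->ring_mapped;

	toy_vif_put(vif);
	return ring_ok;
}

/* Start teardown: stop new skbs and drop the owner's reference. */
static void toy_vif_disconnect(struct toy_vif *vif)
{
	vif->disconnecting = true;
	toy_vif_put(vif);
}
```

With two SKBs queued, toy_vif_disconnect() leaves the ring mapped; it is
only unmapped after the second toy_consume_skb(). A real patch would
presumably need an atomic counter and a way for the disconnect path to
wait for it to drain.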

> @stable: at least 3.10 has the bug, but other long-term-stable kernels
> have it too. The code in current is different as multi-queue was added,
> so the patch wouldn't be in current.
> 

FWIW this bug doesn't exist in kernel >=3.12.

Wei.
