On Fri, Jul 11, 2014 at 11:41:22AM +0200, Philipp Hahn wrote: > Hello Wei Liu, > > On 10.07.2014 14:41, Wei Liu wrote: > > On Wed, Jul 02, 2014 at 09:45:44AM +0200, Philipp Hahn wrote: > >> @Wei Liu: You said that the patch is only a quick hack to detect, if my > >> analysis is correct and a proper fix would be needed. For us the > >> attached patch works, as the problem does not happen that often and is > >> hard to reproduce anyway, so spending more time on that issue is > >> probably not worth it. And that flag doesn't look that ugly. > ... > > I agree that we would like to avoid spending too much time on this > > issue. > > This is also what I'm thinking. > > > Since the problem is confirmed, I think a proper fix will be to > > reference count vif and prevent it from unmapping the ring before all > > queued SKBs are consumed. > > vif is already ref-counted; see attached *untested* patch for a start. > What I meant was that, in xenvif_free, we will wait for refcnt to become 0 (or 1?) before freeing vif structure. I think we should do similar thing before unmapping the ring. > What I don't like is xenbus_unmap_ring_vfree() potentially printing an > error on double-free when called from xen_netbk_unmap_frontend_rings(). > How can it be called twice? Does it happen before or after applying my patch? > > But it might require much more work than that quick hack. > > I'm no network driver/Xen expert, but that sounds like more work for no > gain: We would still copy packets for a guest, which is already dead. If > the ring get full, we would probably need to go to sleep and wait for an > answer we will never get anymore. > That's true, so dropping those packets seems to be a good solution. > > FWIW this bug doesn't exist in kernel >=3.12. > > That is even one more point for the hack, as there the problem is > properly fixed and the problem is very obscure to trigger. > > > Would you up for writing a patch? I won't be able to write > > one in the near future. > > Further more, you're the only party now can verify a fix. > > I had a look and created the attached patch, which is untested, as I > currently can't access the faulting system and have been unable to > reproduce it in my development environment. > > The quick hack now runs on that system for several weeks now without > problems. > Thanks for testing. I will need to think further if this hack is really a proper solution. I will keep you posted if there's any update on this. In the mean time you can carry that patch in your own tree for a while. Wei. -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html