> From: linux-hyperv-owner@xxxxxxxxxxxxxxx > <linux-hyperv-owner@xxxxxxxxxxxxxxx> On Behalf Of Dexuan Cui > Sent: Wednesday, May 15, 2019 9:34 PM > ... Hi Sunil, To make it clear, your patch itself is good, and I was just talking about the next change we're going to make. Once we make the next change, IMO we need a further patch to schedule hvs_close_timeout() to the new single-threaded workqueue rather than the global "system_wq". > Next, we're going to remove the "channel->rescind" check in > vmbus_hvsock_device_unregister() -- when doing that, IMO we need to > fix a potential race revealed by the schedule_delayed_work() in this > patch: > > When hvs_close_timeout() finishes, the "sk" struct has been freed, but > vmbus_onoffer_rescind() -> channel->chn_rescind_callback(), i.e. > hvs_close_connection(), may be still running and referencing the "chan" > and "sk" structs (), which should no longer be referenced when > hvs_close_timeout() finishes, i.e. "get_per_channel_state(chan)" is no > longer safe. The problem is: currently there is no sync mechanism > between vmbus_onoffer_rescind() and hvs_close_timeout(). > > The race is a real issue only after we remove the "channel->rescind" > in vmbus_hvsock_device_unregister(). A correction: IMO the race is real even for the current code, i.e. without your patch: in vmbus_onoffer_rescind(), between we set channel->rescind and we call channel->chn_rescind_callback(), the channel may have been freed by vmbus_hvsock_device_unregister(). This race window is small and I guess that's why we never noticed it. > I guess we need to introduce a new single-threaded workqueue in the > vmbus driver, and offload both vmbus_onoffer_rescind() and > hvs_close_timeout() onto the new workqueue. Thanks, -- Dexuan