On Tue, Oct 22, 2024 at 01:53:01PM -0700, Jacob Keller wrote: > > > On 10/22/2024 1:28 PM, Joe Damato wrote: > > I took another look at this to make sure that RTNL is held when > > igc_set_queue_napi is called after the e1000 bug report came in [1], > > and there may be two locations I've missed: > > > > 1. igc_resume, which calls __igc_open > > 2. igc_io_error_detected, which calls igc_down > > > > In both cases, I think the code can be modified to hold rtnl around > > calls to __igc_open and igc_down. > > > > Let me know what you think ? > > > > If you agree that I should hold rtnl in both of those cases, what is > > the best way to proceed: > > - send a v4, or > > - wait for this to get merged (since I got the notification it was > > pulled into intel-next) and send a fixes ? > > > > Intel-next uses a stacked set of patches which we then send in batches > via PRs as they pass our internal testing. > > We can drop the v3 and await v4. OK, thanks for the info. I will prepare, test locally, and send a v4 shortly. > > Here's the full analysis I came up with; I tried to be thorough, but > > it is certainly possible I missed a call site: > > > > For the up case: > > > > - igc_up: > > - called from igc_reinit_locked, which is called via: > > - igc_reset_task (rtnl is held) > > - igc_set_features (ndo_set_features, which itself has an ASSERT_RTNL) > > - various places in igc_ethtool (set_priv_flags, nway_reset, > > ethtool_set_eee) all of which have RTNL held > > - igc_change_mtu which also has RTNL held > > - __igc_open > > - called from igc_resume, which may need an rtnl_lock ? > > - igc_open > > - called from igc_io_resume, rtnl is held > > - called from igc_reinit_queues, only via ethool set_channels, > > where rtnl is held > > - ndo_open where rtnl is held > > > > For the down case: > > > > - igc_down: > > - called from various ethtool locations (set_ringparam, > > set_pauseparam, set_link_ksettings) all of which hold rtnl > > - called from igc_io_error_detected, which may need an rtnl_lock > > - igc_reinit_locked which is fine, as described above > > - igc_change_mtu which is fine, as described above > > - called from __igc_close > > - called from __igc_shutdown which holds rtnl > > - called from igc_reinit_queues which is fine as described above > > - called from igc_close which is ndo_close > > This analysis looks complete to me. Thanks; I'd appreciate your comments on the e1000 RFC I posted, if you have a moment. I'm going to update that thread with more data now that I have analysed igc as there are some parallels: https://lore.kernel.org/netdev/20241022172153.217890-1-jdamato@xxxxxxxxxx/