> -----Original Message-----
> From: Greg KH <greg@xxxxxxxxx>
> Sent: Saturday, March 05, 2022 5:40 AM
> To: Keller, Jacob E <jacob.e.keller@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/2] ice: Fix race conditions between virtchnl handling and VF ndo ops
>
> On Mon, Feb 28, 2022 at 12:46:59PM -0800, Jacob Keller wrote:
> > From: Brett Creeley <brett.creeley@xxxxxxxxx>
> >
> > commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
> >
> > [I had to fix the cherry-pick manually as the patch added a line around
> > some context that was missing.]
> >
> > The VF can be configured via the PF's ndo ops at the same time the PF is
> > receiving/handling virtchnl messages. This has many issues, with
> > one of them being the ndo op could be actively resetting a VF (i.e.
> > resetting it to the default state and deleting/re-adding the VF's VSI)
> > while a virtchnl message is being handled. The following error was seen
> > because a VF ndo op was used to change a VF's trust setting while the
> > VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing:
> >
> > [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
> > [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
> > [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
> >
> > Fix this by making sure the virtchnl handling and VF ndo ops that
> > trigger VF resets cannot run concurrently. This is done by adding a
> > struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
> > will be locked around the critical operations and VFR. Since the ndo ops
> > will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
> > is done because if any other thread (i.e. VF ndo op) has the mutex, then
> > that means the current VF message being handled is no longer valid, so
> > just ignore it.
> >
> > This issue can be seen using the following commands:
> >
> > for i in {0..50}; do
> >         rmmod ice
> >         modprobe ice
> >
> >         sleep 1
> >
> >         echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
> >         echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
> >
> >         ip link set ens785f1 vf 0 trust on
> >         ip link set ens785f0 vf 0 trust on
> >
> >         sleep 2
> >
> >         echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
> >         echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
> >         sleep 1
> >         echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
> >         echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
> >
> >         ip link set ens785f1 vf 0 trust on
> >         ip link set ens785f0 vf 0 trust on
> > done
> >
> > Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
> > Cc: <stable@xxxxxxxxxxxxxxx> # 5.14.x
> > Signed-off-by: Brett Creeley <brett.creeley@xxxxxxxxx>
> > Tested-by: Konrad Jankowski <konrad0.jankowski@xxxxxxxxx>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>
> > Signed-off-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>
> > ---
> > This should apply to 5.14.x
>
> 5.14 is long end-of-life, always look at the kernel.org page if you are
> curious what the "active" kernel trees are.
>
> thanks,
>
> greg k-h

My mistake. I apologize for the wasted time here.

Thanks,
Jake
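
For anyone reading the archive without the patch at hand, the locking scheme described in the quoted commit message can be roughly sketched in userspace C. This is only an illustration under assumptions, not the driver code: it uses pthreads in place of the kernel mutex API, and struct vf_sketch and the vf_sketch_* helpers are hypothetical names made up for this example. Only the cfg_lock name and the lock/trylock split come from the commit message.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VF structure; only the lock is modeled. */
struct vf_sketch {
	pthread_mutex_t cfg_lock;	/* named after the cfg_lock in the commit message */
	int resets;			/* simulated VF resets, updated under cfg_lock */
	int msgs_handled;		/* simulated virtchnl messages, updated under cfg_lock */
};

static void vf_sketch_init(struct vf_sketch *vf)
{
	pthread_mutex_init(&vf->cfg_lock, NULL);
	vf->resets = 0;
	vf->msgs_handled = 0;
}

/*
 * ndo-op side (e.g. changing the trust setting): block until the lock is
 * free and hold it across the configuration change and the reset.
 */
static void vf_sketch_set_trust(struct vf_sketch *vf)
{
	pthread_mutex_lock(&vf->cfg_lock);
	vf->resets++;			/* stands in for the config change plus VFR */
	pthread_mutex_unlock(&vf->cfg_lock);
}

/*
 * Message-handling side: never wait. If the lock is held, a reset is in
 * progress and the message is stale, so drop it.
 *
 * Note: pthread_mutex_trylock() returns 0 on success, whereas the kernel's
 * mutex_trylock() returns 1 on success and 0 on contention.
 */
static bool vf_sketch_handle_msg(struct vf_sketch *vf)
{
	if (pthread_mutex_trylock(&vf->cfg_lock) != 0)
		return false;		/* lock busy: ignore the message */

	vf->msgs_handled++;		/* stands in for processing the opcode */
	pthread_mutex_unlock(&vf->cfg_lock);
	return true;
}
```

The point of the asymmetry is that the configuration path may wait for a message in flight to finish, while the message path simply gives up when a reset owns the lock, since the reset invalidates whatever the message was about to configure.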