RE: [PATCH net 1/1] i40e: Fix crash when rebuild fails in i40e_xdp_setup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Maciej,

We ran regression tests around basic functionality. And for stability we ran multiple iterations of loading and unloading XDP from the interface. I am not sure we covered multiple iterations of reset via ethtool -L 

The crash observed in v4 of the patch was not seen.

Thanks, 
George 

> -----Original Message-----
> From: Fijalkowski, Maciej <maciej.fijalkowski@xxxxxxxxx>
> Sent: Tuesday, April 11, 2023 9:55 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>
> Cc: davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx;
> edumazet@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Sylwester Dziedziuch
> <sylwesterx.dziedziuch@xxxxxxxxx>; Karlsson, Magnus
> <magnus.karlsson@xxxxxxxxx>; ast@xxxxxxxxxx; daniel@xxxxxxxxxxxxx;
> hawk@xxxxxxxxxx; john.fastabend@xxxxxxxxx; bpf@xxxxxxxxxxxxxxx; Raczynski,
> Piotr <piotr.raczynski@xxxxxxxxx>; Staikov, Andrii <andrii.staikov@xxxxxxxxx>;
> Maziarz, Kamil <kamil.maziarz@xxxxxxxxx>; Kuruvinakunnel, George
> <george.kuruvinakunnel@xxxxxxxxx>
> Subject: Re: [PATCH net 1/1] i40e: Fix crash when rebuild fails in i40e_xdp_setup
> 
> On Fri, Apr 07, 2023 at 02:09:18PM -0700, Tony Nguyen wrote:
> > From: Sylwester Dziedziuch <sylwesterx.dziedziuch@xxxxxxxxx>
> >
> > When attaching XDP program on i40e driver there was a reset and
> > rebuild of the interface to reconfigure the queues for XDP operation.
> > If one of the steps of rebuild failed then the interface was left in
> > incorrect state that could lead to a crash. If rebuild failed while
> > getting capabilities from HW such crash occurs:
> >
> > capability discovery failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err
> > OK
> > BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000 Call Trace:
> > ? i40e_reconfig_rss_queues+0x120/0x120 [i40e]
> >   dev_xdp_install+0x70/0x100
> >   dev_xdp_attach+0x1d7/0x530
> >   dev_change_xdp_fd+0x1f4/0x230
> >   do_setlink+0x45f/0xf30
> >   ? irq_work_interrupt+0xa/0x20
> >   ? __nla_validate_parse+0x12d/0x1a0
> >   rtnl_setlink+0xb5/0x120
> >   rtnetlink_rcv_msg+0x2b1/0x360
> >   ? sock_has_perm+0x80/0xa0
> >   ? rtnl_calcit.isra.42+0x120/0x120
> >   netlink_rcv_skb+0x4c/0x120
> >   netlink_unicast+0x196/0x230
> >   netlink_sendmsg+0x204/0x3d0
> >   sock_sendmsg+0x4c/0x50
> >   __sys_sendto+0xee/0x160
> >   ? handle_mm_fault+0xc1/0x1e0
> >   ? syscall_trace_enter+0x1fb/0x2c0
> >   ? __sys_setsockopt+0xd6/0x1d0
> >   __x64_sys_sendto+0x24/0x30
> >   do_syscall_64+0x5b/0x1a0
> >   entry_SYSCALL_64_after_hwframe+0x65/0xca
> >   RIP: 0033:0x7f3535d99781
> >
> > Fix this by removing reset and rebuild from i40e_xdp_setup and replace
> > it by interface down, reconfigure queues and interface up. This way if
> > any step fails the interface will remain in a correct state.
> >
> > Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop
> > actions")
> 
> While I do agree with the overall concept of removing reset logic from XDP control
> path here I feel that change is, as Jesse also wrote, rather too big for a -net
> candidate. It also feels like real issue was not resolved and removing reset path
> from XDP has a positive side effect of XDP not being exposed to real issue.
> 
> What if I would do the rebuild via ethtool -L? There is a non-zero chance that I
> would get the splat above again.
> 
> So I'd rather get this patch via -next and try harder to isolate the NULL ptr deref
> and address that.
> 
> Note that I'm only sharing my thoughts here, other people can disagree and
> proceed with this as is.
> 
> > Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@xxxxxxxxx>
> > Signed-off-by: Piotr Raczynski <piotr.raczynski@xxxxxxxxx>
> > Signed-off-by: Andrii Staikov <andrii.staikov@xxxxxxxxx>
> > Signed-off-by: Kamil Maziarz <kamil.maziarz@xxxxxxxxx>
> > Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@xxxxxxxxx>
> 
> George, can you tell us how was this tested?
> 
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>
> > ---
> > Note: This will conflict when merging with net-next.
> >
> > Resolution:
> > static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
> >                           struct netlink_ext_ack *extack)
> >   {
> >  -      int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN +
> VLAN_HLEN;
> >  +      int frame_size = i40e_max_vsi_frame_size(vsi, prog);
> >
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 159
> > +++++++++++++++-----
> >  1 file changed, 118 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > index 228cd502bb48..5c424f6af834 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > @@ -50,6 +50,8 @@ static int i40e_veb_get_bw_info(struct i40e_veb
> > *veb);  static int i40e_get_capabilities(struct i40e_pf *pf,
> >  				 enum i40e_admin_queue_opc list_type);  static
> bool
> > i40e_is_total_port_shutdown_enabled(struct i40e_pf *pf);
> > +static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi,
> > +					      bool is_xdp);
> >
> >  /* i40e_pci_tbl - PCI Device ID Table
> >   *
> > @@ -3563,11 +3565,16 @@ static int i40e_configure_rx_ring(struct i40e_ring
> *ring)
> >  	/* clear the context structure first */
> >  	memset(&rx_ctx, 0, sizeof(rx_ctx));
> >
> > -	if (ring->vsi->type == I40E_VSI_MAIN)
> > -		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
> > +	if (ring->vsi->type == I40E_VSI_MAIN &&
> > +	    !xdp_rxq_info_is_reg(&ring->xdp_rxq))
> > +		xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
> > +				 ring->queue_index,
> > +				 ring->q_vector->napi.napi_id);
> >
> >  	ring->xsk_pool = i40e_xsk_pool(ring);
> >  	if (ring->xsk_pool) {
> > +		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
> > +
> >  		ring->rx_buf_len =
> >  		  xsk_pool_get_rx_frame_size(ring->xsk_pool);
> >  		/* For AF_XDP ZC, we disallow packets to span on @@ -13307,6
> > +13314,39 @@ static netdev_features_t i40e_features_check(struct sk_buff
> *skb,
> >  	return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);  }
> >
> > +/**
> > + * i40e_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI
> > + * @vsi: VSI to changed
> > + * @prog: XDP program
> > + **/
> > +static void i40e_vsi_assign_bpf_prog(struct i40e_vsi *vsi,
> > +				     struct bpf_prog *prog)
> > +{
> > +	struct bpf_prog *old_prog;
> > +	int i;
> > +
> > +	old_prog = xchg(&vsi->xdp_prog, prog);
> > +	if (old_prog)
> > +		bpf_prog_put(old_prog);
> > +
> > +	for (i = 0; i < vsi->num_queue_pairs; i++)
> > +		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog); }
> > +
> > +/**
> > + * i40e_vsi_rx_napi_schedule - Schedule napi on RX queues from VSI
> > + * @vsi: VSI to schedule napi on
> > + */
> > +static void i40e_vsi_rx_napi_schedule(struct i40e_vsi *vsi) {
> > +	int i;
> > +
> > +	for (i = 0; i < vsi->num_queue_pairs; i++)
> > +		if (vsi->xdp_rings[i]->xsk_pool)
> > +			(void)i40e_xsk_wakeup(vsi->netdev, i,
> > +					      XDP_WAKEUP_RX);
> > +}
> > +
> >  /**
> >   * i40e_xdp_setup - add/remove an XDP program
> >   * @vsi: VSI to changed
> > @@ -13317,10 +13357,12 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
> struct bpf_prog *prog,
> >  			  struct netlink_ext_ack *extack)
> >  {
> >  	int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN +
> > VLAN_HLEN;
> > +	bool is_xdp_enabled = i40e_enabled_xdp_vsi(vsi);
> > +	bool if_running = netif_running(vsi->netdev);
> > +	bool need_reinit = is_xdp_enabled != !!prog;
> >  	struct i40e_pf *pf = vsi->back;
> >  	struct bpf_prog *old_prog;
> > -	bool need_reset;
> > -	int i;
> > +	int ret = 0;
> >
> >  	/* Don't allow frames that span over multiple buffers */
> >  	if (frame_size > i40e_calculate_vsi_rx_buf_len(vsi)) { @@ -13328,53
> > +13370,84 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog
> *prog,
> >  		return -EINVAL;
> >  	}
> >
> > -	/* When turning XDP on->off/off->on we reset and rebuild the rings. */
> > -	need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog);
> > -
> > -	if (need_reset)
> > -		i40e_prep_for_reset(pf);
> > -
> >  	/* VSI shall be deleted in a moment, just return EINVAL */
> >  	if (test_bit(__I40E_IN_REMOVE, pf->state))
> >  		return -EINVAL;
> >
> > -	old_prog = xchg(&vsi->xdp_prog, prog);
> > +	if (!need_reinit)
> > +		goto assign_prog;
> >
> > -	if (need_reset) {
> > -		if (!prog) {
> > -			xdp_features_clear_redirect_target(vsi->netdev);
> > -			/* Wait until ndo_xsk_wakeup completes. */
> > -			synchronize_rcu();
> > -		}
> > -		i40e_reset_and_rebuild(pf, true, true);
> > +	if (if_running && !test_and_set_bit(__I40E_VSI_DOWN, vsi->state))
> > +		i40e_down(vsi);
> > +
> > +	i40e_vsi_assign_bpf_prog(vsi, prog);
> > +
> > +	vsi = i40e_vsi_reinit_setup(vsi, true);
> > +
> > +	if (!vsi) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Failed to reinitialize VSI during
> XDP setup");
> > +		ret = -EIO;
> > +		goto err_vsi_setup;
> >  	}
> >
> > -	if (!i40e_enabled_xdp_vsi(vsi) && prog) {
> > -		if (i40e_realloc_rx_bi_zc(vsi, true))
> > -			return -ENOMEM;
> > -	} else if (i40e_enabled_xdp_vsi(vsi) && !prog) {
> > -		if (i40e_realloc_rx_bi_zc(vsi, false))
> > -			return -ENOMEM;
> > +	/* allocate descriptors */
> > +	ret = i40e_vsi_setup_tx_resources(vsi);
> > +	if (ret) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Failed to configure TX
> resources during XDP setup");
> > +		goto err_vsi_setup;
> > +	}
> > +	ret = i40e_vsi_setup_rx_resources(vsi);
> > +	if (ret) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Failed to configure RX
> resources during XDP setup");
> > +		goto err_setup_tx;
> >  	}
> >
> > -	for (i = 0; i < vsi->num_queue_pairs; i++)
> > -		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
> > +	if (!is_xdp_enabled && prog)
> > +		ret = i40e_realloc_rx_bi_zc(vsi, true);
> > +	else if (is_xdp_enabled && !prog)
> > +		ret = i40e_realloc_rx_bi_zc(vsi, false);
> >
> > -	if (old_prog)
> > -		bpf_prog_put(old_prog);
> > +	if (ret) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Failed to reallocate RX
> resources during XDP setup");
> > +		goto err_setup_rx;
> > +	}
> > +
> > +	if (if_running) {
> > +		ret = i40e_up(vsi);
> > +
> > +		if (ret) {
> > +			NL_SET_ERR_MSG_MOD(extack, "Failed to open VSI
> during XDP setup");
> > +			goto err_setup_rx;
> > +		}
> > +	}
> > +	return 0;
> > +
> > +assign_prog:
> > +	i40e_vsi_assign_bpf_prog(vsi, prog);
> > +
> > +	if (need_reinit && !prog)
> > +		xdp_features_clear_redirect_target(vsi->netdev);
> >
> >  	/* Kick start the NAPI context if there is an AF_XDP socket open
> >  	 * on that queue id. This so that receiving will start.
> >  	 */
> > -	if (need_reset && prog) {
> > -		for (i = 0; i < vsi->num_queue_pairs; i++)
> > -			if (vsi->xdp_rings[i]->xsk_pool)
> > -				(void)i40e_xsk_wakeup(vsi->netdev, i,
> > -						      XDP_WAKEUP_RX);
> > +	if (need_reinit && prog) {
> > +		i40e_vsi_rx_napi_schedule(vsi);
> >  		xdp_features_set_redirect_target(vsi->netdev, true);
> >  	}
> >
> >  	return 0;
> > +
> > +err_setup_rx:
> > +	i40e_vsi_free_rx_resources(vsi);
> > +err_setup_tx:
> > +	i40e_vsi_free_tx_resources(vsi);
> > +err_vsi_setup:
> > +	i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
> > +	old_prog = xchg(&vsi->xdp_prog, prog);
> > +	i40e_vsi_assign_bpf_prog(vsi, old_prog);
> 
> wouldn't this be simpler to
> 	i40e_vsi_assign_bpf_prog(vsi, NULL);
> 
> and avoid xchg above? then old_prog can be removed altogether from this func.
> 
> > +
> > +	return ret;
> >  }
> >
> >  /**
> > @@ -14320,13 +14393,14 @@ static int i40e_vsi_setup_vectors(struct
> > i40e_vsi *vsi)
> >  /**
> >   * i40e_vsi_reinit_setup - return and reallocate resources for a VSI
> >   * @vsi: pointer to the vsi.
> > + * @is_xdp: flag indicating if this is reinit during XDP setup
> >   *
> >   * This re-allocates a vsi's queue resources.
> >   *
> >   * Returns pointer to the successfully allocated and configured VSI sw struct
> >   * on success, otherwise returns NULL on failure.
> >   **/
> > -static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
> > +static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi,
> > +bool is_xdp)
> >  {
> >  	u16 alloc_queue_pairs;
> >  	struct i40e_pf *pf;
> > @@ -14362,12 +14436,14 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct
> i40e_vsi *vsi)
> >  	/* Update the FW view of the VSI. Force a reset of TC and queue
> >  	 * layout configurations.
> >  	 */
> > -	enabled_tc = pf->vsi[pf->lan_vsi]->tc_config.enabled_tc;
> > -	pf->vsi[pf->lan_vsi]->tc_config.enabled_tc = 0;
> > -	pf->vsi[pf->lan_vsi]->seid = pf->main_vsi_seid;
> > -	i40e_vsi_config_tc(pf->vsi[pf->lan_vsi], enabled_tc);
> > -	if (vsi->type == I40E_VSI_MAIN)
> > -		i40e_rm_default_mac_filter(vsi, pf->hw.mac.perm_addr);
> > +	if (!is_xdp) {
> > +		enabled_tc = pf->vsi[pf->lan_vsi]->tc_config.enabled_tc;
> > +		pf->vsi[pf->lan_vsi]->tc_config.enabled_tc = 0;
> > +		pf->vsi[pf->lan_vsi]->seid = pf->main_vsi_seid;
> > +		i40e_vsi_config_tc(pf->vsi[pf->lan_vsi], enabled_tc);
> > +		if (vsi->type == I40E_VSI_MAIN)
> > +			i40e_rm_default_mac_filter(vsi, pf->hw.mac.perm_addr);
> > +	}
> >
> >  	/* assign it some queues */
> >  	ret = i40e_alloc_rings(vsi);
> > @@ -15133,7 +15209,8 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf,
> bool reinit, bool lock_acqui
> >  		if (pf->lan_vsi == I40E_NO_VSI)
> >  			vsi = i40e_vsi_setup(pf, I40E_VSI_MAIN, uplink_seid, 0);
> >  		else if (reinit)
> > -			vsi = i40e_vsi_reinit_setup(pf->vsi[pf->lan_vsi]);
> > +			vsi = i40e_vsi_reinit_setup(pf->vsi[pf->lan_vsi],
> > +						    false);
> >  		if (!vsi) {
> >  			dev_info(&pf->pdev->dev, "setup of MAIN VSI failed\n");
> >  			i40e_cloud_filter_exit(pf);
> > --
> > 2.38.1
> >




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux