On Wed, Aug 02, 2023 at 09:39:55AM -0500, Bob Pearson wrote: > On 8/1/23 17:56, Jason Gunthorpe wrote: > > On Mon, Jul 31, 2023 at 01:44:47PM -0500, Bob Pearson wrote: > >> On 7/31/23 13:32, Jason Gunthorpe wrote: > >>> On Mon, Jul 31, 2023 at 01:26:23PM -0500, Bob Pearson wrote: > >>>> On 7/31/23 13:17, Jason Gunthorpe wrote: > >>>>> On Fri, Jul 21, 2023 at 03:50:22PM -0500, Bob Pearson wrote: > >>>>>> Network interruptions may cause long delays in the processing of > >>>>>> send packets during which time the rxe driver may be unloaded. > >>>>>> This will cause seg faults when the packet is ultimately freed as > >>>>>> it calls the destructor function in the rxe driver. This has been > >>>>>> observed in cable pull fail over fail back testing. > >>>>> > >>>>> No, module reference counts are only for code that is touching > >>>>> function pointers. > >>>> > >>>> this is exactly the case here. it is the skb destructor function that > >>>> is carried by the skb. > >>> > >>> It can't possibly call it correctly without also having the rxe > >>> ib_device reference too though?? > >> > >> Nope. This was causing seg faults in testing when there was a long network > >> hang and the admin tried to reload the rxe driver. The skb code doesn't care > >> about the ib device at all. > > > > I don't get it, there aren't globals in rxe, so WTF is it doing if it > > isn't somehow tracing back to memory that is under the ib_device > > lifetime? > > > > Jason > > When the rxe driver builds a send packet it puts the address of its destructor > subroutine in the skb before calling ip_local_out and sending it. The address of > driver software is now hanging around. If you don't delay the module exit routine > until all the skb's are freed you can cause seg faults. The only way to cause this to > happen is to call rmmod on the driver too early but people have done this occasionally > and report it as a bug. You are missing the point, the destructor currently does this: static void rxe_skb_tx_dtor(struct sk_buff *skb) { struct sock *sk = skb->sk; struct rxe_qp *qp = sk->sk_user_data; So you've already UAF'd because rxe_qp is freed memory well before you get to unloading the module. This series changed it to do this: static void rxe_skb_tx_dtor(struct sk_buff *skb) { struct rxe_dev *rxe; unsigned int index; struct rxe_qp *qp; int skb_out; /* takes a ref on ib device if success */ rxe = get_rxe_from_skb(skb); if (!rxe) goto out; static struct rxe_dev *get_rxe_from_skb(struct sk_buff *skb) { struct rxe_dev *rxe; struct net_device *ndev = skb->dev; rxe = rxe_get_dev_from_net(ndev); if (!rxe && is_vlan_dev(ndev)) rxe = rxe_get_dev_from_net(vlan_dev_real_dev(ndev)); Which seems totally nutz, you are now relying on the global hash table in ib_core to resolve the ib device. Again, why can't this code do something sane like refcount the qp or ib_device so the destruction doesn't progress until all the SKBs are flushed out? Jason