page fault problems porting a network driver to 2.4.x

> Hello,
> 
> We are developing an advanced networking services loadable module and are
> having problems porting it to work on 2.4.x kernels. The driver is
> supposed to provide services such as fault tolerance, load balancing and
> link aggregation over a team of network adapters. It works OK on 2.2.x
> kernels but hangs on 2.4.x kernels.
> 
> In order to debug it, we stripped it down to a mere "intermediate"
> or "filter" driver that binds to a base driver and passes everything
> through in both directions (Rx, Tx, IOCTL, stats, etc.). After going
> through the basics of modifying the driver to compile on 2.4.x kernels and
> fighting some nasty deadlocks due to the new design of the networking
> layer, we managed to get it to run. The driver receives and transmits a
> few hundred thousand packets (while a periodic timer expires 10 times a
> second and IOCTLs run continuously), and then it causes an oops about
> being unable to handle a page fault.
> 
> The function looks something like:
> 
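> int iansHardStartXmit(struct sk_buff *skb, struct net_device *dev) {
> 	/* Default to nonzero ("not transmitted") so that res is never
> 	 * returned uninitialized when no base driver is found. */
> 	int res = 1;
> 	struct net_device *base;
> 
> 	spin_lock(&lock);
> 	base = get_base_driver_by_name(name);
> 
> 	/* Pass the packet straight through to the base driver. */
> 	if(base != NULL) {
> 		res = base->hard_start_xmit(skb, base);
> 	}
> 
> 	spin_unlock(&lock);
> 	return res;
> }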
> 
> We used kdb to track down the problem and obtained the following stack
> trace:
> 
>  EBP		EIP		function(args)
> 0xc4cd1c54	0xd081e3e7	[e100]__kallsyms+0xb (0xc4b595a0, 0xc840f200)
> 					e100 __kallsyms 0xd081e3dc 0xd081e3dc 0xd0820dsc
> 		0xd08244ba	[ians]iansHardStartXmit+0xa6 (0xc4b595a0, 0xc4d9bc00)
> 					ians .text 0xd0824060 0xd0824414 0xd082452c
> 		0xc01f9d1f	qdisc_restart+0xcf (0xc4d9bc00)
> 					kernel .text 0xc0100000 0xc01f9c50 0xc01f9f14
> 	*
> 	*
> 	*
> 
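The addresses in the trace can be cross-checked with a little arithmetic. The sketch below is a userspace check, not kernel code. It assumes that on each detail line the second address is the start of the resolved symbol, an interpretation the original message does not state, and verifies that symbol start plus the printed offset reproduces each reported EIP. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One backtrace frame as printed by kdb: the reported EIP, the assumed
 * start address of the resolved symbol, and the offset after the '+'. */
struct frame {
	const char *symbol;
	uint32_t eip;
	uint32_t sym_start;   /* assumption: second address on the detail line */
	uint32_t offset;
};

/* Values copied from the stack trace above. */
static const struct frame frames[] = {
	{ "[e100]__kallsyms",        0xd081e3e7u, 0xd081e3dcu, 0x0bu },
	{ "[ians]iansHardStartXmit", 0xd08244bau, 0xd0824414u, 0xa6u },
	{ "qdisc_restart",           0xc01f9d1fu, 0xc01f9c50u, 0xcfu },
};

/* Returns 1 when sym_start + offset equals the reported EIP. */
static int frame_is_consistent(const struct frame *f)
{
	return f->sym_start + f->offset == f->eip;
}

/* Counts how many frames in the table pass the consistency check. */
static int count_consistent_frames(void)
{
	int n = 0;
	size_t i;

	for (i = 0; i < sizeof(frames) / sizeof(frames[0]); i++)
		n += frame_is_consistent(&frames[i]);
	return n;
}
```

Under that assumption all three frames agree with the printed offsets, so the EIP of the topmost frame really does lie 0xb bytes into the e100 module's __kallsyms area.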
> The trace continues and shows that this is an ICMP echo reply packet going
> down through the IP stack to the filter driver (apparently 0xc4b595a0 is
> the skb, 0xc4d9bc00 is the *dev of the filter driver and 0xc840f200 is the
> *dev of the base driver). The filter driver is supposed to call the
> dev->hard_start_xmit of the base driver, but strangely it lands somewhere
> in the data segment of the base driver (__kallsyms is a part of the symbol
> table of the module according to insmod -m).
> 
> Suspecting that the dev->hard_start_xmit pointer got trashed somehow, we
> added a check to make sure the same pointer is always called, and indeed
> the pointer never changed. Looking at the assembly code with kdb, we could
> see that the call to the base driver is done by a 'call *%eax'
> instruction. kdb reports that eax=0xffffffff after the page fault (origeax).
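For readers following along, a check of the kind described might look roughly like the sketch below. This is a userspace model, not the poster's actual code: the struct definitions are stripped-down stand-ins for the kernel types, and saved_xmit, bind_base(), checked_xmit(), dummy_xmit(), other_xmit() and the -1 error value are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types (assumptions, userspace only). */
struct sk_buff;
struct net_device;
typedef int (*xmit_fn)(struct sk_buff *, struct net_device *);

struct net_device {
	xmit_fn hard_start_xmit;
};

/* Pointer recorded when the filter driver binds to the base driver. */
static xmit_fn saved_xmit;

static void bind_base(struct net_device *base)
{
	saved_xmit = base->hard_start_xmit;
}

/* Returns 1 if the stored pointer still matches the one seen at bind time. */
static int xmit_pointer_intact(const struct net_device *base)
{
	return base != NULL && base->hard_start_xmit == saved_xmit;
}

/* Calls through the pointer only after the comparison succeeds. */
static int checked_xmit(struct sk_buff *skb, struct net_device *base)
{
	if (!xmit_pointer_intact(base))
		return -1;   /* hypothetical error value */
	return base->hard_start_xmit(skb, base);
}

/* Two dummy transmit routines used only to exercise the check. */
static int dummy_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb; (void)dev;
	return 0;
}

static int other_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb; (void)dev;
	return 7;
}
```

Note that a check like this only compares the value stored in memory at the moment of the comparison. It says nothing about what ends up in the register used for the indirect call afterwards.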
> 
> How is it possible that the pointer to the function keeps its value, but
> the jump to that function lands somewhere else?
> The entire function is protected by a spinlock, so there should be no
> concern about other threads corrupting our data.
> 
> We are using:
> RedHat 6.2
> gcc v2.91.66
> modutils v2.3.11-1
> kernel linux-2.4.0-test9
> kdb v1.5-2.4.0-test9-pre9
> Compaq ap500 dual p-III Xeon
> 
> 
> 	Thanks,
> 	Shmulik Hen
> 
> 	Software Engineer
> 	Linux Advanced Networking Services
> 	Network Communications Group, Israel (NCGj)
> 	Intel Corporation Ltd.
> 
> 
