Re: page fault problems porting a network driver to 2.4.x

Andi Kleen <ak@suse.de> · Tue, 24 Oct 2000 21:45:19 +0200

On Tue, Oct 24, 2000 at 11:21:23AM -0700, Hen, Shmulik wrote:
> > The function looks something like:
> > 
> > int iansHardStartXmit(struct sk_buff *skb, struct net_device *dev) {
> > 	int res;
> > 	struct net_device *base;
> > 
> > 	spin_lock(&lock);

Normally the network code serializes entry into hard_start_xmit for you.
If it did not, the lock should probably be taken with spin_lock_irqsave().
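
A rough sketch of what the irqsave variant of the quoted function could
look like. This is not real driver code: the spinlock macros, the structs,
base_dev and fake_base_xmit are hypothetical userspace stand-ins so the
pattern compiles outside the kernel (a real driver would get the lock
primitives from <linux/spinlock.h>). It also gives res a default value,
since the quoted version returns it uninitialized when base is NULL.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's definitions. */
typedef struct { int locked; } spinlock_t;
static int irqs_enabled = 1;            /* models the CPU interrupt flag */

#define spin_lock_irqsave(lock, flags) \
	do { (flags) = irqs_enabled; irqs_enabled = 0; (lock)->locked = 1; } while (0)
#define spin_unlock_irqrestore(lock, flags) \
	do { (lock)->locked = 0; irqs_enabled = (int)(flags); } while (0)

struct sk_buff { int len; };
struct net_device {
	int (*hard_start_xmit)(struct sk_buff *skb, struct net_device *dev);
};

static spinlock_t lock;
static struct net_device *base_dev;     /* stand-in for get_base_driver_by_name(name) */
static int irqs_seen_in_xmit = -1;      /* records the modelled flag during the call */

/* Hypothetical base driver transmit routine, used only to exercise the sketch. */
static int fake_base_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)skb;
	(void)dev;
	irqs_seen_in_xmit = irqs_enabled;
	return 0;
}

static int ians_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
{
	unsigned long flags;
	int res = 1;                    /* assumed "not sent" default; never left uninitialized */
	struct net_device *base;

	(void)dev;
	spin_lock_irqsave(&lock, flags);        /* save interrupt state, then take the lock */
	base = base_dev;
	if (base != NULL)
		res = base->hard_start_xmit(skb, base);
	spin_unlock_irqrestore(&lock, flags);   /* drop the lock, restore saved state */
	return res;
}
```

The point of the save/restore pair is that the caller's interrupt state is
put back exactly as it was, whatever context the function was entered from.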

> > 	base = get_base_driver_by_name(name);
> > 
> > 	if(base != NULL) {
> > 		res = base->hard_start_xmit(skb, base);
> > 	}
> > 
> > 	spin_unlock(&lock);
> > 	return res;
> > }
> > 
> > We used kdb in order to track down the problem and found out the following
> > stack trace:
> > 
> >  EBP		EIP		function(args)
> > 0xc4cd1c54	0xd081e3e7	[e100]__kallsyms+0xb (0xc4b595a0,

My first guess is that you did not compile the kdb kernel with frame
pointers, so that entry is just stack garbage.

> > Figuring the dev->hard_start_xmit pointer got trashed somehow, we added a
> > check to make sure the same pointer is always called, and indeed this was
> > the case. Looking at the assembly code with kdb, we could see that the
> > call to the base driver is done by a 'call *%eax' command. kdb reports
> > that eax=0xffffffff after the page fault (origeax).

origeax is always -1 for exceptions; it is used as a marker that the entry
was not a system call. Only for system calls does it hold the real eax.
You should probably look at the real eax, a bit further down.
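
To make the distinction concrete, here is a small illustrative sketch of
the check described above. The struct and function names (regs_sketch,
entered_via_syscall, call_target_register) are made up for this example
and the struct is trimmed to two fields; it is not the kernel's register
frame layout.

```c
/* Hypothetical, trimmed-down register frame for illustration only. */
struct regs_sketch {
	long eax;       /* the real eax at the time of the trap */
	long orig_eax;  /* real eax on syscall entry, -1 for exceptions */
};

/* Returns 1 if the frame was saved on a system call, 0 on an exception. */
static int entered_via_syscall(const struct regs_sketch *r)
{
	return r->orig_eax != -1;
}

/* The register to inspect when chasing a bad 'call *%eax' target. */
static long call_target_register(const struct regs_sketch *r)
{
	return r->eax;
}
```

So an origeax of 0xffffffff after a page fault says nothing about the
pointer that was called; only the eax field does.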

-Andi