On Sat, Nov 15, 2008 at 12:19 PM, Bob Copeland <me@xxxxxxxxxxxxxxx> wrote:
> On Sat, Nov 15, 2008 at 12:29:34AM -0600, Dan McGee wrote:
>> On Fri, Nov 14, 2008 at 8:57 PM, Dan McGee <dpmcgee@xxxxxxxxx> wrote:
>> >
>> > BUG: unable to handle kernel NULL pointer dereference at 00000082
>> > IP: [<7818ca71>] sysfs_find_dirent+0x9/0x23
>> > Oops: 0000 [#1] PREEMPT
>> > Modules linked in: ath5k(+) mac80211
>
> So, just to recap, this is with Luis' patch; now you get a null pointer
> dereference in sysfs instead of in ieee80211_register_hw? It does look
> like we're deep in register_netdevice now. If you revert his patch, you
> can still get the error in register_hw every time?

Yeah, this is with Luis' patch. Without that patch it always bugs out at
the earlier step in register_hw(). And like I said, I can't reproduce
this one with debug symbols built into the kernel, unfortunately.

>> > Pid: 818 comm: modprobe Not tainted (2.6.27.6eee #1)
>> > EIP: 0060:[<7818ca71>] EFLAGS: 00010206 CPU: 0
>> > EIP is at sysfs_find_dirent+0x9/0x23
>> > EAX: 00000001 EBX: 00000072 ECX: 00000001 EDX: b730b4f0
>> > ESI: b730b4f0 EDI: fffffff4 EBP: b7311490 ESP: b73ffd34
>
> EBX is 00000072, definitely not a pointer.
>
>> And I had the code completely wrong, oops. Looks like we are bailing
>> on the strcmp call in this function or something along those lines? I
>> wish I could be a bigger help with debugging this stuff.
>
> Yep, or at least in the setup code for that. Don't worry, you're being
> a big help; I think we just don't have a good enough theory yet to
> propose decent debugging patches.
>
>> struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
>>                                        const unsigned char *name)
>> {
>>  1bc:   56                      push   %esi
>>  1bd:   89 d6                   mov    %edx,%esi
>>  1bf:   53                      push   %ebx
>>         struct sysfs_dirent *sd;
>>
>>         for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
>>  1c0:   8b 58 18                mov    0x18(%eax),%ebx
>>  1c3:   eb 11                   jmp    1d6 <sysfs_find_dirent+0x1a>
>>                 if (!strcmp(sd->s_name, name))
>>  1c5:   8b 43 10                mov    0x10(%ebx),%eax
>
> EBX appears to be sd (it's initialized at line 1c0 to parent_sd + 0x18,
> which is &parent_sd->s_dir.children, then it jumps to the loop test).
> Thus EAX must be sd->s_sibling, which we hope to use for strcmp.
>
> So, while traversing the sibling pointers, one of them happens to be
> 00000072 (instead of what should probably have been NULL). 0x72 is not
> a poison value I'm aware of. At this point, things have gone south, but
> the real problem happened earlier.

Yeah, I figured it was something earlier that didn't quite work out, but
I really had no idea where to start poking.

> Can you post your .config?

Sure, here it is:
http://www.toofishes.net/uploads/kernelconfig

-Dan