On Sat, 2 Jan 2010, Sean wrote:

> Alan,
>
> Thanks again. I was able to get the full dmesg output this time. I ran
> capture-example three times and each time removing the webcam before
> capture-example finished. On the third time I got the poisoned hash
> message and I captured the output to a file. Attached is the dmesg output.

Okay.  Take a look at the following output:

$ egrep -n '[2e]e(80|9c)' dmesg2.log
680:pci 0000:00:0c.0: reg 14 io port: [0xee80-0xee83]
727:kobject: 'ieee80211' (c79d5e1c): kobject_add_internal: parent: 'class', set: 'class'
728:kobject: 'ieee80211' (c79d5e1c): kobject_uevent_env
729:kobject: 'ieee80211' (c79d5e1c): fill_kobj_path: path = '/class/ieee80211'
4994:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c6662e80
5027:ohci_hcd 0000:00:0b.0: hash c6662e80 to 58 -> (null)
5185:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c676ee80
5218:ohci_hcd 0000:00:0b.0: hash c676ee80 to 58 -> c6662e80
5276:ohci_hcd 0000:00:0b.0: td free c6662e80
5277:ohci_hcd 0000:00:0b.0: (58 1) c676ee9c -> (null)
5296:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c6662e80
5329:ohci_hcd 0000:00:0b.0: hash c6662e80 to 58 -> c676ee80
5538:ohci_hcd 0000:00:0b.0: td free c676ee80
5539:ohci_hcd 0000:00:0b.0: (58 1) c6662e9c -> (null)
5558:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c676ee80
5591:ohci_hcd 0000:00:0b.0: hash c676ee80 to 58 -> c6662e80
5644:ohci_hcd 0000:00:0b.0: td free c6662e80
5645:ohci_hcd 0000:00:0b.0: (58 1) c676ee9c -> (null)
5720:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c6662e80
5753:ohci_hcd 0000:00:0b.0: hash c6662e80 to 58 -> c676ee80
5900:ohci_hcd 0000:00:0b.0: td free c676ee80
5901:ohci_hcd 0000:00:0b.0: (58 1) c6662e9c -> (null)
5978:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c676ee80
6011:ohci_hcd 0000:00:0b.0: hash c676ee80 to 58 -> c6662e80
6072:ohci_hcd 0000:00:0b.0: td free c6662e80
6073:ohci_hcd 0000:00:0b.0: (58 1) c676ee9c -> (null)
6088:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c6662e80
6121:ohci_hcd 0000:00:0b.0: hash c6662e80 to 58 -> c676ee80
6324:ohci_hcd 0000:00:0b.0: td free c676ee80
6325:ohci_hcd 0000:00:0b.0: (58 1) c6662e9c -> (null)
6343:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c676ee80
6376:ohci_hcd 0000:00:0b.0: hash c676ee80 to 58 -> c6662e80
6416:ohci_hcd 0000:00:0b.0: td free c6662e80
6417:ohci_hcd 0000:00:0b.0: (58 1) c676ee9c -> c676ee80
6492:ohci_hcd 0000:00:0b.0: td alloc for 2 ep85: c6662e80
6525:ohci_hcd 0000:00:0b.0: hash c6662e80 to 58 -> c676ee80
6686:ohci_hcd 0000:00:0b.0: td free c676ee80
6687:ohci_hcd 0000:00:0b.0: (58 1) c6662e9c -> c676ee80

Ignore the first few lines as being irrelevant.  Starting with line 5185
you can see that this shows two TDs being allocated, hashed, freed, and
unlinked over and over again, at addresses c6662e80 and c676ee80.

When each one is hashed into the list, its td_hash member is made to
point to the other.  When each is removed from the hash list, the
other's td_hash member is set to NULL.

It's all very regular and repetitious until line 6417.  At that line,
the td_hash member of c676ee80 (which is at offset 1c from the start of
the structure, hence at c676ee9c) is made to point to its own structure!
Thus later at line 6687, when c676ee80 is freed, the td_hash member of
c6662e80 is set to point at the freed structure.  This is what leads to
poisoned pointer values later on.

So what went wrong at line 6417?  There's no way to know exactly.  My
guess is that the write at line 6325, where c6662e9c was supposed to be
set to NULL, never got recorded properly in the computer's memory.  This
would mean that c6662e9c still contained the c676ee80 value assigned at
line 6121, and hence the incorrect value was copied at line 6417.

In other words, I'm guessing that you're suffering from hardware memory
errors.

A possible way to test this is to modify the patch.
In td_free() where it adds the line:

+	ohci_dbg(hc, "(%d %d) %p -> %p\n", hash, n, prev, *prev);

instead add this code:

+	barrier();
+	ohci_dbg(hc, "(%d %d) %p -> %p [%p]\n", hash, n,
+			prev, *prev, td->td_hash);

If we find that the value of *prev differs from the value of td->td_hash
then we'll know for certain.  (Or maybe the presence of the barrier()
will cause the object code to change in a way that prevents the error
from occurring.)

Alan Stern
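
The lost-write hypothesis above can be checked against the log with a
small user-space model.  This is only a sketch, not the ohci-hcd code:
struct td is cut down to its td_hash link, the bucket is a plain
pointer, and hash_td(), unlink_td(), replay() and the lose_store flag
are hypothetical names invented for the illustration.  The flag simply
skips one store, which is an assumption about how the suspected memory
error would look from the software side.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in: only the hash-chain link of a TD is modeled. */
struct td {
	struct td *td_hash;	/* next TD in the same hash bucket */
};

/* Push td onto the head of the bucket ("hash X to 58 -> old head"). */
static void hash_td(struct td **bucket, struct td *td)
{
	assert(bucket != NULL && td != NULL);
	td->td_hash = *bucket;
	*bucket = td;
}

/*
 * Unlink td from the bucket.  When lose_store is nonzero, the store into
 * the predecessor's link is skipped, modeling a write that never reached
 * memory (an assumption made purely for this illustration).
 */
static void unlink_td(struct td **bucket, struct td *td, int lose_store)
{
	struct td **prev = bucket;

	while (*prev && *prev != td)
		prev = &(*prev)->td_hash;
	if (*prev && !lose_store)
		*prev = td->td_hash;
}

/*
 * Replay log lines 6121-6687 with a standing in for c6662e80 and b for
 * c676ee80.  Returns 1 if b points at itself after a is unlinked (the
 * anomaly at line 6417), 0 otherwise.  a->td_hash holds the value that
 * line 6687 would print when the function returns.
 */
int replay(struct td *a, struct td *b, int lose_store)
{
	struct td *bucket = NULL;
	int self_pointer;

	a->td_hash = NULL;
	b->td_hash = NULL;

	hash_td(&bucket, b);			/* b is already in the bucket */
	hash_td(&bucket, a);			/* 6121: a -> b */
	unlink_td(&bucket, b, lose_store);	/* 6325: a's link should become NULL */
	hash_td(&bucket, b);			/* 6376: b -> a */
	unlink_td(&bucket, a, 0);		/* 6417: b's link = a's link */
	self_pointer = (b->td_hash == b);
	hash_td(&bucket, a);			/* 6525: a -> b */
	unlink_td(&bucket, b, 0);		/* 6687: a's link = b's link */

	return self_pointer;
}
```

Calling replay(&a, &b, 1) on two local struct td objects returns 1 and
leaves a.td_hash pointing at b, which in the driver would be the freed
structure.  Calling replay(&a, &b, 0) returns 0 and leaves a.td_hash
NULL, matching the regular part of the log.  The model says nothing
about why the store would be lost; it only shows that a single missing
write at line 6325 is enough to produce the values seen at lines 6417
and 6687.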