On 11/02/11 22:47, Heiko Carstens wrote:
> On Wed, Nov 02, 2011 at 09:37:06PM +0900, Jun'ichi Nomura wrote:
>> On 10/31/11 22:00, Heiko Carstens wrote:
>>> On Mon, Oct 31, 2011 at 08:46:06PM +0900, Jun'ichi Nomura wrote:
>>>> Hm, dm_softirq_done is generic completion code of original
>>>> request in dm-multipath.
>>>> So oops here might be another manifestation of use-after-free.
>>>>
>>>> Do you always hit the oops at the same address?
>>>
>>> I think we saw this bug the first time. But before that the scsi
>>> logging level was higher. Gonzalo is trying to recreate it with
>>> the same (old) scsi logging level.
>>> Afterwards we will try with barrier=0.
>>>
>>> Both on v3.0.7 btw.
>>>
>>>> Could you find corresponding source code line for
>>>> the crashed address, dm_softirq_done+0x72/0x140,
>>>> and which pointer was invalid?
>>>
>>> It crashes in the inlined function dm_done() when trying to
>>> dereference tio (aka clone->end_io_data):
>>>
>>> static void dm_done(struct request *clone, int error, bool mapped)
>>> {
>>> 	int r = error;
>>> 	struct dm_rq_target_io *tio = clone->end_io_data;
>>> 	dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io;
>>
>> Thank you. But, hmm. I have no idea about scenario.
>>
>> struct dm_rq_target_io is a container of clone request
>> and clone->end_io_data points to its container.
>>
>> struct dm_rq_target_io {
>> 	struct mapped_device *md;
>> 	struct dm_target *ti;
>> 	struct request *orig, clone;
>> 	int error;
>> 	union map_info info;
>> };
>>
>> If clone can be dereferenced, clone->end_io_data should be, too.
>
> If it helps: the above *ti pointer is the only one that points to
> an (invalid) vmalloc area address. Invalid means the page was unmapped
> because it was freed because of DEBUG_PAGEALLOC.
> All other addresses I followed to get to this one belong to
> the 1:1 mapping of the kernel, so no vmalloc involved.

Thanks, ok it was ti which was invalid. Not tio.
ti is a pointer to dm table entry, which is vmalloc-ed.
So it means the dm table was replaced while I/O was in-flight.

dm has a mechanism to prevent it:
in dm_suspend(), stop_queue() is called to stop block queue processing,
so no new I/O becomes in-flight after that.
Then dm waits for all in-flight I/Os to be completed or requeued
(dm_wait_for_completion()).
If the wait was successful, the table can become "suspended",
i.e. ready to be replaced.

So ti should always be valid. Hmm..

--
Jun'ichi Nomura, NEC Corporation
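
To make the expected ordering concrete, here is a rough user-space model of the invariant described above. It is only a sketch, not the dm code: every model_* name is made up for illustration, and model_suspend() returns false instead of blocking the way dm_wait_for_completion() would. The only point is that a table swap is refused while any I/O still holds a pointer into the old table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_table {
	int generation;
};

struct model_dev {
	struct model_table *table;	/* live table, the thing ti points into */
	int in_flight;			/* I/Os dispatched but not yet completed */
	bool queue_stopped;		/* set by suspend, blocks new dispatch */
	bool suspended;			/* true only after in_flight drained to 0 */
};

/* Dispatch one I/O. It keeps a pointer to the table live at dispatch time. */
static struct model_table *model_dispatch(struct model_dev *d)
{
	if (d->queue_stopped)
		return NULL;		/* no new I/O becomes in-flight */
	d->in_flight++;
	return d->table;
}

/* Completion: the step where dm_done() would dereference tio->ti. */
static void model_complete(struct model_dev *d, const struct model_table *used)
{
	/* The invariant under discussion: the table was not swapped meanwhile. */
	assert(used == d->table);
	d->in_flight--;
}

/* Stop the queue; report success only when nothing is in flight. */
static bool model_suspend(struct model_dev *d)
{
	d->queue_stopped = true;
	if (d->in_flight != 0)
		return false;		/* the real code would wait here */
	d->suspended = true;
	return true;
}

/* Replace the table; allowed only once the device is suspended. */
static bool model_swap_table(struct model_dev *d, struct model_table *new_table)
{
	if (!d->suspended)
		return false;
	d->table = new_table;
	return true;
}
```

In this model a completion can never see a replaced table, which is why the oops suggests that one of the steps (stopping the queue, or counting in-flight I/O) did not hold in the real system.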