Hi,

Please send me the files mm/mempool.o and drivers/md/*.o from the two kernels that crashed with these oopses, so that I can see more precisely where it happened.

Mikulas

On Tue, 21 Oct 2008, aluno3@xxxxxxxxxxxxxx wrote:

> Hi Milan,
>
> Thanks for the patch. I've applied it on 2.6.27 but it looks like we're
> still having the same problem. We've tested it on both 32 and 64 bit
> kernels - and on both of them the problem occurs, but in different way.
>
> Here there are calltraces from both kernels (32 and 64 bit):
>
>
> 32 bit one:
>
> BUG: unable to handle kernel paging request at 08048000
> IP: [<c05263f9>] _spin_lock_irqsave+0x9/0x20
> *pdpt = 000000000c438001 *pde = 000000007f997067
> Oops: 0003 [#1] SMP
> Modules linked in: sg st iscsi_trgt drbd bonding iscsi_tcp libiscsi
> scsi_transport_iscsi 3w_9xxx sata_nv forcedeth button ftdi_sio usbserial
>
> Pid: 30618, comm: kcopyd Not tainted (2.6.27-32#1)
> EIP: 0060:[<c05263f9>] EFLAGS: 00010097 CPU: 0
> EIP is at _spin_lock_irqsave+0x9/0x20
> EAX: 08048000 EBX: 08048000 ECX: 00000297 EDX: 00000100
> ESI: eb602148 EDI: eb53cb40 EBP: 00000000 ESP: f11f7ea0
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kcopyd (pid: 30618, ti=f11f6000 task=f5065580 task.ti=f11f6000)
> Stack: c015a57f eb602148 eb641d08 eb53cb40 c044ca4a eb641d08 00000000
> f29a5080
> c044cb07 00000002 00000000 eb8dd540 00000000 c044ddb3 00057803
> 00000000
> 000001f5 00000000 f2728108 f29a5080 00000000 c044cba0 f2728108
> ed016370
> Call Trace:
> [<c015a57f>] mempool_free+0x1f/0x70
> [<c044ca4a>] put_pending_exception+0x5a/0x60
> [<c044cb07>] pending_complete+0xb7/0x110
> [<c044ddb3>] persistent_commit+0xe3/0x110
> [<c044cba0>] copy_callback+0x30/0x40
> [<c0447d04>] segment_complete+0x154/0x1d0
> [<c0447935>] run_complete_job+0x45/0x80
> [<c0447bb0>] segment_complete+0x0/0x1d0
> [<c04478f0>] run_complete_job+0x0/0x80
> [<c0447af4>] process_jobs+0x14/0x70
> [<c0447b50>] do_work+0x0/0x40
> [<c0447b66>] do_work+0x16/0x40
> [<c013509d>] run_workqueue+0x4d/0xf0
> [<c01351bd>] worker_thread+0x7d/0xc0
> [<c0138350>] autoremove_wake_function+0x0/0x30
> [<c0524efc>] __sched_text_start+0x1ec/0x4b0
> [<c0138350>] autoremove_wake_function+0x0/0x30
> [<c0121a9b>] complete+0x2b/0x40
> [<c0135140>] worker_thread+0x0/0xc0
> [<c0137e24>] kthread+0x44/0x70
> [<c0137de0>] kthread+0x0/0x70
> [<c0104c57>] kernel_thread_helper+0x7/0x10
> =======================
> Code: 89 c8 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83 28 01
> 79 05 e8 25 ff ff ff c3 8d 74 26 00 9c 59 fa ba 00 01 00 00 90 <66> 0f
> c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 8d b6 00 00
> EIP: [<c05263f9>] _spin_lock_irqsave+0x9/0x20 SS:ESP 0068:f11f7ea0
> ---[ end trace b3493777a8378781 ]---
>
>
>
> 64 bit one:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<0000000000000000>] 0x0
> PGD 6e88e067 PUD 53f6f067 PMD 0
> Oops: 0000 [1] SMP
> CPU 1
> Modules linked in: iscsi_trgt drbd bonding iscsi_tcp libiscsi
> scsi_transport_iscsi megaraid_mbox megaraid_mm sky2 skge button ftdi_sio
> usbserial
> Pid: 13724, comm: kcopyd Not tainted 2.6.27-64#3
> RIP: 0010:[<0000000000000000>] [<0000000000000000>] 0x0
> RSP: 0000:ffff880000b83d18 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff88002626fba8 RCX: 0000000000000001
> RDX: ffff8800761d4208 RSI: 8000000000000000 RDI: ffff88002626fba8
> RBP: ffff8800399e4000 R08: ffffc20005e1e130 R09: 00ffffffffffffff
> R10: 0100000000000000 R11: 0000000000000000 R12: ffff880030087c88
> R13: 0000000000000000 R14: ffff88002c14f440 R15: ffff8800399e4118
> FS: 0000000000000000(0000) GS:ffff88007f473dc0(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000002a7d9000 CR4: 00000000000006a0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kcopyd (pid: 13724, threadinfo ffff880000b82000, task
> ffff88007f58cf30)
> Stack: ffffffff805c2eae 0000000000000400 0000000000000001 ffff8800561a55c0
> 0000000000000000 ffff880038c1f978 0000000000000400 0000000000000000
> ffffffff805c4130 0000000000001425 000000000000062a 0000000000000082
> Call Trace:
> [<ffffffff805c2eae>] ? pending_complete+0x1ee/0x230
> [<ffffffff805c4130>] ? persistent_commit+0xe0/0x130
> [<ffffffff805bd8a3>] ? segment_complete+0x183/0x1c0
> [<ffffffff805bd720>] ? segment_complete+0x0/0x1c0
> [<ffffffff805bd385>] ? run_complete_job+0x65/0xb0
> [<ffffffff805bd320>] ? run_complete_job+0x0/0xb0
> [<ffffffff805bd5d6>] ? process_jobs+0x26/0xe0
> [<ffffffff805bd690>] ? do_work+0x0/0x60
> [<ffffffff805bd6b8>] ? do_work+0x28/0x60
> [<ffffffff8024686a>] ? run_workqueue+0x5a/0x110
> [<ffffffff802469bc>] ? worker_thread+0x9c/0xf0
> [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
> [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
> [<ffffffff80246920>] ? worker_thread+0x0/0xf0
> [<ffffffff80249f0c>] ? kthread+0x6c/0xa0
> [<ffffffff8020d1c9>] ? child_rip+0xa/0x11
> [<ffffffff8021b5f0>] ? lapic_next_event+0x0/0x10
> [<ffffffff80249ea0>] ? kthread+0x0/0xa0
> [<ffffffff8020d1bf>] ? child_rip+0x0/0x11
>
>
> Code: Bad RIP value.
> RIP [<0000000000000000>] 0x0
> RSP <ffff880000b83d18>
> CR2: 0000000000000000
> ---[ end trace 03b26540ec781e73 ]---
>
>
>
> Any other suggestions?
>
> Best
>
> Milan Broz wrote:
> > aluno3@xxxxxxxxxxxxxx wrote:
> >
> >> I've got this calltrace from our QA team. They say that they mad few
> >> snapshots, run several programs like bacula or rsync and that calltrace
> >> is appearing about 1 hour after starting those programs.
> >>
> >
> > Hi,
> > if it is reproducible, please can you try if this patch helps?
> > http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-snapshot-fix-primary_pe-race.patch
> >
> > Probably the same problem reported here
> > http://bugzilla.kernel.org/show_bug.cgi?id=11636
> >
> > (Added Mikulas to CC)
> >
> > Milan
> > --
> > mbroz@xxxxxxxxxx
> >
> > --
> > dm-devel mailing list
> > dm-devel@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/dm-devel