Re: Calltrace in dm-snapshot in 2.6.27 kernel

"aluno3@xxxxxxxxxxxxxx" <aluno3@xxxxxxxxxxxxxx> · Tue, 21 Oct 2008 08:39:13 +0200

Hi Milan,

Thanks for the patch. I've applied it to 2.6.27, but we still seem to be
hitting the same problem. We've tested it on both 32-bit and 64-bit
kernels, and the problem occurs on both, but in a different way on each.

Here are the call traces from both kernels (32-bit and 64-bit):

32 bit one:

BUG: unable to handle kernel paging request at 08048000
IP: [<c05263f9>] _spin_lock_irqsave+0x9/0x20
*pdpt = 000000000c438001 *pde = 000000007f997067
Oops: 0003 [#1] SMP
Modules linked in: sg st iscsi_trgt drbd bonding iscsi_tcp libiscsi
scsi_transport_iscsi 3w_9xxx sata_nv forcedeth button ftdi_sio usbserial

Pid: 30618, comm: kcopyd Not tainted (2.6.27-32#1)
EIP: 0060:[<c05263f9>] EFLAGS: 00010097 CPU: 0
EIP is at _spin_lock_irqsave+0x9/0x20
EAX: 08048000 EBX: 08048000 ECX: 00000297 EDX: 00000100
ESI: eb602148 EDI: eb53cb40 EBP: 00000000 ESP: f11f7ea0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kcopyd (pid: 30618, ti=f11f6000 task=f5065580 task.ti=f11f6000)
Stack: c015a57f eb602148 eb641d08 eb53cb40 c044ca4a eb641d08 00000000 f29a5080
       c044cb07 00000002 00000000 eb8dd540 00000000 c044ddb3 00057803 00000000
       000001f5 00000000 f2728108 f29a5080 00000000 c044cba0 f2728108 ed016370
Call Trace:
 [<c015a57f>] mempool_free+0x1f/0x70
 [<c044ca4a>] put_pending_exception+0x5a/0x60
 [<c044cb07>] pending_complete+0xb7/0x110
 [<c044ddb3>] persistent_commit+0xe3/0x110
 [<c044cba0>] copy_callback+0x30/0x40
 [<c0447d04>] segment_complete+0x154/0x1d0
 [<c0447935>] run_complete_job+0x45/0x80
 [<c0447bb0>] segment_complete+0x0/0x1d0
 [<c04478f0>] run_complete_job+0x0/0x80
 [<c0447af4>] process_jobs+0x14/0x70
 [<c0447b50>] do_work+0x0/0x40
 [<c0447b66>] do_work+0x16/0x40
 [<c013509d>] run_workqueue+0x4d/0xf0
 [<c01351bd>] worker_thread+0x7d/0xc0
 [<c0138350>] autoremove_wake_function+0x0/0x30
 [<c0524efc>] __sched_text_start+0x1ec/0x4b0
 [<c0138350>] autoremove_wake_function+0x0/0x30
 [<c0121a9b>] complete+0x2b/0x40
 [<c0135140>] worker_thread+0x0/0xc0
 [<c0137e24>] kthread+0x44/0x70
 [<c0137de0>] kthread+0x0/0x70
 [<c0104c57>] kernel_thread_helper+0x7/0x10
 =======================
Code: 89 c8 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83 28 01
79 05 e8 25 ff ff ff c3 8d 74 26 00 9c 59 fa ba 00 01 00 00 90 <66> 0f
c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 8d b6 00 00
EIP: [<c05263f9>] _spin_lock_irqsave+0x9/0x20 SS:ESP 0068:f11f7ea0
---[ end trace b3493777a8378781 ]---

64 bit one:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<0000000000000000>] 0x0
PGD 6e88e067 PUD 53f6f067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: iscsi_trgt drbd bonding iscsi_tcp libiscsi
scsi_transport_iscsi megaraid_mbox megaraid_mm sky2 skge button ftdi_sio
usbserial
Pid: 13724, comm: kcopyd Not tainted 2.6.27-64#3
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] 0x0
RSP: 0000:ffff880000b83d18  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88002626fba8 RCX: 0000000000000001
RDX: ffff8800761d4208 RSI: 8000000000000000 RDI: ffff88002626fba8
RBP: ffff8800399e4000 R08: ffffc20005e1e130 R09: 00ffffffffffffff
R10: 0100000000000000 R11: 0000000000000000 R12: ffff880030087c88
R13: 0000000000000000 R14: ffff88002c14f440 R15: ffff8800399e4118
FS:  0000000000000000(0000) GS:ffff88007f473dc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000002a7d9000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kcopyd (pid: 13724, threadinfo ffff880000b82000, task ffff88007f58cf30)
Stack:  ffffffff805c2eae 0000000000000400 0000000000000001 ffff8800561a55c0
 0000000000000000 ffff880038c1f978 0000000000000400 0000000000000000
 ffffffff805c4130 0000000000001425 000000000000062a 0000000000000082
Call Trace:
 [<ffffffff805c2eae>] ? pending_complete+0x1ee/0x230
 [<ffffffff805c4130>] ? persistent_commit+0xe0/0x130
 [<ffffffff805bd8a3>] ? segment_complete+0x183/0x1c0
 [<ffffffff805bd720>] ? segment_complete+0x0/0x1c0
 [<ffffffff805bd385>] ? run_complete_job+0x65/0xb0
 [<ffffffff805bd320>] ? run_complete_job+0x0/0xb0
 [<ffffffff805bd5d6>] ? process_jobs+0x26/0xe0
 [<ffffffff805bd690>] ? do_work+0x0/0x60
 [<ffffffff805bd6b8>] ? do_work+0x28/0x60
 [<ffffffff8024686a>] ? run_workqueue+0x5a/0x110
 [<ffffffff802469bc>] ? worker_thread+0x9c/0xf0
 [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff8024a620>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff80246920>] ? worker_thread+0x0/0xf0
 [<ffffffff80249f0c>] ? kthread+0x6c/0xa0
 [<ffffffff8020d1c9>] ? child_rip+0xa/0x11
 [<ffffffff8021b5f0>] ? lapic_next_event+0x0/0x10
 [<ffffffff80249ea0>] ? kthread+0x0/0xa0
 [<ffffffff8020d1bf>] ? child_rip+0x0/0x11

Code:  Bad RIP value.
RIP  [<0000000000000000>] 0x0
 RSP <ffff880000b83d18>
CR2: 0000000000000000
---[ end trace 03b26540ec781e73 ]---
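
For reference, the two "Oops:" error codes above (0003 and 0000) can be
decoded with a small script. This is only a sketch, assuming the standard
x86 page-fault error code bit layout; the function name decode_oops_code
is made up for illustration and is not part of any kernel tool.

```python
# Sketch only: decodes the number printed after "Oops:" in an x86 oops,
# assuming the standard x86 page-fault error code bit layout.
# decode_oops_code is a hypothetical helper name, not a kernel utility.

# Bits that always have a meaning, whether set or clear:
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation (page present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]

# Bits that are only worth reporting when set.
PF_FLAGS = [
    (0x08, "reserved bit set in page table"),
    (0x10, "instruction fetch"),
]


def decode_oops_code(text):
    """Return a list of human-readable facts for a hex error code string."""
    code = int(text, 16)
    parts = [on if code & mask else off for mask, on, off in PF_BITS]
    parts += [name for mask, name in PF_FLAGS if code & mask]
    return parts


# The two codes from the traces above.
for label, code in (("32 bit", "0003"), ("64 bit", "0000")):
    print(label, code, "->", ", ".join(decode_oops_code(code)))
```

Under that assumption, the 32 bit oops is a kernel-mode write to a page
that is present but not writable, and the 64 bit one is a kernel-mode
access to a page that is not mapped at all.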

Any other suggestions?

Best

Milan Broz wrote:
> aluno3@xxxxxxxxxxxxxx wrote:
>   
>> I've got this calltrace from our QA team. They say that they made a few
>> snapshots, ran several programs like bacula or rsync, and that the
>> calltrace appears about 1 hour after starting those programs.
>>     
>
> Hi,
> if it is reproducible, could you please check whether this patch helps?
> http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-snapshot-fix-primary_pe-race.patch
>
> This is probably the same problem as reported here:
> http://bugzilla.kernel.org/show_bug.cgi?id=11636
>
> (Added Mikulas to CC)
>
> Milan
> --
> mbroz@xxxxxxxxxx
>
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel
>
>   
