Re: Snapshot causing segfault

Tyler Gates <tyler.gates@ats.coop> · Thu, 3 Jan 2013 08:25:49 -0500

On Thu, Jan 3, 2013 at 5:18 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:

On 31.12.2012 19:50, Tyler Gates wrote:

Hello everyone,

I've been having an intermittent problem with random servers segfaulting while trying to create a snapshot under lvm2-2.02.17-7.38.3 on kernel 2.6.16.60-0.93.1-bigsmp (SLES 10 SP4). The messages I get are:

###########################################
Dec 27 07:45:39 chelco-app-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000001c
Dec 27 07:45:39 chelco-app-01 kernel:  printing eip:
Dec 27 07:45:39 chelco-app-01 kernel: f90ab3a7
Dec 27 07:45:39 chelco-app-01 kernel: *pde = 3780a001
Dec 27 07:45:39 chelco-app-01 kernel: Oops: 0000 [#1]
Dec 27 07:45:39 chelco-app-01 kernel: SMP
Dec 27 07:45:39 chelco-app-01 kernel: last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
Dec 27 07:45:39 chelco-app-01 kernel: Modules linked in: raw dock button battery ac loop dm_snapshot usbhid dm_mod uhci_hcd bnx2x hw_random ehci_hcd qla2xxx hpilo usbcore firmware_class scsi_transport_fc parport_pc lp parport ext3 jbd edd fan thermal processor cciss sd_mod scsi_mod
Dec 27 07:45:39 chelco-app-01 kernel: CPU:    4
Dec 27 07:45:39 chelco-app-01 kernel: EIP:    0060:[<f90ab3a7>]    Tainted: G      X VLI
Dec 27 07:45:39 chelco-app-01 kernel: EFLAGS: 00210202   (2.6.16.60-0.93.1-bigsmp #1)
Dec 27 07:45:39 chelco-app-01 kernel: EIP is at __map_bio+0x50/0x11f [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel: eax: f90960c4   ebx: 00000000   ecx: f7ff2a60   edx: f7794440
Dec 27 07:45:39 chelco-app-01 kernel: esi: f7ff2a58   edi: f90960c4   ebp: f46306c0   esp: f4c15d28
Dec 27 07:45:39 chelco-app-01 kernel: ds: 007b   es: 007b   ss: 0068
Dec 27 07:45:39 chelco-app-01 kernel: Process lvcreate (pid: 6678, threadinfo=f4c14000 task=f7838680)
Dec 27 07:45:39 chelco-app-01 kernel: Stack: <0>f7794340 f7794440 f7794440 03201ff0 00000000 03201ff0 00000000 00000008
Dec 27 07:45:39 chelco-app-01 kernel:        00000000 00000000 f90960c4 f7ff2a68 f46306c0 f90abd1b 00000000 00000001
Dec 27 07:45:39 chelco-app-01 kernel:        00000008 f428e2e0 fcdfe010 ffffffff c0113d62 00000000 0000001f f7ff2a58
Dec 27 07:45:39 chelco-app-01 kernel: Call Trace:
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90abd1b>] __split_bio+0x182/0x440 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0113d62>] do_flush_tlb_all+0x0/0x5d
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90abff0>] __flush_deferred_io+0x17/0x20 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90ac14c>] dm_resume+0x8e/0xf9 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90aedd8>] dev_suspend+0x138/0x157 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90af607>] ctl_ioctl+0x220/0x26e [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<f90aeca0>] dev_suspend+0x0/0x157 [dm_mod]
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179ce8>] do_ioctl+0x48/0x5e
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179f60>] vfs_ioctl+0x262/0x275
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0179fc7>] sys_ioctl+0x54/0x6d
Dec 27 07:45:39 chelco-app-01 kernel:  [<c0103dcb>] sysenter_past_esp+0x54/0x79
Dec 27 07:45:39 chelco-app-01 kernel: Code: b4 0a f9 89 70 40 8b 06 83 c0 0c f0 ff 00 8b 54 24 08 8d 4e 08 8b 02 8b 52 04 89 44 24 0c 89 f8 89 54 24 10 8b 5f 04 8b 54 24 08 <ff> 53 1c 83 f8 00 89 c2 0f 8e 93 00 00 00 8b 54 24 08 8b 42 0c
#############################################################

The result is that the target volume gets suspended, and the only way to fix it is to reboot and remove the faulty snapshot when the machine comes back up.
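
For anyone wanting to confirm the suspended state from a script rather than by eye, here is a minimal sketch that reads the "State:" field from `dmsetup info` output. The helper names and the sample text are made up for illustration, and the sketch assumes the usual "Key: value" layout of that output; it only parses text and does not call dmsetup itself.

```python
from typing import Optional


def dm_state(dmsetup_info_output: str) -> Optional[str]:
    """Return the value of the 'State' field from `dmsetup info` text, or None."""
    for line in dmsetup_info_output.splitlines():
        key, sep, value = line.partition(":")
        # Only accept lines that really contain a "key: value" pair.
        if sep and key.strip() == "State":
            return value.strip()
    return None


def is_suspended(dmsetup_info_output: str) -> bool:
    """True if the device is reported as suspended (assumed spelling: SUSPENDED)."""
    return dm_state(dmsetup_info_output) == "SUSPENDED"


# Illustrative sample only, not captured from the affected servers.
SAMPLE = """Name:              vg0-data
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE
"""

if __name__ == "__main__":
    print(dm_state(SAMPLE), is_suspended(SAMPLE))
```

In practice the text would come from running `dmsetup info <device>` as root on the affected host.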

Now, the script I wrote to create these snapshots uses all available extents in the Volume Group, which in this case was actually more than the size of the volume I was trying to snapshot. Thinking this was the problem, I tried creating the snapshot several times with a size less than or equal to the target volume, and it worked every time. So I tried a value larger than the target to provoke a crash, and it did crash, BUT not every time. In fact, now I can't get it to segfault at all.

So my question is: does creating the snapshot volume with a size larger than the target volume randomly induce segfaults, or could there be another problem lurking? If these weren't production machines I would normally just go with a size smaller than the target, but I really need to be sure what exactly is causing the segfaults.
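
To make the workaround concrete, here is a rough sketch of capping the snapshot at the size of the target volume instead of taking every free extent. It is not the actual script; the function names, volume group and volume names are hypothetical, and the extent counts are assumed to have been read beforehand (for example from `vgdisplay` and `lvdisplay`). It only builds the `lvcreate` argument list and does not run it. Whether the cap actually avoids the oops is exactly the open question above.

```python
from typing import List


def snapshot_extents(origin_extents: int, free_extents: int) -> int:
    """Extents to give the snapshot: all free extents, but never more than the origin has."""
    if origin_extents <= 0:
        raise ValueError("origin volume must have at least one extent")
    if free_extents <= 0:
        raise ValueError("volume group has no free extents")
    return min(origin_extents, free_extents)


def lvcreate_snapshot_cmd(vg: str, origin: str, snap_name: str,
                          origin_extents: int, free_extents: int) -> List[str]:
    """Build (but do not execute) an lvcreate command for a capped snapshot."""
    extents = snapshot_extents(origin_extents, free_extents)
    # -s: create a snapshot, -l: size in extents, -n: name of the new volume
    return ["lvcreate", "-s", "-l", str(extents), "-n", snap_name,
            f"/dev/{vg}/{origin}"]


if __name__ == "__main__":
    # Hypothetical numbers: origin has 100 extents, the VG has 250 free.
    print(" ".join(lvcreate_snapshot_cmd("vg0", "data", "data_snap", 100, 250)))
```

With the numbers above the command asks for 100 extents rather than 250; if the volume group had only 40 free extents, it would ask for 40.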

Any help would be appreciated.

Any special reason to use lvm2 from the year 2006 in the year 2013?

Yes. It is from a specific version of an OS we tested as being stable back in the day, which unfortunately uses older software such as this LVM version. It wasn't until recently that I wanted to start using LVM.

There is not much point in fixing particular bugs in source code that has been obsolete for many years.

Can you try to use/rebuild a more recent version?

I realize trying a more recent version would be the best thing to do, assuming it were easy (in this situation it would be a big hassle), but I was hoping someone could tell me either "yes, over-allocating to the snapshot could cause this" or "it sounds like a bug in that version" before I go through all that trouble.

Zdenek

_______________________________________________

linux-lvm mailing list

linux-lvm@redhat.com

https://www.redhat.com/mailman/listinfo/linux-lvm

read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

-- 
Tyler Gates
ATS | Sr. Systems Administrator
Tyler.Gates@ats.coop
The Power of One
Software Solution - OpenOne

910.210.4100 main | 910.210.4150 fax | 910.210.4118 direct | 910.358.3063 mobile

