Hello,

There seem to be serious problems with snapshots, lvm2 and xfs. As soon as there is even a slight amount of disk I/O while snapshotting a logical volume with xfs, the following kind of kernel panic occurs:

--------------------------------------------------------------------------------------------
root@test.cashnet.nl [/root]# umount /backup
root@test.cashnet.nl [/root]# lvremove -f /dev/data/dbackup
Segmentation fault

test kernel: Oops: 0000 [#1]
test kernel: SMP
test kernel: CPU: 0
test kernel: EIP is at exit_exception_table+0x48/0x8e [dm_snapshot]
test kernel: eax: 00000000  ebx: e0b62c70  ecx: 00000000  edx: dfbdaf40
test kernel: esi: 00000000  edi: dfbdaf40  ebp: 00001c70  esp: cdfb9e9c
test kernel: ds: 007b  es: 007b  ss: 0068
test kernel: Process lvremove (pid: 14480, threadinfo=cdfb8000 task=df97aa90)
test kernel: Stack: dfbdaf40 d03cbf88 00002000 0000038e db2fa40c db2fa3c0 e0ade080 00000040
test kernel:        00000001 e0ab098f db2fa40c dfbdaf40 e0ade080 df4c1480 e0abc13b e0ade080
test kernel:        dc276d80 df4c1480 00000004 080e2888 e0abb5ed df4c1480 df4c1480 c9ff2440
test kernel: Call Trace:
test kernel:  [<e0ab098f>] snapshot_dtr+0x33/0x7c [dm_snapshot]
test kernel:  [<e0abc13b>] table_destroy+0x5b/0xbf [dm_mod]
test kernel:  [<e0abb5ed>] dm_put+0x4c/0x72 [dm_mod]
test kernel:  [<e0abe286>] __hash_remove+0x82/0xb1 [dm_mod]
test kernel:  [<e0abec26>] dev_remove+0x3b/0x85 [dm_mod]
test kernel:  [<e0abfc82>] ctl_ioctl+0xde/0x141 [dm_mod]
test kernel:  [<e0abebeb>] dev_remove+0x0/0x85 [dm_mod]
test kernel:  [<c0176e63>] do_ioctl+0x6f/0xa9
test kernel:  [<c0177046>] vfs_ioctl+0x65/0x1e1
test kernel:  [<c0177247>] sys_ioctl+0x85/0x92
test kernel:  [<c0102cd9>] syscall_call+0x7/0xb
test kernel: Code: 83 c2 01 39 54 24 0c 89 54 24 08 7d 4d 8b 50 04 31 ed 8d 1c 2a 8b 03 39 d8 8b 30 74 1b 89 44 24 04 89 3c 24 e8 bd 00 6b df 89 f0 <8b> 36 39 d8 75 ec 8b 44 24 10 8b 50 04 83 44 24 0c 01 8b 44 24
----------------------------------------------------------------------------------------------------------------

The system contains two disks, each 80 GB, with two volume groups:

  PV /dev/md3    VG data    lvm2  [55.30 GB / 4.52 GB free]
  PV /dev/sda3   VG shares  lvm2  [9.32 GB / 0 free]
  PV /dev/sdb3   VG shares  lvm2  [9.32 GB / 3.02 GB free]
  Total: 3 [73.94 GB] / in use: 3 [73.94 GB] / in no VG: 0 [0 ]

One VG is created on a PV that is a software RAID device, /dev/md3; the other on two PVs, one partition on each disk. Each VG holds a logical volume named after it: data and shares.

From each logical volume a snapshot was taken from cron every ten minutes; meanwhile I was running a script that very slowly increased the disk I/O.
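For reference, the layout described above could have been created along these lines; the exact commands were not posted, so this is only an assumed reconstruction, and the LV sizes are guesses:

  # Assumed reconstruction of the volume layout above -- not the
  # actual commands used; LV sizes are guesses.
  pvcreate /dev/md3                    # software RAID device as a PV
  pvcreate /dev/sda3 /dev/sdb3         # one partition on each disk
  vgcreate data /dev/md3
  vgcreate shares /dev/sda3 /dev/sdb3
  lvcreate -L 40G -n data data         # leave room for snapshots
  lvcreate -L 15G -n shares shares
  mkfs.xfs /dev/data/data
  mkfs.xfs /dev/shares/shares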
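Since the actual cron script was not included here, the following is only a minimal sketch of the kind of ten-minute snapshot job described; the snapshot size and mount options are assumptions:

  #!/bin/sh
  # Sketch of the ten-minute snapshot job (snapshot size assumed).
  VG=data
  LV=data
  SNAP=dbackup
  SNAPSIZE=1G

  # Take a snapshot of the origin LV.
  lvcreate -s -L "$SNAPSIZE" -n "$SNAP" "/dev/$VG/$LV" || exit 1

  # xfs refuses to mount a snapshot next to its origin without nouuid,
  # because the snapshot carries the origin's filesystem UUID.
  mount -t xfs -o nouuid,ro "/dev/$VG/$SNAP" /backup || exit 1

  # ... back up the contents of /backup here ...

  umount /backup
  lvremove -f "/dev/$VG/$SNAP"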
After two days of running, the following happened:

  6:50am up 2 days 19:13, 1 user, load average: 2.53, 3.06, 4.27

---------------
Logical volume "dbackup" already exists in volume group "data"
mount: /dev/data/dbackup already mounted or /backup busy
mount: according to mtab, /dev/mapper/data-dbackup is already mounted on /backup
Can't remove open logical volume "dbackup"
---------------

The script could no longer do anything with the dbackup snapshot (the snapshot of the data LV). I stopped the script and unmounted /backup manually, after which I gave the command:

  lvremove -f /dev/data/dbackup

and then the kernel panic shown above happened.

The same happened on the original server, which has an Areca hardware RAID controller: snapshotting an LV with xfs goes fine until, at a certain point when moderate disk I/O happens, the kernel panics and oopses out of service.

Are there patches to fix this problem?

TIA

sincerely
Wim Bakker

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/