This is related to a thread I saw in the archives, but I couldn't figure out a way to reply to it. A copy of the last message in the thread is included at the bottom of this email for reference.

I have two ext3 filesystems running on LVM2 on software RAID 1. I am seeing occasional (three times in about a month), seemingly random filesystem corruption on both filesystems. It matches the pattern reported by "Gumby" in the thread below. Here's the log of the latest one:

Jan 21 09:50:25 castrovalva kernel: attempt to access beyond end of device
Jan 21 09:50:25 castrovalva kernel: dm-0: rw=0, want=7011473768, limit=20971520
Jan 21 09:50:25 castrovalva kernel: attempt to access beyond end of device
Jan 21 09:50:25 castrovalva kernel: dm-0: rw=0, want=26847680, limit=20971520
Jan 21 09:50:25 castrovalva kernel: EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #1179649: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 21 09:50:25 castrovalva kernel: Aborting journal on device dm-0.
Jan 21 09:50:25 castrovalva kernel: __journal_remove_journal_head: freeing b_committed_data
Jan 21 09:50:25 castrovalva kernel: ext3_abort called.
Jan 21 09:50:25 castrovalva kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Jan 21 09:50:25 castrovalva kernel: Remounting filesystem read-only

As with the other occasions, unmounting and running e2fsck recovered the filesystem.

System is Debian testing, kernel 2.6.15 (but the problem was also seen on a 2.6.14 kernel). lvm2 package is 2.01.04-5 (with lvm-common version 1.5.20).

Nothing below the ext3 layer ever reports a problem. There are no LVM, RAID, or low-level hard drive I/O errors in the logs. I use smartmontools, and all the drives in the system get a SMART short test once a day and a long test once a week. The latest long test happened just a few hours before the log above. There are no physical problems reported on the hard drives.
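To put the want/limit figures in the log above into perspective, here is a minimal Python sketch that converts them to bytes. It assumes the block layer reports both values in 512-byte sectors. The function name, the regular expression, and the dictionary keys are illustrative only and are not part of any kernel or LVM tool.

```python
import re

# Assumption: want= and limit= are counted in 512-byte sectors.
SECTOR_BYTES = 512

# Matches lines such as "dm-0: rw=0, want=7011473768, limit=20971520".
BEYOND_END_RE = re.compile(
    r"(?P<dev>\S+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_beyond_end(line):
    """Return the sizes from one 'want=/limit=' kernel log line, or None."""
    match = BEYOND_END_RE.search(line)
    if match is None:
        return None
    want = int(match.group("want"))
    limit = int(match.group("limit"))
    return {
        "device": match.group("dev"),
        "want_sectors": want,
        "limit_sectors": limit,
        "want_bytes": want * SECTOR_BYTES,
        "limit_bytes": limit * SECTOR_BYTES,
        # How far past the end of the device the request reached.
        "overshoot_ratio": want / limit,
    }


if __name__ == "__main__":
    log_lines = [
        "Jan 21 09:50:25 castrovalva kernel: dm-0: rw=0, want=7011473768, limit=20971520",
        "Jan 21 09:50:25 castrovalva kernel: dm-0: rw=0, want=26847680, limit=20971520",
    ]
    for log_line in log_lines:
        info = parse_beyond_end(log_line)
        print(
            info["device"],
            "device size GiB:", info["limit_bytes"] / 2**30,
            "requested end GiB:", round(info["want_bytes"] / 2**30, 2),
            "ratio:", round(info["overshoot_ratio"], 2),
        )
```

Under that assumption, limit=20971520 works out to exactly 10 GiB, and both requests point past the end of the logical volume, the first by a factor of several hundred. Requests that far out of range look more like garbage block numbers read from corrupted metadata than like an off-by-one at the device boundary, though that is only an inference from the numbers.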
The corruption is happening on both of the LVM/RAID filesystems, but not on any of the non-LVM/RAID filesystems on the system drive.

The two filesystems in question hold very different files. They have essentially no applications in common, so I find it very unlikely that an application is causing the corruption. One of the filesystems is primarily a maildir mail store. The primary application accessing that filesystem is courier-imap. The other filesystem contains MP3s. Samba and slimserver are the main applications accessing that filesystem.

I haven't been able to bring down the system long enough to run memtest, but I find RAM to be an unlikely culprit. I've been able to build at least three 2.6 kernels with all modules turned on with no problems, which seems improbable with bad RAM. Also, the system drive never seems to get corrupted.

The problem started after I rebuilt this system with kernel 2.6 and LVM2. It previously ran for a couple of years on 2.4/LVM1/RAID/ext3 with absolutely no problems. I have no good reason to point to LVM except that it's one of the things that changed, and it's in the right place to cause these symptoms.

Does anyone have anything that I can try in order to confirm or rule out LVM? Is there any more system information I can provide?

Unless I hear something, my next action will probably be to back up the data and remove the LVM layer (run ext3 directly over RAID 1). I'll run with that for a month or two and see if the problem is still there.

--Jeff

> -----Original Message-----
> From: linux-lvm-bounces redhat com [mailto:linux-lvm-bounces redhat com]
> On Behalf Of Terry Rigby
> Sent: Friday, December 02, 2005 9:25 AM
> To: linux-lvm redhat com
> Subject: Re: Need to keep running fsck on LVM
>
> On December 2, 2005 09:12 am, Erik Ohrnberger wrote:
> > grep /dev/hd /var/log/messages /var/log/syslog | more
>
> Nope, that outputs nothing at all.
>
> I do however see the following in /var/log/messages over and over and
> over again...
>
> Dec 1 23:12:46 localhost kernel: attempt to access beyond end of device
> Dec 1 23:12:46 localhost kernel: dm-0: rw=0, want=6444890144, limit=905314304

That complaint is from ll_rw_blk.c. There is probably an application corrupting your filesystem, or a filesystem bug.

>
> Gumby
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm redhat com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO