Re: fsck failure at boot

Jason Dixon <jason@xxxxxxxxxxxxxx> · Fri, 21 Apr 2006 18:43:39 -0400

On Apr 21, 2006, at 6:32 PM, Herta Van den Eynde wrote:

> Well, the relevance of SANsurfer would depend on what the problem
> is. When groping in the dark, it'd be one of the places I'd look
> for indications.  But upon re-reading your initial post, I agree
> that chances are slim that the HBA is the root cause of your problem.

> You mentioned RHAS4.  Are you using a standard Red Hat kernel, or
> did you build your own?  (The reason I ask is that I want to rule out
> an initial RAM disk that doesn't know about your QLogic HBA.)

This is a standard RHAS 2.6.* ia64 kernel.

> I'm a bit confused by the "*** An error occurred during the file
> system check" error message you mentioned in your first mail.  I
> would expect that to be generated by /etc/rc.d/rc.sysinit, not by
> fsck.ext3.  (Might it be a cut-n-paste into the wrong part of the
> mail body?)  Note that there are two locations in that script that
> can generate that error: one while the root filesystem is mounted
> read-only, and another after lvm2 initialization.

It was copy/pasted straight out of the console.  It *is* possible
that there is a paste error, as I grabbed it a section at a time.

> The complaint about the superblock can be ignored, insofar as the
> superblock must be correct, as is evident from the fact that you
> can mount the partition just fine once the system is fully booted.
> (Assuming that /dev/sdl1 doesn't exist, "fsck.ext3 -a /dev/sdl1"
> will generate the same error.)

[root@altix ~]# fsck.ext3 -a /dev/sdl1
fsck.ext3: No such file or directory while trying to open /dev/sdl1
/dev/sdl1:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

> But combined with the error "fsck.ext3: No such file or directory
> while trying to open /dev/sdb1", it looks like the device special
> file /dev/sdb1 hasn't been created yet at the time you're trying
> to use it.
>
> Do dmesg or /var/log/messages contain additional information?
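One quick way to test the theory that /dev/sdb1 simply doesn't exist yet is to inspect the path from the maintenance shell at the point where the boot stops. The following is a minimal sketch for a POSIX sh. `dev_state` and `wait_for_dev` are hypothetical helpers written for illustration only. They are not part of rc.sysinit, udev, or any Red Hat package, and the 10-second default is an arbitrary choice.

```shell
#!/bin/sh
# dev_state (hypothetical helper): report what is at a device path.
#   block    - the path exists and is a block device node
#   other    - something exists there, but not a block device node
#   missing  - nothing at all (the "No such file or directory" case)
dev_state() {
    if [ -b "$1" ]; then
        echo block
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# wait_for_dev (hypothetical helper): poll once per second, up to $2
# attempts (default 10), for a block device node to appear at $1.
# Returns 0 as soon as it appears, 1 if it never does.
wait_for_dev() {
    dev=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -b "$dev" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

For example, if `dev_state /dev/sdb1` prints `missing` at the failure point, that supports the timing theory. If it prints `block`, the problem more likely lies in the check itself. The sketch uses `-b` rather than plain `-e` so that a stray regular file at that path is not mistaken for a device node.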

The dmesg is at .  Here is a seemingly relevant section from
/var/log/messages.  I'm not sure why the timestamps are all jumbled,
but assuming the entries are logged sequentially, udev certainly
appears to create the device before fsck.ext3 is called.  Note that
this is from a boot with the /dev/sdb1 entry commented out in
/etc/fstab.

Apr 20 19:32:24 altix kernel: SELinux:  Initializing.
Apr 20 19:32:24 altix kernel: SELinux:  Starting in permissive mode
Apr 20 19:32:24 altix kernel: There is already a security framework initialized, register_security failed.
Apr 20 15:32:10 altix start_udev: Starting udev:  succeeded
Apr 20 19:32:25 altix kernel: selinux_register_security:  Registering secondary module capability
Apr 20 15:32:13 altix udevsend[704]: starting udevd daemon
Apr 20 19:32:25 altix kernel: Capability LSM initialized as secondary
Apr 20 15:32:14 altix scsi.agent[753]: disk at /devices/pci0000:02/0000:02:01.0/host4/target4:0:0/4:0:0:0
Apr 20 19:32:25 altix kernel: Mount-cache hash table entries: 1024 (order: 0, 16384 bytes)
Apr 20 15:32:14 altix scsi.agent[760]: disk at /devices/pci0000:02/0000:02:01.0/host4/target4:0:0/4:0:0:1
Apr 20 19:32:25 altix kernel: Boot processor id 0x0/0x0
Apr 20 15:32:16 altix rc.sysinit: -e
Apr 20 19:32:25 altix kernel: task migration cache decay timeout: 10 msecs.
Apr 20 19:32:25 altix rpcidmapd: rpc.idmapd startup succeeded
Apr 20 15:32:16 altix sysctl: net.ipv4.ip_forward = 0
Apr 20 15:32:16 altix sysctl: net.ipv4.conf.default.rp_filter = 1
Apr 20 19:32:25 altix netfs: Mounting other filesystems:  succeeded
Apr 20 15:32:16 altix sysctl: net.ipv4.conf.default.accept_source_route = 0
Apr 20 15:32:16 altix sysctl: kernel.sysrq = 0
Apr 20 19:32:26 altix autofs: automount startup succeeded
Apr 20 15:32:16 altix sysctl: kernel.core_uses_pid = 1
Apr 20 15:32:16 altix rc.sysinit: Configuring kernel parameters:  succeeded
Apr 20 19:32:26 altix smartd[1386]: smartd version 5.33 [ia64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Apr 20 19:32:16 altix date: Thu Apr 20 19:32:16 EDT 2006
Apr 20 19:32:16 altix rc.sysinit: Setting clock  (localtime): Thu Apr 20 19:32:16 EDT 2006 succeeded
Apr 20 19:32:26 altix smartd[1386]: Home page is http://smartmontools.sourceforge.net/
Apr 20 19:32:27 altix smartd[1386]: Opened configuration file /etc/smartd.conf
Apr 20 19:32:26 altix kernel: Brought up 4 CPUs
Apr 20 19:32:16 altix rc.sysinit: Setting hostname altix.raba.com:  succeeded
Apr 20 19:32:27 altix smartd[1386]: Configuration file /etc/smartd.conf parsed.
Apr 20 19:32:17 altix fsck: [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/VolGroup00/LogVol00
Apr 20 19:32:17 altix fsck: /dev/VolGroup00/LogVol00: clean, 293377/9781248 files, 2273903/19546112 blocks

> Is this a system you can take down for testing?  If so, could you
> - edit rc.sysinit to slightly change one of the two "*** An error
>   occurred during the file system check" error messages, to determine
>   which of the two locations actually causes the error?
> - reboot again, and when you're dropped to the shell,
>   - manually check whether the device special file
>     /dev/sdb1 exists or not
>   - manually execute the checks in rc.sysinit prior
>     to the error message to determine which one fails
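The first step can be done with two standard commands. `grep -n` lists both occurrences of the message, and the GNU sed address form `0,/regex/` limits the substitution to the first one. This is a minimal sketch under several assumptions. `tag_first_fsck_error` and the `[check 1]` tag are made-up names. The match assumes the message in your rc.sysinit contains the phrase quoted above. GNU sed must be available for `-i` and `0,/regex/`. The `-i.orig` option leaves the untouched script in rc.sysinit.orig.

```shell
#!/bin/sh
# Phrase assumed to appear in both rc.sysinit error messages.
MSG='An error occurred during the file system check'

# tag_first_fsck_error (hypothetical helper): prefix "[check 1] " to the
# FIRST occurrence of $MSG only, so the boot console shows which of the
# two checks fired. Needs GNU sed (-i and the 0,/regex/ address).
tag_first_fsck_error() {
    script=${1:-/etc/rc.d/rc.sysinit}
    # Show both locations, with line numbers, before editing.
    grep -n "$MSG" "$script"
    # Edit in place; the original is kept as "$script.orig".
    sed -i.orig "0,/$MSG/s/$MSG/[check 1] $MSG/" "$script"
}
```

After running it as root and rebooting, a console line reading "*** [check 1] An error occurred ..." means the failure comes from the first occurrence in the file (compare with the line numbers grep printed). An untagged message means the second check fired. Running `mv /etc/rc.d/rc.sysinit.orig /etc/rc.d/rc.sysinit` undoes the change.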

Yes, I should be able to attempt this on Monday.  I will follow up  
with details.

Thanks,

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list