On Mon, 11 Apr 2011 09:10:08 -0400 Ted Ts'o <tytso@xxxxxxx> wrote:

> Your symptoms don't sound familiar to me, other than the standard
> concerns about hardware induced file system inconsistency problems.

The thing is, I do not observe any in-file random data corruption that would
point to a problem at a lower (block-device) level, so I do not think it is a
RAID or HDD problem. The breakage seemed to be at the filesystem logic level,
perhaps something to do with the allocation of space for new files. And since,
immediately before that, I had performed two operations possibly affecting it
(changing the stride size with tune2fs, plus an online grow with resize2fs),
I thought this might be an ext4 problem.

While still in the same session, I then re-copied the affected files,
replacing their "shortened" copies, and they were written out fine the second
time. After a reboot, no more file truncations have been observed so far.

> Have you checked your logs carefully to make sure there weren't any
> hardware errors reported?

No, there were no errors in dmesg, nor on the console where 'cp' would have
printed its errors.

> If this is a hardware RAID system, is it regularly doing disk scrubbing?
> Has the hardware RAID reported anything unusual? How long have you been
> running in a degraded RAID 6 state?

It is an mdadm RAID6, and it does not report any problems. It was running in
a degraded state for only a short time (less than a day). And AFAIK running
degraded with one disk missing is not a dangerous or risky situation with
RAID6.

> And have you tried shutting down the system and running fsck to make
> sure there weren't any file system corruption problems? When's the
> last time you've run fsck on the system?

I have unmounted it and run fsck just now. Admittedly, a long time had passed
since the last fsck.

# e2fsck /dev/md0
e2fsck 1.41.12 (17-May-2010)
/dev/md0 has gone 306 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 367107/364412928 files (4.3% non-contiguous), 1219229259/1457626752 blocks

> If this is an LVM system, I'd strongly suggest that you set aside
> space you can take a snapshot, and then regularly take a snapshot, and
> then run fsck on the snapshot. If any problems are noted, you can
> then schedule downtime and fsck the entire system.

No, I don't use LVM there.

--
With respect,
Roman