On Sat, Oct 18, 2008 at 04:20:13PM -0700, Curtis Doty wrote:
> 4:29pm Theodore Tso said:
>
>> On Sat, Oct 18, 2008 at 12:55:56PM -0700, Curtis Doty wrote:
>>> While attempting to expand a 1.64T ext4 volume to 2.18T the F9 kernel
>>> deadlocked. (I have photo of screen/oops if anybody's interested.)
>>
>> Yes, that would be useful, thanks.
>
> Three photos of same: http://www.greenkey.net/~curtis/linux/
>
> The rest had scrolled off, so maybe that soft lockup was a secondary
> effect rather than true cause? It was re-appearing every minute.

Looks like the kernel wedged due to running out of memory. The calls
to shrink_zone(), shrink_inactive_list(), try_to_release_page(),
etc. tend to indicate that the system was frantically trying to find
free physical memory at the time.

It may or may not have been caused by the online resize; how much
memory does your system have, and what else was going on at the time?
It may have been that something *else* had been leaking memory at the
time, and this pushed it over the line.

It's also the case that the online resize is journaled, so it should
have been safe; but I'm guessing that the system was thrashing so hard,
and you didn't have barriers enabled, and this resulted in the
filesystem getting corrupted.

>> Hmm... This sounds like the needs recovery flag was set on the backup
>> superblock, which should never happen. Before we try something more
>> extreme, see if this helps you:
>>
>> e2fsck -b 32768 -B 4096 /dev/where-inst-is-located
>>
>> That forces the use of the backup superblock right away, and might
>> help you get past the initial error.
>
> Same as before. :-(
>
> # e2fsck -b32768 -B4096 -C0 /dev/dat/inst
> e2fsck 1.41.0 (10-Jul-2008)
> inst: recovering journal
> e2fsck: unable to set superblock flags on inst
>
> It appears *all* superblocks are the same as that first 32768; iterating
> over all superblocks shown in mkfs -n output says so.
>
> I'm inclined to just force reduce the underlying lvm.
> It was 100% full before I extended and tried to resize. And I know the
> only writes on the new lvm extent would have been from resize2fs. Is
> that wise?

No, force reducing the underlying LVM is only going to make things
worse, since it doesn't fix the filesystem.

So this is what I would do. Create a snapshot and try this on the
snapshot first:

% lvcreate -s -L 10G -n inst-snapshot /dev/dat/inst
% debugfs -w /dev/dat/inst-snapshot
debugfs: features ^needs_recovery
debugfs: quit
% e2fsck -C 0 /dev/dat/inst-snapshot

This will skip running the journal, but there's no guarantee the
journal is valid anyway. If this turns into a mess, you can throw away
the snapshot and try something else. (The something else would require
writing a C program that removes the needs_recovery flag from all the
backup superblocks, while keeping it set on the master superblock.
That's more work, so let's try this way first.)

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
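
For anyone following along, here is a rough sketch of what "removing needs_recovery from a superblock" amounts to at the byte level, and of where backup superblocks usually sit. It is only an illustration in Python, not the C program mentioned above, and it never touches a device: it works on an in-memory copy of a 1024-byte superblock. The function names are made up for this example. The offsets (magic at byte 56, s_feature_incompat at byte 96) and the 0x0004 bit follow the usual ext2/3/4 on-disk layout. The backup-location helper assumes sparse_super and a first data block of 0, which is the case for 4096-byte blocks with 32768 blocks per group, as with the "-b 32768 -B 4096" invocation quoted above.

```python
import struct

EXT_MAGIC = 0xEF53          # s_magic, little-endian u16
MAGIC_OFFSET = 56           # byte offset of s_magic inside the superblock
INCOMPAT_OFFSET = 96        # byte offset of s_feature_incompat (u32)
INCOMPAT_RECOVER = 0x0004   # the bit that debugfs calls "needs_recovery"


def has_needs_recovery(sb: bytes) -> bool:
    """Return True if the needs_recovery bit is set in this superblock copy."""
    (incompat,) = struct.unpack_from("<I", sb, INCOMPAT_OFFSET)
    return bool(incompat & INCOMPAT_RECOVER)


def clear_needs_recovery(sb: bytes) -> bytes:
    """Return a copy of the superblock bytes with needs_recovery cleared.

    Hypothetical helper: it only edits a buffer, and it ignores anything a
    newer filesystem might layer on top (for example superblock checksums).
    """
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (incompat,) = struct.unpack_from("<I", sb, INCOMPAT_OFFSET)
    out = bytearray(sb)
    struct.pack_into("<I", out, INCOMPAT_OFFSET, incompat & ~INCOMPAT_RECOVER)
    return bytes(out)


def backup_superblock_blocks(group_count: int, blocks_per_group: int = 32768):
    """List block numbers of backup superblocks, assuming sparse_super.

    With sparse_super, backups live in group 1 and in groups that are
    powers of 3, 5 and 7. Assumes the first data block is 0.
    """
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return [g * blocks_per_group for g in sorted(groups) if g < group_count]
```

To use it on real data you would first copy a superblock out to a file (on a snapshot, not the live volume) and feed those bytes in. The debugfs route above is still the simpler thing to try first.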