On 1/29/15 3:27 PM, Gerard Beekmans wrote:
>> -----Original Message-----
>> Are you certain that the volume / storage behind dm-9 is in decent shape?
>> (i.e. is it really even an xfs filesystem?)
>
> The question "is it in decent shape" is probably the million dollar
> question.

Right, sorry, I just meant: does this seem like an xfs problem or a
storage problem at first glance.

> What I do know is this:
>
> * It's all LVM based
> * The first problem partition is /dev/data/srv which in turn is a symlink to /dev/dm-9
> * The second problem partition is /dev/os/opt which in turn is a symlink to /dev/dm-7
>
> Both were originally formatted as XFS and /etc/fstab has the same. Now
> I can't be sure if the symlinks were always dm-7 and dm-9.
>
> Comparing the block device major & minor numbers that "lvdisplay"
> reports to the dm-* symlinks, they all match up. So by all accounts it
> ought to be correct.
>
> Running xfs_db on those two partitions shows what I understand to be
> the "right stuff", aside from an error when it first runs:

Ok, that's a good data point, so it's not woefully scrambled.

> # xfs_db /dev/os/opt
> Metadata corruption detected at block 0x4e2001/0x200

So at sector 0x4e2001, length 0x200.

xfs_db> agf 5
xfs_db> daddr
current daddr is 5120001

so it's the AGF for AG 5 which is corrupt. You could try:

xfs_db> agf 5
xfs_db> print

to see how it looks.

> xfs_db: cannot init perag data (117). Continuing anyway.
> xfs_db> sb 0
> xfs_db> p
> magicnum = 0x58465342

This must not be the one that repair failed on with:

> couldn't verify primary superblock - bad magic number !!!

because that magicnum is valid. Did this one also fail to repair?

> blocksize = 4096
> dblocks = 3133440
> rblocks = 0
> rextents = 0
> uuid = b4ab7d1d-d383-4c49-af2c-be120ff967a7
> logstart = 262148
> rootino = 128
> rbmino = 129
> rsumino = 130
> rextsize = 1
> agblocks = 128000
> agcount = 25

25 AGs; presumably the fs was grown in the past, but ok...

...

>> A VM crashing definitely should not result in a badly corrupt/unmountable
>> filesystem.
>>
>> Is there any other interesting part of the story? :)
>
> The full setup is as follows:
>
> The VM in question is a VMware guest running on a VMware cluster. The
> actual files that make up the VM are stored on a SAN that VMware
> accesses via NFS.
>
> The outage occurred at the SAN level, making the NFS storage
> unavailable, which in turn turned off all the VMs running on it
> (turned off in the virtual sense).
>
> ~50 VMs were then brought back online and none had any serious issues.
> Most needed a form of fsck to bring things back to consistency. This
> is the only VM that suffered the way it did. Other VMs are a mix of
> Linux, BSD, OpenSolaris and Windows with all their varieties of
> filesystems (ext3, ext4, xfs, ntfs and so on).
>
> It is possible that the VMware VMDK file that belongs to this VM is
> the issue, but it does not appear to be corrupt from a VMDK
> standpoint. Just the data inside of it.

The only thing I can say is that XFS is going to depend on the storage
telling the truth about completed IOs... If the storage told XFS an IO
was persistent, but it wasn't, and the storage went poof, bad things can
happen.

I don't know the details of your setup, or TBH much about VMware over
NFS ... you weren't mounted with -o nobarrier, were you?

-Eric

>
> Gerard
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
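
(For reference, a rough sketch of the read-only inspection steps discussed
above, plus a way to preserve the metadata before attempting a repair. The
device path /dev/os/opt and AG number 5 come from this thread; the metadump
output file name is only a placeholder.)

# Read-only xfs_db: select AGF 5, dump its fields, and show its disk
# address (it should match sector 0x4e2001 / 5120001 from the error).
xfs_db -r -c "agf 5" -c "print" -c "daddr" /dev/os/opt

# Capture a metadata-only image first, so the current state is preserved
# in case a later xfs_repair run changes things.
xfs_metadump /dev/os/opt /tmp/opt.metadump

# No-modify mode: report what xfs_repair would do without writing anything.
xfs_repair -n /dev/os/opt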
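
(Similarly, a quick sketch for the barrier question: check whether
nobarrier shows up in the live mount options or in fstab. The /srv and
/opt mount points are assumed from the LV names above.)

# Any nobarrier in the active mounts or in fstab?
grep nobarrier /proc/mounts /etc/fstab

# Or look at the full option list for the two filesystems directly.
mount | grep -E ' /srv | /opt '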