We hit this again on one of our VMs. This is running the 3.13 kernel.
So, now we have seen this crash on 3.16 and 3.13 kernels. We had
another setup with a 3.8 kernel and for several months we haven't seen
this problem. Is there a way to narrow down what changed between 3.8
and 3.13 and get to the bottom of this?

I had provided info about the workload on a different thread:
http://oss.sgi.com/archives/xfs/2015-06/msg00108.html

If that doesn't work, let me know and I can get it again.

-Shri

On Fri, Jun 19, 2015 at 12:37 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> On 6/19/15 1:34 PM, Shrinand Javadekar wrote:
>> I hit this problem again and captured the output of all the steps
>> while repairing the filesystem. Here's the crash:
>> http://pastie.org/private/prift1xjcc38s0jcvehvew
>
> that starts with:
>
> Jun 18 18:40:19 foods-12 kernel: [3639696.006884] ffff8801740f8000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00 ................
> Jun 18 18:40:19 foods-12 kernel: [3639696.007056] ffff8801740f8010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00 ..... ..........
> Jun 18 18:40:19 foods-12 kernel: [3639696.007140] ffff8801740f8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> Jun 18 18:40:19 foods-12 kernel: [3639696.007230] ffff8801740f8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>
> I think there should have been other interesting bits prior to that line, can you check, and provide it please? Full dmesg in a pastebin would be just fine.
>
> xfs_attr3_leaf_write_verify at line 216 of file /build/buildd/linux-lts-trusty-3.13.0/fs/xfs/xfs_attr_leaf.c. Caller 0xffffffffa00a193a
>
> which is ... interesting; something went wrong on the way _to_ disk?
>
> Ok, what is wrong, then.
> here's the first 64 bytes of the buffer,
> it contains:
>
> typedef struct xfs_attr_leafblock {
>         xfs_attr_leaf_hdr_t     hdr;    /* constant-structure header block */
>
> where
>
> typedef struct xfs_attr_leaf_hdr {      /* constant-structure header block */
>         xfs_da_blkinfo_t info;          /* block type, links, etc. */
>         __be16  count;                  /* count of active leaf_entry's */
>         __be16  usedbytes;              /* num bytes of names/values stored */
>         __be16  firstused;              /* first used byte in name area */
>         __u8    holes;                  /* != 0 if blk needs compaction */
>         __u8    pad1;
>         xfs_attr_leaf_map_t freemap[XFS_ATTR_LEAF_MAPSIZE];
>                                         /* N largest free regions */
> } xfs_attr_leaf_hdr_t;
>
> and
>
> typedef struct xfs_da_blkinfo {
>         __be32          forw;           /* previous block in list */
>         __be32          back;           /* following block in list */
>         __be16          magic;          /* validity check on block */
>         __be16          pad;            /* unused */
> } xfs_da_blkinfo_t;
>
> so:
>
> 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
> |   forw    |   back    |magic| pad |count|used|
> 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> and the only thing the verifier checks on non-crc is
> magic (which is good), and count (which is what tripped here)
>
>         if (xfs_sb_version_hascrc(&mp->m_sb)) {
> <snip>
>         } else {
>                 if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
>                         return false;
>         }
>         if (ichdr.count == 0)
>                 return false;
>
> so this failed to verify because count was 0.
>
>> And the output of the xfs_repair steps (also attached if needed):
>> http://pastie.org/private/gvq3aiisudfhy69ezagw
>
> Ok, no on-disk corruption, that's good.
>
> Can you please provide as much info as possible about your system
> and setup?
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> -Eric
>
>> Hope this can provide some insights.
>>
>> -Shri

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
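
For anyone who wants to replay the header decoding quoted above by hand, here is a minimal userspace sketch in C. It is not kernel code: the struct and helper names (leaf_hdr, be16, be32, decode_leaf_hdr, leaf_hdr_ok) are made up for illustration, the byte offsets follow the struct layouts quoted in the thread, and the 4-byte freemap entry layout (16-bit base, then 16-bit size) is an assumption that is not shown in the thread. The magic value 0xfbee is the one visible in the dump. It reads the big-endian fields from the first two rows of the dump and applies the same two non-CRC checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATTR_LEAF_MAGIC 0xfbeeu /* magic value as seen in the dump */

/* First two rows (32 bytes) of the hex dump from the kernel log. */
static const uint8_t dump[32] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xfb, 0xee, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x10, 0x00, 0x00, 0x00, 0x00, 0x20, 0x0f, 0xe0,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Host-endian copy of the fields of interest (hypothetical type). */
struct leaf_hdr {
    uint32_t forw, back;
    uint16_t magic, count, usedbytes, firstused;
    uint16_t free0_base, free0_size; /* first freemap entry (assumed layout) */
};

/* Read big-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offsets: forw 0, back 4, magic 8, pad 10, count 12, usedbytes 14,
 * firstused 16, holes 18, pad1 19, freemap[0] at 20. */
static struct leaf_hdr decode_leaf_hdr(const uint8_t *b)
{
    struct leaf_hdr h;

    h.forw = be32(b + 0);
    h.back = be32(b + 4);
    h.magic = be16(b + 8);
    h.count = be16(b + 12);
    h.usedbytes = be16(b + 14);
    h.firstused = be16(b + 16);
    h.free0_base = be16(b + 20);
    h.free0_size = be16(b + 22);
    return h;
}

/* Mirrors the two non-CRC checks quoted from the write verifier. */
static bool leaf_hdr_ok(const struct leaf_hdr *h)
{
    if (h->magic != ATTR_LEAF_MAGIC)
        return false;
    if (h->count == 0)
        return false;
    return true;
}
```

Calling decode_leaf_hdr(dump) and then leaf_hdr_ok() on the result returns false: the magic matches, but count is 0. Under the assumed freemap layout, the second row would decode as firstused = 0x1000 and a first free region starting at byte 0x20 with size 0x0fe0, which would be consistent with an otherwise well-formed but empty leaf block. That last part is my reading of the bytes, not something confirmed in the thread.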