I hit this problem again and captured the output of all the steps
while repairing the filesystem.

Here's the crash:

http://pastie.org/private/prift1xjcc38s0jcvehvew

And the output of the xfs_repair steps (also attached if needed):

http://pastie.org/private/gvq3aiisudfhy69ezagw

Hope this can provide some insights.

-Shri

On Thu, May 28, 2015 at 11:08 AM, Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx> wrote:
> We'll try and reproduce this and capture the output of xfs_repair when
> it happens next. Will keep an eye on what else was happening in the
> infrastructure when it happens.
>
> FWIW, we've seen this in local VMware environment as well as when we
> were running on Amazon EC2 instances. So it doesn't seem hypervisor
> specific.
>
> On Wed, May 27, 2015 at 5:53 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>> And did anything else "interesting" happen prior to the detection?
>>
>>> On May 27, 2015, at 7:52 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>>
>>> You'll need to try to narrow down how it happened.
>>>
>>> The hexdumps in the logs show what data was in the buffer; in one case it was ascii, and was definitely not xfs metadata.
>>>
>>> Either:
>>>
>>> a) xfs wrote the wrong metadata - almost impossible, because we verify the data on write in the same way as we do on read
>>>
>>> b) xfs read the wrong block due to other metadata corruption.
>>>
>>> c) something corrupted the storage after it was written
>>>
>>> d) the storage returned the wrong data on a read request ...
>>>
>>> e) ???
>>>
>>> Did you save the xfs_repair output? That might offer more clues.
>>>
>>> Unless you can reproduce it, it'll be hard to come up with a definitive root cause... can you try?
>>>
>>> -Eric
>>>
>>>
>>>> On 5/27/15 7:03 PM, Shrinand Javadekar wrote:
>>>> Thanks Eric,
>>>>
>>>> We ran xfs_repair and were able to get it back into a running state.
>>>> This is fine for a test & dev but in production it won't be
>>>> acceptable. What other data do we need to get to the bottom of this?
>>>>
>>>>> On Wed, May 27, 2015 at 4:27 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>>>> That's not a crash. That is xfs detecting on disk corruption which likely happened at some time prior. You should unmount and run xfs_repair, possibly with -n first if you would like to do a dry run to see what it might do. If you get fresh corruption after a full repair, then that becomes more interesting. It's possible that you have a problem with the underlying block layer or it's possible that it is an xfs bug - but I think this is not something that we have seen before.
>>>>>
>>>>> Eric
>>>>>
>>>>>> On May 27, 2015, at 6:06 PM, Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running Openstack Swift in a VM with XFS as the underlying
>>>>>> filesystem. This is generating a metadata heavy workload on XFS.
>>>>>> Essentially, it is creating a new directory and a new file (256KB) in
>>>>>> that directory. This file has extended attributes of size 243 bytes.
>>>>>>
>>>>>> I am seeing the following two crashes of the machine:
>>>>>>
>>>>>> http://pastie.org/pastes/10210974/text?key=xdmfvaocvawnyfmkb06zg
>>>>>>
>>>>>> AND
>>>>>>
>>>>>> http://pastie.org/pastes/10210975/text?key=rkiljsdaucrk7frprzgqq
>>>>>>
>>>>>> I have only seen these when running in a VM. We have run several tests
>>>>>> on physical server but have never seen these problems.
>>>>>>
>>>>>> Are there any known issues with XFS running on VMs?
>>>>>>
>>>>>> Thanks in advance.
>>>>>> -Shri
>>>>>>
>>>>>> _______________________________________________
>>>>>> xfs mailing list
>>>>>> xfs@xxxxxxxxxxx
>>>>>> http://oss.sgi.com/mailman/listinfo/xfs
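
For anyone following along, the sequence suggested in the quoted thread (unmount, dry run with -n, then a full repair only if wanted) could be scripted roughly as below. This is only a sketch, not what was actually run here: the device and mount point names are placeholders, the helper names repair_commands and run_repair are made up for illustration, and the injectable runner argument is just there so the logic can be exercised without touching a real disk.

```python
import subprocess


def repair_commands(device, mount_point):
    """Build the command sequence: unmount, dry run, then real repair."""
    return [
        ["umount", mount_point],
        ["xfs_repair", "-n", device],  # -n: no-modify mode, report only
        ["xfs_repair", device],        # actual repair, modifies the disk
    ]


def run_repair(device, mount_point, dry_run_only=True, runner=subprocess.run):
    """Run the sequence and collect (command, return code, output) tuples.

    With dry_run_only=True the real repair step is skipped, so the
    filesystem is only inspected. The runner argument is an assumption
    of this sketch that allows substituting a fake for testing.
    """
    commands = repair_commands(device, mount_point)
    if dry_run_only:
        commands = commands[:2]

    logs = []
    for cmd in commands:
        result = runner(cmd, capture_output=True, text=True)
        logs.append((cmd, result.returncode, result.stdout + result.stderr))
        # Do not touch the device if the unmount did not succeed.
        if cmd[0] == "umount" and result.returncode != 0:
            break
    return logs


if __name__ == "__main__":
    # Placeholder device and mount point; adjust before use.
    for cmd, code, output in run_repair("/dev/sdX1", "/mnt/swift"):
        print(" ".join(cmd), "-> exit", code)
        print(output)
```

Saving the collected output from each step is the point of the exercise, since that is what was asked for earlier in the thread.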
Attachment:
xfs_crash
Description: Binary data