Re: v0.80.4 Firefly released

Samuel Just <sam.just@xxxxxxxxxxx> · Wed, 16 Jul 2014 16:41:32 -0700

[Apologies for the repost, attachment was too big]

Sorry for the delay.  I've been trying to put together a simpler
reproducer since no one wants to debug a filesystem based on rbd
symptoms :).  It doesn't appear to be related to using extsize on a
non-empty file. The linked archive below has a reproducer
(xfs_extsize_reproducer.cc), an input op sequence (trimmed-ops.in),
the resulting file and what it should be (test, test.correct), and a
summary (notes.txt).

http://filedump.ceph.com/samuelj/reproducer.tgz

I think this probably is fixed in the commit mentioned above (xfs: Use
preallocation for inodes with extsz hints).
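
For anyone following along who hasn't used the hint in question, here is a
minimal sketch of setting an extent size hint from userspace. This is NOT the
reproducer from the archive. It assumes the generic FS_IOC_FSGETXATTR /
FS_IOC_FSSETXATTR definitions from <linux/fs.h> (the XFS_IOC_* names are the
same ioctls) and the common ioctl number encoding used on x86 and ARM; the
helper names (_ioc, set_extsize_hint) are made up for illustration.

```python
import fcntl
import struct

# Layout of struct fsxattr from <linux/fs.h>: five u32 fields
# (xflags, extsize, nextents, projid, cowextsize) plus 8 pad bytes.
# On older kernels the fifth u32 was still padding; the total size
# (28 bytes) is the same either way.
FSXATTR = struct.Struct("=IIIII8s")

# Assumption: generic Linux ioctl encoding (dir << 30 | size << 16 |
# type << 8 | nr). Some architectures encode the direction differently.
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, nr):
    """Build an ioctl request number for type 'X' and struct fsxattr."""
    return (direction << 30) | (FSXATTR.size << 16) | (ord("X") << 8) | nr


FS_IOC_FSGETXATTR = _ioc(_IOC_READ, 31)
FS_IOC_FSSETXATTR = _ioc(_IOC_WRITE, 32)
FS_XFLAG_EXTSIZE = 0x00000800  # "extent size hint is set" flag


def set_extsize_hint(fd, extsize_bytes):
    """Read the current fsxattr, set the extsize hint, write it back."""
    buf = bytearray(FSXATTR.size)
    fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # fills buf in place
    xflags, _old, nextents, projid, cow, pad = FSXATTR.unpack(buf)
    new = FSXATTR.pack(xflags | FS_XFLAG_EXTSIZE, extsize_bytes,
                       nextents, projid, cow, pad)
    fcntl.ioctl(fd, FS_IOC_FSSETXATTR, new)
```

Calling set_extsize_hint(f.fileno(), 4 * 1024 * 1024) on a file opened for
writing on XFS would request a 4 MiB hint. XFS expects the value to be a
multiple of the filesystem block size, and on a filesystem that does not
support the hint the ioctl should fail with OSError.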

Thanks!
-Sam

On Wed, Jul 16, 2014 at 3:31 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Jul 16, 2014 at 10:26:23AM -0700, Gregory Farnum wrote:
>> On Wed, Jul 16, 2014 at 2:22 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> > On Tue, Jul 15, 2014 at 04:45:59PM -0700, Sage Weil wrote:
>> >> This Firefly point release fixes a potential data corruption problem
>> >> when ceph-osd daemons run on top of XFS and service Firefly librbd
>> >> clients.  A recently added allocation hint that RBD utilizes triggers
>> >> an XFS bug on some kernels (Linux 3.2, and likely others) that leads
>> >> to data corruption and deep-scrub errors (and inconsistent PGs).  This
>> >> release avoids the situation by disabling the allocation hint until we
>> >> can validate which kernels are affected and/or are known to be safe to
>> >> use the hint on.
>> >
>> > I've not really seen a report for that on the XFS list, could it be
>> > that you're running into the issue fixed by
>> >
>> >  "xfs: Use preallocation for inodes with extsz hints"
>> >
>> > (commit aff3a9edb7080f69f07fe76a8bd089b3dfa4cb5d)?
>>
>> Sam reported the issue we're seeing in "consequences of
>> XFS_IOC_FSSETXATTR on non-empty file?",
>
> Assuming you've set an extent size hint on a file that has delayed
> allocation on it and no blocks, then that's more than likely the
> same issue. The above commit uses preallocation to allocate
> unwritten extents rather than delayed allocation for files with
> extent size hints because delayed allocation doesn't write zeros
> over ranges in the allocated extents that don't have dirty data over
> them.
>
> Moral of the story: any time you get what appears to be data
> corruption in the underlying data store, you should report it to the
> relevant filesystem list rather than try to work around it....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs