Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/23/12 11:15 PM, Nix wrote:
> On 24 Oct 2012, Eric Sandeen uttered the following:
> 
>> On 10/23/12 3:57 PM, Nix wrote:
>>> The only unusual thing about the filesystems on this machine are that
>>> they have hardware RAID-5 (using the Areca driver), so I'm mounting with
>>> 'nobarrier': 
>>
>> I should have read more.  :(  More questions follow:
>>
>> * Does the Areca have a battery backed write cache?
> 
> Yes (though I'm not powering off, just rebooting). Battery at 100% and
> happy, though the lack of power-off means it's not actually getting
> used, since the cache is obviously mains-backed as well.
> 
>> * Are you crashing or rebooting cleanly?
> 
> Rebooting cleanly, everything umounted happily including /home and /var.
> 
>> * Do you see log recovery messages in the logs for this filesystem?
> 
> My memory says yes, but nothing seems to be logged when this happens
> (though with my logs on the first filesystem damaged by this, this is
> rather hard to tell, they're all quite full of NULs by now).
> 
> I'll double-reboot tomorrow via the faulty kernel and check, unless I
> get asked not to in the interim. (And then double-reboot again to fsck
> everything...)
> 
>>> the full set of options for all my ext4 filesystems are:
>>>
>>> rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
>>> usrquota,grpquota,commit=30,stripe=16,data=ordered,usrquota,grpquota
>>
>> ok journal_async_commit is off the reservation a bit; that's really not
>> tested, and Jan had serious reservations about its safety.
> 
> OK, well, I've been 'testing' it for years :) No problems until now. (If
> anything, I was more concerned about journal_checksum. I thought that
> had actually been implicated in corruption before now...)

It had, but I fixed it AFAIK; OTOH, we turned it off by default
after that episode.

>> * Can you reproduce this w/o journal_async_commit?
> 
> I can try!

Ok, fair enough.  If the BBU is working, nobarrier is ok; I don't trust
journal_async_commit, but that doesn't mean this isn't a regression.

Thanks for the answers... onward.  :)

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux