Re: XFS and nobarriers on Intel SSD

I looked into this just last week.

Everybody seems to think it's safe to disable barriers if you have a non-volatile cache on the block device (be it a controller, a drive or a SAN array); the documentation for the major databases and distributions indicates you can disable them safely in this case.

Someone would have to dig through the source code, but the only difference with barriers disabled should be the lack of a "flush" command sent to the drive.
However, if the "flush" is also what enforces ordering one level up, the requests could in fact be reordered without it.
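
To make the ordering concern concrete, here is a rough user-space analogue (purely illustrative, not actual XFS code; the file name is made up). A journalling filesystem writes the journal payload, flushes it to stable storage, and only then writes the commit record; the flush between the two steps is what the barrier provides. If that flush never reaches a drive with a volatile cache, the commit record can become durable before the payload it describes:

---
/* Illustrative sketch only: "journal.log" is a made-up file standing in
 * for a filesystem journal.  The fdatasync() calls play the role of the
 * "flush" a barrier sends to the drive. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    const char payload[] = "journal payload: new inode + block map\n";
    const char commit[]  = "commit record\n";

    /* 1. Write the journal payload. */
    if (write(fd, payload, sizeof(payload) - 1) < 0) { perror("write"); return EXIT_FAILURE; }

    /* 2. Flush: the payload must be stable before the commit record. */
    if (fdatasync(fd) < 0) { perror("fdatasync"); return EXIT_FAILURE; }

    /* 3. Only now write the commit record, and flush it as well. */
    if (write(fd, commit, sizeof(commit) - 1) < 0) { perror("write"); return EXIT_FAILURE; }
    if (fdatasync(fd) < 0) { perror("fdatasync"); return EXIT_FAILURE; }

    close(fd);
    return EXIT_SUCCESS;
}
---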

Let's just hope someone didn't screw up...

Jan

> On 14 Sep 2015, at 11:15, Nick Fisk <nick@xxxxxxxxxx> wrote:
> 
> 
> 
> 
> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Christian Balzer
>> Sent: 14 September 2015 09:43
>> To: ceph-users@xxxxxxxx
>> Subject: Re:  XFS and nobarriers on Intel SSD
>> 
>> 
>> Hello,
>> 
>> Firstly, thanks to Richard for getting back to us about this.
>> 
>> On Mon, 14 Sep 2015 09:31:01 +0100 Nick Fisk wrote:
>> 
>>> Are we sure nobarriers is safe? From what I understand, barriers are
>>> there to ensure correct ordering of writes, not just to make sure data
>>> is flushed down to a non-volatile medium. Although the Intel SSDs
>>> have power loss protection, is there not a risk that the Linux
>>> scheduler might be writing data out of order to the SSDs, meaning
>>> that in the case of power loss, essential FS data might be lost in the OS buffers?
>>> 
>> The way I understand it, barriers ensure order and thus consistency in the
>> face of volatile caches.
>> So Intel DC SSDs are on the same page as BBU-backed RAID controllers
>> with HW cache (and the HDD caches turned OFF!).
>> That is, completely safe with nobarriers.
>> 
>> To quote from the mount man page:
>> ---
>> This enables/disables barriers.  barrier=0 disables it, barrier=1  enables  it.
>> Write  barriers  enforce proper on-disk ordering of journal commits, making
>> volatile disk write caches safe to use, at some performance penalty.  The
>> ext3 filesystem does not enable write barriers by default.  Be sure to enable
>> barriers unless your disks are battery-backed one way or another.  Otherwise
>> you risk filesystem corruption in case of power failure.
>> ---
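
For XFS, which is what this thread is about, the equivalent knob is the "nobarrier" mount option rather than barrier=0. A minimal sketch of setting it via mount(2) follows; the device and mountpoint are placeholders, and this is only appropriate when the write cache really is non-volatile:

---
/* Sketch only: /dev/sdb1 and /mnt/osd0 are placeholder names. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* XFS spells the option "nobarrier"; ext3/ext4 use "barrier=0". */
    if (mount("/dev/sdb1", "/mnt/osd0", "xfs", 0, "nobarrier") != 0) {
        perror("mount");
        return 1;
    }
    return 0;
}
---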
>> 
>> Unflushed (dirty) data in the page cache is _always_ lost when the power
>> fails.
> 
> But that was my point: barriers should make sure that the data left in the page cache is not written out in an order that would cause corruption, i.e. data blocks hitting the disk before the journal has.
> 
> This guy seems to think the same
> 
> http://symcbean.blogspot.co.uk/2014/03/warning-bbwc-may-be-bad-for-your-health.html
> 
> But then a Red Hat bug was closed suggesting that FS journal operations are always issued as if under NOOP, regardless of the configured scheduler... so I guess that means it's safe???
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1104380
> 
> 
>> 
>> That said, having to disable barriers to make Avago/LSI happy is not
>> something that gives me the warm fuzzies.
>> 
>> Christian
>>> 
>>> 
>>> Running with the NOOP scheduler and nobarriers may be safe, but
>>> unless someone with more knowledge on the subject can confirm, I would
>>> be wary about using nobarriers with CFQ or Deadline.
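
For what it's worth, a quick way to double-check what a host is actually running with is to read the active scheduler and scan the mount options; a rough sketch (sda is a placeholder device, and the bracketed entry in the scheduler file is the active one):

---
/* Sketch only: "sda" is a placeholder; adjust for the device in question. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];

    /* Active elevator, e.g. "noop deadline [cfq]". */
    FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");
    if (f) {
        if (fgets(line, sizeof(line), f))
            printf("scheduler (sda): %s", line);
        fclose(f);
    }

    /* Filesystems mounted with barriers disabled. */
    f = fopen("/proc/mounts", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f))
        if (strstr(line, "nobarrier") || strstr(line, "barrier=0"))
            printf("barriers disabled: %s", line);
    fclose(f);
    return 0;
}
---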
>>> 
>>> 
>>> 
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>> Of Richard Bade Sent: 14 September 2015 01:31
>>> Cc: ceph-users@xxxxxxxx
>>> Subject: Re:  XFS and nobarriers on Intel SSD
>>> 
>>> 
>>> 
>>> Hi Everyone,
>>> 
>>> I updated the firmware on 3 S3710 drives (one host) last Tuesday and
>>> have not seen any ATA resets or Task Aborts on that host in the 5 days
>>> since.
>>> 
>>> I also set nobarriers on another host on Wednesday and have only seen
>>> one Task Abort, and that was on an S3710.
>>> 
>>> I have seen 18 ATA resets or Task Aborts on the two hosts that I made
>>> no changes on.
>>> 
>>> It looks like this firmware has fixed my issues, but nobarriers also
>>> seems to improve the situation significantly, which correlates with
>>> your experience, Christian.
>>> 
>>> Thanks, everyone, for the info in this thread. I plan to update the
>>> firmware on the remainder of the S3710 drives this week and also set
>>> nobarriers.
>>> 
>>> Regards,
>>> 
>>> Richard
>>> 
>>> 
>>> 
>>> On 8 September 2015 at 14:27, Richard Bade <hitrich@xxxxxxxxx> wrote:
>>> 
>>> Hi Christian,
>>> 
>>> 
>>> 
>>> On 8 September 2015 at 14:02, Christian Balzer <chibi@xxxxxxx> wrote:
>>> 
>>> Indeed. But first a word about the setup where I'm seeing this.
>>> These are 2 mailbox server clusters (2 nodes each), replicating via
>>> DRBD over Infiniband (IPoIB at this time), LSI 3008 controller. One
>>> cluster with the Samsung DC SSDs, one with the Intel S3610.
>>> 2 of these chassis to be precise:
>>> https://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DC0FR.cfm
>>> 
>>> 
>>> 
>>> We are using the same box, but the DC0R (no Infiniband), so I guess it's not
>>> surprising we're seeing the same thing happening.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Of course with the latest firmware, and I tried this with every kernel from
>>> Debian's 3.16 to stock 4.1.6.
>>> 
>>> With nobarrier I managed to trigger the error only once yesterday on
>>> the DRBD replication target, not the machine that actually has the FS mounted.
>>> Usually I'd be able to trigger it quite a bit more often during those tests.
>>> 
>>> So this morning I updated the firmware of all S3610s on one node and
>>> removed the nobarrier flag. It took a lot of punishment, but
>>> eventually this happened:
>>> ---
>>> Sep  8 10:43:47 mbx09 kernel: [ 1743.358329] sd 0:0:1:0: attempting task abort! scmd(ffff880fdc85b680)
>>> Sep  8 10:43:47 mbx09 kernel: [ 1743.358339] sd 0:0:1:0: [sdb] CDB: Write(10) 2a 00 0e 9a fb b8 00 00 08 00
>>> Sep  8 10:43:47 mbx09 kernel: [ 1743.358345] scsi target0:0:1: handle(0x000a), sas_address(0x4433221101000000), phy(1)
>>> Sep  8 10:43:47 mbx09 kernel: [ 1743.358348] scsi target0:0:1: enclosure_logical_id(0x5003048019e98d00), slot(1)
>>> Sep  8 10:43:47 mbx09 kernel: [ 1743.387951] sd 0:0:1:0: task abort: SUCCESS scmd(ffff880fdc85b680)
>>> ---
>>> Note that on the un-patched node (DRBD replication target) I managed
>>> to trigger this bug 3 times in the same period.
>>> 
>>> So unless Intel has something to say (and given that this happens with
>>> Samsungs as well), I'd still look beady eyed at LSI/Avago...
>>> 
>>> 
>>> 
>>> Yes, I think there may be more than one issue here. The reduction in
>>> occurrences seems to prove there is an issue fixed by the Intel
>>> firmware, but something is still happening.
>>> 
>>> Once I have updated the firmware on the drives on one of our hosts
>>> tonight, hopefully I can get some more statistics and pinpoint if
>>> there is another issue specifically with the LSI3008.
>>> 
>>> I'd be interested to know if the combination of nobarriers and the
>>> updated firmware fixes the issue.
>>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> Richard
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Christian Balzer        Network/Systems Engineer
>> chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



