Re: XFS and nobarrier with SSDs

Martin Steigerwald <martin@xxxxxxxxxxxx> · Mon, 14 Dec 2015 09:38:56 +0100

Hello Georg.

I am adding some kernel developers and mailing lists to Cc, as I think this is a
more generic question. Of course it would still be nice to hear from the XFS
developers as well.

For the broader audience the question is:

Is it safe to use XFS (or any other filesystem) with nobarrier on enterprise
SSDs with Power Loss Protection (PLP), i.e. capacitors that provide enough
power to write all data in DRAM out to flash after a power loss, together with
a reordering I/O scheduler like CFQ?

According to a comment by Jeff Moyer on a report in Red Hat's Bugzilla, the
I/O scheduler cannot reorder the commit block and the log entry it belongs to,
so I think it would be safe (see below).
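To check which XFS filesystems on a host actually run with barriers disabled,
one can look at the mount options the kernel reports in /proc/mounts; XFS
kernels of this era list nobarrier there when it is set. A minimal Python
sketch, assuming the usual whitespace-separated /proc/mounts layout (device,
mount point, type, options, ...); the function name xfs_nobarrier_mounts is
hypothetical:

```python
def xfs_nobarrier_mounts(mounts_text):
    """Return mount points of XFS filesystems mounted with 'nobarrier'.

    mounts_text is the content of /proc/mounts: one mount per line with
    space-separated fields (device, mount point, fs type, options, ...).
    Mount points keep the kernel's octal escapes (a space is '\\040').
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; match the whole word, not a substring.
        if fstype == "xfs" and "nobarrier" in options.split(","):
            result.append(mountpoint)
    return result
```

On a live system it would be called as
xfs_nobarrier_mounts(open("/proc/mounts").read()).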

On Monday, 14 December 2015, 06:43:48 CET, Georg Schönberger wrote:
> On 2015-12-12 13:26, Martin Steigerwald wrote:
> > On Saturday, 12 December 2015, 10:24:25 CET, Georg Schönberger wrote:
> >> We are using a lot of SSDs in our Ceph clusters with XFS. Our SSDs have
> >> Power Loss Protection via capacitors, so is it safe in all cases to run
> >> XFS with nobarrier on them? Or is there indeed a need for a specific I/O
> >> scheduler?
> > 
> > I do think that using nobarrier would be safe with those SSDs as long as
> > there is no other caching happening on the hardware side, for example
> > inside the controller that talks to the SSDs.
[…]
> We are using HBAs and no RAID controller, therefore there is no other
> cache in the I/O stack.
> 
> > I always thought barrier/nobarrier works independently of the I/O
> > scheduler, but I can understand the thought from the bug report you linked
> > to below. As for I/O schedulers, with recent kernels and block multiqueue
> > I see the scheduler set to "none".
> 
> What do you mean by "none" here? Do you think I would be on the safer
> side with the noop scheduler?

I mean that I get this on a 4.3 kernel with blk-mq enabled:

merkaba:/sys/block/sda/queue> grep . rotational scheduler 
rotational:0
scheduler:none

merkaba:/sys/block/sda/queue> echo "cfq" > scheduler 
merkaba:/sys/block/sda/queue> cat scheduler                              
none
merkaba:/sys/block/sda/queue> echo "noop" > scheduler
merkaba:/sys/block/sda/queue> cat scheduler          
none

So with blk-mq I do not get a choice of scheduler anyway, which is what I
expect. That's why there has been a discussion about blk-mq on rotational
devices recently.
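The same two sysfs files can be read programmatically. On single-queue devices
the scheduler file lists the available schedulers with the active one in
square brackets (for example "noop deadline [cfq]"); on the blk-mq kernel
above it only contains "none", and writing "cfq" or "noop" to it has no
effect. A small Python sketch; parse_scheduler and read_queue_info are
hypothetical helper names, not an existing tool:

```python
from pathlib import Path


def parse_scheduler(text):
    """Split the content of /sys/block/<dev>/queue/scheduler.

    Returns (active, available). The active scheduler is the bracketed
    entry, e.g. "noop deadline [cfq]". If nothing is bracketed, as with
    the plain "none" shown for blk-mq above, a single listed entry is
    taken as the active one.
    """
    available = []
    active = None
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    if active is None and len(available) == 1:
        active = available[0]
    return active, available


def read_queue_info(dev):
    """Return (rotational, active_scheduler, available) for e.g. 'sda'."""
    queue = Path("/sys/block") / dev / "queue"
    rotational = (queue / "rotational").read_text().strip() == "1"
    active, available = parse_scheduler((queue / "scheduler").read_text())
    return rotational, active, available
```

With the values shown in the transcript above (rotational:0, scheduler:none),
read_queue_info("sda") would return (False, "none", ["none"]).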

> >> I have found a recent discussion on the Ceph mailing list. Can anyone
> >> from XFS help us?
> >> 
> >> *http://www.spinics.net/lists/ceph-users/msg22053.html
> > 
> > Also see:
> > 
> > http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
> 
> I had already read that XFS wiki entry and also found some Intel
> presentations that suggest using nobarrier with their enterprise SSDs.
> But confirmation from a block layer specialist would be a good thing!

I think Jens Axboe would be a good person to ask. Or Jeff Moyer, since he
already replied to the bug report?

> >> *https://bugzilla.redhat.com/show_bug.cgi?id=1104380
> > 
> > Interesting. Never thought of that one.
> > 
> > So would it be safe to interrupt the flow of data towards the SSD at any
> > point in time with reordering I/O schedulers in place? And how about
> > blk-mq, which has multiple software queues?
> 
> Maybe we should ask the block layer mailing list about that?
> 
> > I like to think that they are still independent of the barrier thing, and
> > the last bug comment by Eric, where he quoted Jeff, supports this:
> >> Eric Sandeen 2014-06-24 10:32:06 EDT
> >> 
> >> As Jeff Moyer says:
> >>> The file system will manually order dependent I/O.
> >>> What I mean by that is the file system will send down any I/O for the
> >>> transaction log, wait for that to complete, issue a barrier (which will
> >>> be a noop in the case of a battery-backed write cache), and then send
> >>> down the commit block along with another barrier.  As such, you cannot
> >>> have the I/O scheduler reorder the commit block and the log entry with
> >>> which it is associated.
> 
> If it is truly that way, then I do not see any problem using nobarrier
> with SSDs that have power loss protection.
> I have already found some people saying that enterprise SSDs with PLP
> simply ignore the sync call. If that's the case, then using nobarrier
> would bring no performance improvement...

Interesting. I think if they ignore it, though, they risk data loss, because I
imagine there might be a slight chance that the data has been sent by the
controller but not yet fully stored in the SSD's DRAM. Or is this operation
atomic?
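The ordering Jeff Moyer describes in the quoted bug comment boils down to
"submit the log writes, wait for them to complete, only then submit the commit
block". A rough user-space analogue, sketched in Python as a hypothetical
append-only journal (commit_transaction and write_all are made-up names; this
is not how XFS implements its log), with fsync() standing in for "wait for
completion, plus a cache flush if barriers are enabled":

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd; os.write may write only part of it."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit_transaction(path, log_entries, commit_record):
    """Append log entries, wait until they are durable, then append the commit.

    The commit record is only submitted after the log writes have
    completed, so no I/O scheduler ever holds both at the same time and
    cannot reorder them.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for entry in log_entries:
            write_all(fd, entry)
        # Returns once the log entries are written out; whether a device
        # cache flush is issued as well depends on the filesystem and its
        # barrier setting.
        os.fsync(fd)
        write_all(fd, commit_record)
        os.fsync(fd)  # the commit record becomes durable last
    finally:
        os.close(fd)
```

In this sketch it is the wait for completion, not the cache flush, that keeps
the commit record behind the log entries; the flush only matters for data that
may still sit in a volatile cache, which is exactly what PLP is meant to cover.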

Thanks,
-- 
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html