Re: [PATCH v7 0/7] Improve libata support for FUA


Hello,

On Thu, Jan 05, 2023 at 12:43:06PM +0900, Damien Le Moal wrote:
> > These optional features tend to be broken in various and subtle ways,
> 
> FUA is not optional for any drive that supports NCQ. The FUA bit is a
> mandatory part of the FPDMA READ/WRITE commands. The optional part is
> support for the non-ncq WRITE FUA EXT command.

Optional in the sense that it isn't essential to the main function of the
device, which means that most don't end up using it.

> > especially the ones which don't show clear and notable advantages and thus
> > don't get used by everybody. I'm not necessarily against enabling it by
> > default but we should have better justifications as we might unnecessarily
> > cause a bunch of painful and subtle failures which can take a while to sort
> > out.
> 
> Avoiding regressions is always my highest priority. I know that there
> are a lot of cheap ATA devices out there that have questionable ACS spec
> compliance.

A lot of historical devices too, which don't get much scrutiny or testing but
can still cause significant grief for users.

> > * Can the advantages of using FUA be demonstrated in a realistic way? IOW,
> >   are there workloads which clearly benefit from FUA? My memory is hazy but
> >   we only really use FUA from flush sequence to turn flush, write, flush
> >   sequence into flush, FUA-write. As all the heavy lifting is done in the
> >   first flush anyway, I couldn't find a case where that optimization made a
> >   meaningful difference but I didn't look very hard.
> 
> The main users in kernel are file systems, when committing
> transactions/metadata journaling. Given that this is generally not
> generating a lot of traffic, I do not think we can measure any
> difference for HDDs. The devices are too slow to start with, so saving
> one command will not matter much, unless the application is fsync()
> crazy (and even then, not sure we'll see any difference). Even for SATA
> SSDs it likely will be hard to see a difference I think.

At a quick glance, there are some uses of REQ_FUA w/o REQ_PREFLUSH, which
indicates that there can be actual gains to be had. However, ext4 AFAICS
always pairs PREFLUSH w/ FUA, so a lot of use cases won't see any gain while
still taking on the risk of being exposed to FUA commands.
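
To make the comparison concrete, here is a rough sketch of which commands
end up at the device for a single write, depending on the request flags and
on whether FUA is enabled for the device. It is a simplified model, not
kernel code: it assumes a volatile write cache is enabled, and the names
ReqFlags and command_sequence are made up for illustration.

```python
from enum import Flag, auto


class ReqFlags(Flag):
    """Hypothetical stand-ins for REQ_PREFLUSH and REQ_FUA."""
    NONE = 0
    PREFLUSH = auto()
    FUA = auto()


def command_sequence(flags, dev_fua):
    """Return the commands issued for one write request.

    Assumes the device has a volatile write cache enabled. When FUA is
    not usable on the device, a FUA request is emulated with a plain
    write followed by a cache flush.
    """
    seq = []
    if ReqFlags.PREFLUSH in flags:
        seq.append("FLUSH")          # make earlier writes durable first
    if ReqFlags.FUA in flags and dev_fua:
        seq.append("WRITE_FUA")      # the write itself is durable
    else:
        seq.append("WRITE")
        if ReqFlags.FUA in flags:
            seq.append("FLUSH")      # post-flush emulates FUA
    return seq


if __name__ == "__main__":
    both = ReqFlags.PREFLUSH | ReqFlags.FUA
    for flags in (both, ReqFlags.FUA):
        for dev_fua in (False, True):
            print(flags, dev_fua, command_sequence(flags, dev_fua))
```

In this model, PREFLUSH + FUA goes from three commands to two, and the
remaining flush is the expensive one. FUA alone goes from write + flush to a
single write, which is where any real gain would come from.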

> Then we have applications using the drive block device file directly.
> For these, it is hard to tell how much it matters. Enabling it by
> default with a drive correctly supporting it will very much likely not
> hurt though.
> 
> Maciej,
> 
> May be you did some experiments before asking for enabling FUA by
> default ? Any interesting performance data you can share ?
> 
> > * Do we know how widely FUA is used now? IOW, is windows using FUA by
> >   default now? If so, do we know whether they have a blocklist?
> 
> You mean "blacklist" ? I do not have any information about Windows, but

The PC thing to say now seems to be allowlist / blocklist instead of
whitelist / blacklist, not that I mind either way.

> I can try to find out, at least for my employer's devices. But that will
> not be very useful as I know these drives behave correctly.

So, AFAIK, Windows doesn't issue FUA for SATA devices, only SAS, but I could
be wrong. It'd be really useful to find out.

> More than Windows or the kernel, I think that looking at SAS HBAs is
> more important here. SATA HDDs are the most widely used type of devices
> with these, by far. These may have a SAT translating FUA scsi writes to
> FUA NCQ FPDMA writes, resulting in FUA being extensively used. Modulo a
> blacklist that results in the same as the kernel with a
> flush/write/flush sequence. Hard to know as HBA's FW are not open. A bus
> analyzer could tell us that though, but again I can look at that only
> with the drives I have, which I know are working well with FUA.
> 
> I am OK with attempting enabling FUA by default for the following reasons:
> 1) The vast majority of drives in libata blacklist (all features) are
> old models that are not sold anymore.

The context here is that we promptly found all of these devices struggling
with FUA (like locking up and dropping off the bus) shortly after we enabled
FUA by default, so the list is by no means exhaustive and is more an
indication that there were, at the least, a whole lot of devices which choke
on FUA. On top of that, devices which aren't sold anymore are even harder to
debug and pay attention to while being able to cause a lot of pain to
configurations which have been stable and happy for a long time.

> 2) We are restricting FUA support to drives that also support NCQ, that
> is, modern-ish ones that are supposed to process the FUA NCQ read/write
> commands correctly, per specs.

NCQ is really old now and our previous attempt at FUA was after NCQ was
widely available, so I'm not sure this holds.

> 3) For HDDs, which is the vast majority of ATA devices out there these
> days, all recent drives I have tested are OK. Even older ones with NCQ
> support that I have access to are fine.
> 4) We are at rc2, which gives us time to revert patch 7 if we see too
> many bug reports.

This sort of problem, especially if it mostly affects old devices, can be
very difficult to suss out and will definitely take way longer than a single
release cycle.

> One thing we could add to the patch series is an additional restriction
> to enabling FUA by default to drives that support a recent standard. Say
> ACS-4 and above. That will restrict this to recent devices, thus
> reducing the risk of hitting bad apples. Thoughts ?

Yeah, that'd help. Also, if SAS HBA SATs have been issuing FUAs, that would
be a meaningful verification of the feature, at least for rotating hard
disks.

I feel rather uneasy about enabling FUA by default given the history. We can
improve its chances by restricting it to newer devices and maybe even just
hard disks, but it kinda comes back to the root question of why. Why would
we want to do this? What are the benefits? Right now, there are a bunch of
really tricky cons and not a whole lot in the pro column.
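
For reference, the restrictions floated in this thread (NCQ support, a
recent ACS level, maybe hard disks only, plus the blocklist) could be
combined into a gate roughly like the sketch below. Everything in it is
hypothetical: the Drive fields, the function name and the defaults are for
illustration only and do not correspond to anything in libata.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical summary of what the gate would need to know."""
    model: str
    supports_ncq: bool
    acs_level: int          # e.g. 4 for ACS-4; 0 if older or unknown
    rotational: bool
    blocklisted: bool = False


def fua_enabled_by_default(drive, min_acs=4, hdd_only=False):
    """Return True if FUA would be turned on without user opt-in."""
    if drive.blocklisted:
        return False        # known-bad device, never enable
    if not drive.supports_ncq:
        return False        # only NCQ drives are considered
    if drive.acs_level < min_acs:
        return False        # too old to trust by default
    if hdd_only and not drive.rotational:
        return False        # optionally leave SSDs out
    return True


if __name__ == "__main__":
    old = Drive("old-hdd", supports_ncq=True, acs_level=2, rotational=True)
    new = Drive("new-hdd", supports_ncq=True, acs_level=4, rotational=True)
    print(fua_enabled_by_default(old), fua_enabled_by_default(new))
```

A gate like this only narrows the exposure. It doesn't answer the question
of what we gain from the devices that do pass it.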

Thanks.

-- 
tejun


