Re: [PATCH 1/3] libata: avoid waking disk for several commands

Damien Le Moal <dlemoal@xxxxxxxxxx> · Thu, 25 Jan 2024 06:51:39 +0900

On 1/25/24 01:04, Phillip Susi wrote:
> Damien Le Moal <dlemoal@xxxxxxxxxx> writes:
> 
>> Flush issuing is a lot more complicated than just blkdev_issue_flush(). There is
>> a whole file dedicated to handling flushes (block/blk-flush.c).
>>
>> But that is beside the point, which is that trying to not execute flush is
>> simply completely wrong. Please stop trying.
> 
> I tried before to have libata ignore the useless flush and you said to
> stop the flush from happening in the first place.  Now you say that's wrong?

No, not wrong. But you are looking at the wrong layer. Do not try to prevent
flushes from being issued at the block layer level but *above* it. That is,
FSes, applications etc.

>> For your case, which is a drive put to sleep with hdparm -Y, only libata is
>> aware that the drive is sleeping. That first needs to change if you want the
>> kernel overall to be able to do anything. As I proposed already, using runtime
>> PM with sleep mode instead of standby would be a good start.
> 
> No, I'm working on runtime pm now, as you suggested.  If you try using
> runtime pm with disks, you quickly see that it does not work.

What does not work ? Everything is fine with my testing: the drive is always
woken up whenever someone (FS, applications etc) issue a block IO (including
flush) to the block layer. That is the expected behavior. If you want to have
the disk keep sleeping, the device users must stop poking at the drive.

> 
>> Regarding the flushes and other commands you see that are waking up the drive
>> for no *apparent* good reasons, please identify what application/component is
>> issuing these commands and look there to see why the commands are being issued
>> and if that is done with awareness for the device power state.
> 
> Filesystems flush every few seconds.  So does anyone calling sync(),
> which the kernel does when you suspend to ram.

Filesystems issue flush only if they have dirty data to flush, normally. I have
not looked at ext4/xfs code in a while, but if the FS has not committed any
transaction in the last cycle, there should be no flush issued. If there is,
then someone is reading or writing files (for reading, if you have atime
enabled, that will cause a metadata change and so a transaction commit).

> Either the filesystems need to keep track of whether a flush is needed
> and skip it, or if they all call the same place ( blkdev_issue_flush ),
> then it only needs done once in that place.

See above. That is why I am telling you to run blktrace or any tracer to figure
out the command sequence and who is doing what. There is absolutely no clarity
to what you are describing and so we cannot plan for a solution.

> The core logic needs to be "if nothing has been written, then nothing
> needs to be flushed".  Right now filesystems just flush periodically or
> when their sync method is called to make sure that anything that has
> been written is stable.

I do not think that the periodic flush happens if nothing has been written. For
"sync", someone issues that, not the FS on its own. So trace it to find out who
is doing that.

> In userspace you have smartd and udisks2 to worry about.  The
> information they need to know is whether the disk has otherwise been in
> recent use and so they should not poll the SMART data.  I don't see
> anything exported in the power/ directory that would give an indication
> of the current *remaining* time until autosuspend, or how long it has
> been since the last access.  Either one would be useful for userspace to
> decide that nobody else seems to be using the disk, so I'll leave it
> alone for now so it can go to sleep.

-- 
Damien Le Moal
Western Digital Research