Hi Martin,
On 12/10/19 8:08 PM, Martin K. Petersen wrote:
Hi Guenter,
The most recent attempt was [1] by Linus Walleij. It went through a total
of seven iterations. At the end, it was rejected for a number of reasons;
see the provided link for details. This implementation resides in the
SCSI core. It originally resided in libata but was moved to SCSI per
maintainer request, where it was ultimately rejected.
While I am sure I come across as a curmudgeon, regressions is a major
concern for me. That, and making sure we pick the right architecture. I
thought we were making good progress in that department when Linus
abandoned the effort.
If anything, I am surprised that he did not give up earlier. Personally
I did not see a path to success after v7 of the patch set was rejected.
I also no longer believe that temperature monitoring of SATA drives
should be implemented within the ATA or SCSI subsystem. I came to the
conclusion that it is much better suited as separate hardware monitoring
driver. As separate driver, its instantiation is in full control of
the user. If it causes trouble (or, as mentioned separately, if it adds
too much instantiation time, or if it is considered to be too large),
it can simply be disabled in a given system by blacklisting it (or,
rather, by not explicitly loading it in the first place). With that,
there is no real compatibility concern. If and when drives are detected
which report bad information, such drives can be added to a blacklist
without impact on the core SCSI or ATA code. Until that happens, not
loading the driver solves the problem on any affected system.
The feedback on this approach suggests to use the SCSI Temperature log
page [0x0d] as means to access drive temperature information. It is
unknown if this is implemented in any real SCSI drive.
Almost every SCSI drive has it.
Good to hear.
The feedback also suggests to obtain temperature from ATA drives,
convert it into the SCSI temperature log page in libata-scsi, and to
use that information in a hardware monitoring driver. The format and
method to do this is documented in [3]. This is not currently
implemented in the Linux kernel.
Correct, but I have no qualms over exporting the SCSI temperature log
page. The devices that export that page are generally well-behaved.
Also good to hear. However, for my part, I have no means to test such
code since I don't have any SCSI drives.
My concerns are wrt. identifying whether SMART data is available for
USB/UAS. I am not too worried about ATA and "real" SCSI (ignoring RAID
controllers that hide the real drives in various ways).
The one USB/UAS connected SATA drive I have (a WD passport) reports
itself as "WD ", not as "ATA ". I would expect other drives
to do the same. That drive reports (via smartctl) that it supports
both SCT and SMART data. It doesn't report temperatures through SCT,
but it does report the drive temperature with SMART attribute 194.
I did not attempt to add support for this and similar drives since
I don't know if I can reliably detect it. The potential benefit
compared to the risk seemed too low (we would be getting into
possible regression space) for me to try. Such code (effectively
it boils down to relaxing SATA drive detection) can always be added
at a later time.
I am not sure why the SCSI temperature log page parsing would be
complex. I will have to go check smartmontools to see what that is all
Not as much the parsing, but detection if the information is there.
about. The spec is as simple as can be.
Possibly. I personally also find it quite vague. It is definitely not
something I would want to try to implement without ability to see how
the data actually looks like as reported by a real drive, and without
ability to test the code.
Anyway. I think the overall approach wrt. SCT and falling back to
well-known SMART fields is reasonably sane and fine for libata. But I
don't understand the pushback wrt. using the SCSI temperature log page
as a conduit. I think it would be fine if this worked out of the box for
This is not a pushback per se. It is simply a matter of ability (or lack
of it) to test any such code.
Regarding "conduit", I assume you mean converting SATA/SCT information
into SCST temperature pages and reporting temperature purely based
on those. I personally think that this would be the wrong approach:
It would effectively require code in the ATA core which is not really
needed there. This would bloat the ATA code with no real advantage.
In my opinion, available temperature information should be interpreted
where it is needed, and only there, not in several places. I see that
as much less risky and error prone than spreading the code to multiple
places.
both SCSI and ATA drives.
The elegance of my approach is that adding support for reading temperatures
from SCSI drives (or, for that matter, USB/UAS drives) would be
straightforward. All one would need to do is to implement the necessary
detection code as well as a function to actually read the information
from the drive. This can be done at any time, and, again, it should be
done by someone with the ability to test the code.
The elephant in the room remains USB. And coming up with a way we can
reliably detect whether it is safe to start poking at the device to
discover if SMART is provided. If we eventually want to pursue USB, > think your heuristic stuff needs to be a library that can be leveraged
by both libata and USB. But that doesn't have to be part of the initial
effort.
And finally, my concerns wrt. reacting to bad sensors remain. Not too
familiar with hwmon, but I would still like any actions based on
reported temperatures to be under user control and not the kernel.
All sensors can report bad information, and quite often they do.
This is actually quite normal in any given system. That doesn't mean
that the available (connected) sensors should be ignored.
Also, when it comes to actions, the one subsystem performing any actions
in the kernel based on temperature sensor information is the thermal
subsystem, and that is on purpose implemented in the kernel.
The hardware monitoring subsystem, on its own, is purely passive
and only reports sensor information; it does not act on it. Any action
will either be done by userspace (eg with fancontrol) or by the thermal
subsystem.
Overall, I understand the desire to also support temperature reporting
for SCSI and USB/UAS drives. As hardware monitoring maintainer, I'd
be happy to accept patches implementing that support. However, I don't
see this as immediately necessary, and I would want to have some
reassurance that such code is well tested and doesn't cause any
regressions (especially since concerns about possible regressions were
mentioned several times in the context of the previous submissions).
Thanks,
Guenter