The bad interaction caused by the bug goes like this.
The driver tries to get the battery information and finds out that the HID++
2.0 device has the Unified Battery feature (0x1004) at index 0x08. It sends a
command to the device to report the capabilities its Unified Battery feature
supports using
#define CMD_UNIFIED_BATTERY_GET_CAPABILITIES 0x00
for the command and software identifier nibbles in byte 4 of the message,
which ends up being
11 FF 0800 00000000000000000000000000000000
This is a non-conforming message because the software identifier nibble (the
low-order nibble of byte 4) is 0x0.
The device does not respond with an error indicating that the message is
non-conforming but instead treats the message as conformant and responds with
the capabilities of its Unified Battery feature
11 FF 0800 0F030000000000000000000000000000
copying the device number, feature index, function, and software identifier,
as required. The rest of the response indicates that the feature on the
device supports all four qualitative battery levels, is rechargeable, and also
reports battery levels in percentages.
Devices with this feature also emit notification messages when their batteries
change level. Notification messages are distinguished from response messages
by having a 0x0 software identification nibble. So this response is read by
other software that is listening to the raw HID messages from the device and
interpreted as an event notification.
In this case the feature has an event notification with function 0x0 that is
emitted when status of the device's battery is changed. The fifth byte of
these notification messages is the percentage of battery energy available. So
the software concludes that the device's battery has 15% of its energy remaining.
If the driver is changed to
#define CMD_UNIFIED_BATTERY_GET_CAPABILITIES 0x01
then the command sent would be
11 FF 0801 00000000000000000000000000000000
which is conforming, and the response would be
11 FF 0801 0F030000000000000000000000000000
which is not the same as any event notification message.
Other bad interactions are also possible, including the the driver
misinterpreting an event notification as a response to a message that it sent
to the device.
peter
On 8/26/22 10:35, Bastien Nocera wrote:
On Thu, 2022-08-25 at 16:47 -0400, Peter F. Patel-Schneider wrote:
This patch will cause more use of a part of the driver that
constructs
messages that do not conform to the HID++ 2.0 specification. This
makes now a good time to fix the parts of the driver that construct
non-conforming messages. More information follows.
This would cause problems, but not any worse than adding the product
IDs individually, which is what we're trying to avoid.
This results in non-conforming messages being sent to devices. As
devices are obligated to return this nibble intact they produce non-
conforming responses as well. (Their other option would be to reject
the messages.) This confuses other software that correctly uses this
nibble to distinguish between device response messages and device
event
messages.
I don't understand how this...
In particular, the response to the unified battery command to get the
capabilities comes back with a 0x00 function byte which is
indistinguishable from a spontaneous notification message from the
device for a battery status event. Other software trying to
communicate with the device (e.g., Solaar) sees a unified battery
status notification and will generally end up with incorrect
information about the device. I suspect that this is actually
happening and is the cause of the Solaar bug report
https://github.com/pwr-Solaar/Solaar/issues/1718
...could cause this. Can you explain what the messages would look like
in both cases, and how they could be misinterpreted as a 15 vs. 85
percent battery level?
There is also the possibility that the driver confuses a notification
from the device as the response to a command that it sent. When this
happens it would be likely that the actual response would be treated
as
a notification.
The fix is to modify all the CMD definitions in the code to have 1 in
their low-order nibble.
All in all, I don't see those bugs as blocking the integration of the
patch discussed above, I see it as a way to expose those bugs and
possibly a way to make them more urgent.
Filipe, were those problems known/already reported?
Cheers
I reported https://bugzilla.kernel.org/show_bug.cgi?id=215699 which is a
different problem with the same cause, albeit in a different constant in the
driver.
peter