On 5/3/24 00:23, Bjorn Helgaas wrote:
On Thu, Mar 07, 2024 at 04:36:06PM -0600, Bjorn Helgaas wrote:
[+cc Hannes, who did a lot of related VPD work and reviewed the
original posting at
https://lore.kernel.org/r/20210715215959.2014576-6-helgaas@xxxxxxxxxx/]
On Thu, Mar 07, 2024 at 05:09:27PM +0100, Josselin Mouette wrote:
When a device returns invalid VPD data, it can be misused by other
code paths in kernel space or user space, and there are instances
in which this seems to cause memory corruption.
More of the background from Josselin at
https://lore.kernel.org/r/aaea0b30c35bb73b947727e4b3ec354d6b5c399c.camel@xxxxxxxxxx
This is a regression, and obviously needs to be fixed somehow, but I'm
a bit hesitant to revert this until we understand the problem better.
If there's a memory corruption lurking and a revert hides the
corruption so we never fix it, I'm not sure that's an improvement
overall.
I don't want to drop this, but we're kind of stuck here:
- I'd still like to understand the problem better.
- Trivially, I can't apply patches lacking the appropriate
signed-off-by; see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.6#n396
Right. Again.
So: Problem is that VPD data is a self-describing stream of serialized
TLV records. The records themselves have no internal consistency check,
so any data read is assumed to be correct. If there is an error during
decoding we really should not continue, as everything beyond that error
cannot be relied on.
However, the assumption was that the read data is correct, so in getting
an error the very assumption leading us to interpret the data as VPD
records is invalid, too.
But question remains: was the data correct until the error?
Or was the error earlier, and we just mis-interpreted invalid data?
In my original patch I used the second attempt, as this will pretty
much guarantee that all VPD data will be correct. It has the drawback,
though, that for some cards we won't be able to read the VPD data, even
if they would provide valid data until the error.
As there is no good way out of this I'd rather see a blacklist of cards
which _can_ be read despite the error, and don't return VPD data otherwise.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich