Hi Rafael,
On Thu, Oct 24, 2024 at 02:58:27PM GMT, Payam Moradshahi wrote:
Hi Rafael,
Thank you for your response. Please see below for my response.
Payam
On Mon, Oct 21, 2024 at 01:31:39PM GMT, Rafael J. Wysocki wrote:
> On Thu, Oct 17, 2024 at 5:59 AM Payam Moradshahi <payamm@xxxxxxxxxx>
> wrote:
> >
> > The way in which acpi_object union is being initialized varies based
on
> > compiler type, version and flags used. Some will zero-initialize the
> > entire union structure and some will only initialize the first N-bytes
> > of the union structure.
> Any details?
I dumped acpi_object after initialization and noticed either the
entire structure was zero-initialized or just the first 8 bytes
depending on compiler type and version.
gcc 13.2.0: bad
CLANG_CL=362121269: good
CLANG_CL=609443126: bad
CLANG_CL=684773960: good
For reference the latter ones are Google internal builds of clang, but
we've played around in godbolt with this in a variety of versions and
some work and some don't. Payam has the relevant section from the C
standard below.
> > This could lead to uninitialized union members.
> So this is working around a compiler bug AFAICS.
> If the compiler has this bug, is it guaranteed to compile the rest of
> the kernel correctly?
This is not a compiler bug. The way acpi_object union is being
initialized is not c-spec compliant.
I think in past versions we might've gotten lucky with this, recent
compilers might've tightened this up some or changed behavior.
Based on C Standard ISO/IEC 9899:202x (E):
Section 6.2.6.1 (7) says:
When a value is stored in a member of an object of union type, the bytes
of
the
object representation that do not correspond to that member but do
correspond
to other members take unspecified values
Section 6.7.9 says:
If an object that has automatic storage duration is not initialized
explicitly,
its value is indeterminate.
If an object that has static or thread storage duration is not initialized
explicitly, then:
- if it is a union, the first named member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
The above guarantees zero only in the case of static or thread storage,
which is not the case here.
> > This bug was confirmed by observing non-zero value for
object->processor
> > structure variables.
> Where has it been observed? What compiler version(s)? etc.
See above for some examples
This manifests like this on Google Axion arm64 builds using Arm's
prebuilt GCC 13.2.0:
[ 1.844508] acpi ACPI0007:08: Invalid PBLK length [271170112]
[ 1.850253] acpi ACPI0007:09: Invalid PBLK length [271170112]
[ 1.855992] acpi ACPI0007:0a: Invalid PBLK length [271170112]
[ 1.861730] acpi ACPI0007:0b: Invalid PBLK length [271170112]
[ 1.867470] acpi ACPI0007:0c: Invalid PBLK length [271170112]
[ 1.873208] acpi ACPI0007:0d: Invalid PBLK length [271170112]
[ 1.878947] acpi ACPI0007:0e: Invalid PBLK length [271170112]
[ 1.884686] acpi ACPI0007:0f: Invalid PBLK length [271170112]
[ 1.890424] acpi ACPI0007:10: Invalid PBLK length [271170112]
[ 1.896161] acpi ACPI0007:11: Invalid PBLK length [271170112]
[ 1.901898] acpi ACPI0007:12: Invalid PBLK length [271170112]
[ 1.907636] acpi ACPI0007:13: Invalid PBLK length [271170112]
[ 1.913374] acpi ACPI0007:14: Invalid PBLK length [271170112]
[ 1.919113] acpi ACPI0007:15: Invalid PBLK length [271170112]
[ 1.924851] acpi ACPI0007:16: Invalid PBLK length [271170112]
> > non-zero initialized members of acpi_object union structure causes
> > incorrect error reporting by the driver.
> >
> > If a BIOS is using "Device" statement as opposed to "Processor"
> > statement, object variable may contain uninitialized members causing
the
> > driver to report "Invalid PBLK length" incorrectly.
> >
> > Using memset to zero-initialize the union structure fixes this issue
and
> > also removes the dependency of this function on compiler versions and
> > flags being used.
> >
> > Tested: Tested on ARM64 hardware that was printing this error and
> > confirmed the prints were gone.
> >
> > Also confirmed this does not cause regression on ARM64 and X86
> > machines.
> >
> > Signed-off-by: Payam Moradshahi <payamm@xxxxxxxxxx>
> > ---
> > drivers/acpi/acpi_processor.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpi_processor.c
> b/drivers/acpi/acpi_processor.c
> > index 7cf6101cb4c73..6696ad4937d21 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -275,7 +275,7 @@ static inline int
> acpi_processor_hotadd_init(struct acpi_processor *pr,
> >
> > static int acpi_processor_get_info(struct acpi_device *device)
> > {
> > - union acpi_object object = { 0 };
> > + union acpi_object object;
> > struct acpi_buffer buffer = { sizeof(union acpi_object),
> &object };
> > struct acpi_processor *pr = acpi_driver_data(device);
> > int device_declaration = 0;
> > @@ -284,6 +284,8 @@ static int acpi_processor_get_info(struct
> acpi_device *device)
> > unsigned long long value;
> > int ret;
> >
> > + memset(&object, 0, sizeof(union acpi_object));
> > +
> > acpi_processor_errata();
> >
> > /*
> > --
Payam, it might be good to add a log snippet and references to the relevant
spec
sections to your v2 commit message.
Moreover, for v2 of the patch you probably want to add a
Cc: stable@xxxxxxxxxxxxxxx
to your patch to make sure it gets picked up once it gets picked up
for mainline. See [1] for more info on the stable process.
Cheers,
Moritz
[1] https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst