[Bug 217522] xfs_attr3_leaf_add_work produces a warning

bugzilla-daemon@xxxxxxxxxx · Mon, 05 Jun 2023 02:35:56 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=217522

--- Comment #4 from Vladimir Lomov (lomov.vl@xxxxxxxx) ---
Hello.
** bugzilla-daemon@xxxxxxxxxx <bugzilla-daemon@xxxxxxxxxx> [2023-06-04 18:32:00 +0000]:

> https://bugzilla.kernel.org/show_bug.cgi?id=217522

>>> Yes, this bug is a collision between the bad old ways of doing flex
>>> arrays:
>>>
>>> typedef struct xfs_attr_leaf_name_local {
>>>         __be16  valuelen;               /* number of bytes in value */
>>>         __u8    namelen;                /* length of name bytes */
>>>         __u8    nameval[1];             /* name/value bytes */
>>> } xfs_attr_leaf_name_local_t;

>>> And the static checking that gcc/llvm purport to be able to do properly.

>> Something similar has caused problems with kernel compilation before:
>> https://lkml.org/lkml/2023/5/24/576 (I'm not 100% sure if the origin is the
>> same though).

> Yup.

Ok, I see. The "proper" way to get rid of the warning requires too much
effort, so there are doubts as to whether it is worth it.

>>> This is encoded into the ondisk structures, which means that someone
>>> needs to perform a deep audit to change each array[1] into an
>>> array[] and then ensure that every sizeof() performed on those structure
>>> definitions has been adjusted.  Then they would need to run the full QA
>>> test suite to ensure that no regressions have been introduced.  Then
>>> someone will need to track down any code using
>>> /usr/include/xfs/xfs_da_format.h to let them know about the silent
>>> compiler bomb heading their way.
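
To make the sizeof() point concrete, here is a minimal userspace sketch. It is
not the kernel code: the struct names, the function names and the use of
uint16_t/uint8_t in place of __be16/__u8 are assumptions made for illustration.
It contrasts the legacy "sizeof() - 1" sizing idiom with an offsetof()-based
size that does not depend on how the trailing array is declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy declaration: one placeholder byte stands in for the payload. */
struct local_old {
	uint16_t valuelen;	/* number of bytes in value */
	uint8_t  namelen;	/* length of name bytes */
	uint8_t  nameval[1];	/* name/value bytes */
};

/* Same fields, declared with a C99 flexible array member. */
struct local_new {
	uint16_t valuelen;
	uint8_t  namelen;
	uint8_t  nameval[];
};

/* The payload must start at the same offset in both declarations. */
static_assert(offsetof(struct local_old, nameval) ==
	      offsetof(struct local_new, nameval),
	      "nameval must not move");

/* Legacy idiom: subtract the placeholder byte that sizeof() counts. */
size_t entsize_old(size_t namelen, size_t valuelen)
{
	return sizeof(struct local_old) - 1 + namelen + valuelen;
}

/* Declaration-independent: header bytes up to nameval, plus the payload. */
size_t entsize_new(size_t namelen, size_t valuelen)
{
	return offsetof(struct local_new, nameval) + namelen + valuelen;
}
```

On common ABIs, trailing padding should make sizeof() come out the same for
both declarations of this particular layout, so a conversion would not
necessarily show up as a size change. That is one reason every sizeof() user
would have to be checked by hand rather than found by the compiler.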

>>> I prefer we leave it as-is since this code has been running for years
>>> with no problems.

>> Should I assume that this problem is not significant and won't have any
>> effect on the FS and won't cause the FS to misbehave or become corrupted?
>> If so, why does the problem only show up on one host but not on the other?
>> Or is this a runtime check, and it somehow happens on the first system
>> (even rebooted twice), but not on the second one?

> AFAICT, there's no real memory corruption problem here; it's just that
> the compiler treats array[1] as a single-element array instead of
> turning on whatever magic enables it to handle flexarrays (aka array[]
> or array[0]).  I don't know why you'd ever want a real single-element
> array, but legacy C is fun like that. :/
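
As far as I understand the explanation above, the pattern in question looks
roughly like the following userspace sketch. The struct name and the helper
attr_local_new() are hypothetical, and byte order is ignored. With nameval[1],
copying namelen + valuelen bytes into the field looks like a write past a
one-byte array; with nameval[], the declaration itself says that the length is
only known at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the on-disk entry, with a flexible array member. */
struct attr_local {
	uint16_t valuelen;	/* number of bytes in value */
	uint8_t  namelen;	/* length of name bytes */
	uint8_t  nameval[];	/* name bytes, then value bytes */
};

/*
 * Allocate an entry and copy the name and the value into the trailing bytes.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
struct attr_local *attr_local_new(const void *name, uint8_t namelen,
				  const void *value, uint16_t valuelen)
{
	size_t size = offsetof(struct attr_local, nameval) + namelen + valuelen;
	struct attr_local *e;

	assert(name != NULL && value != NULL);

	/* Never allocate less than the fixed part of the struct. */
	if (size < sizeof(*e))
		size = sizeof(*e);

	e = malloc(size);
	if (e == NULL)
		return NULL;

	e->valuelen = valuelen;	/* the real on-disk field is big-endian */
	e->namelen = namelen;
	memcpy(e->nameval, name, namelen);
	memcpy(e->nameval + namelen, value, valuelen);
	return e;
}
```

For example, attr_local_new("user.x", 6, "abc", 3) would store the six name
bytes directly followed by the three value bytes.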

Ok, I get it, but what bothers me is why I only see this message on one
system and not the other.

At first I thought it had to do with the fact that I explicitly set the
"read-only" attribute (chattr +i) on one file (/etc/resolv.conf), but I
checked that both systems have the same settings on that file. Then I thought
it might be a problem with XFS itself, so I configured fsck to run on every
boot: any such problem would be revealed at boot time, and I wouldn't see the
message again after the next reboot. But the message remains even after a
reboot. So I must conclude that the warning has nothing to do with the FS and
the problem lies somewhere else.

I'm puzzled why I don't see this message on the second system, especially
since I didn't see it with kernel 5.15 and the previous linux-next (I have a
different problem with these systems, so I don't run kernels 6.0+, but I'm
running linux-next to see if the problem persists). Let me stress what worries
me: why am I seeing this message on one system and not on the other? Why
didn't I see this message on the previous linux-next (compiled with the same
compiler)?

It might be related to the disks used (HDD, SATA SSD and NVMe), because on the
system in question systemd gives a warning like 'invalid GPT table' (or
something like that, not the exact wording), even after I repartitioned
the disk.

[...]

---
Vladimir Lomov
