Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hans de Goede <hdegoede@xxxxxxxxxx> · Wed, 14 Apr 2021 10:17:03 +0200

Hi,

Adding Benjamin Berg who is one of the gnome-settings-daemon
maintainers to the Cc.

On 4/14/21 9:07 AM, Johannes Berg wrote:
> On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
>>>
>>> Hi,
>>>
>>> I've been debugging a userspace rfkill issue today which boils down
>>> to the
>>> "rfkill: add a reason to the HW rfkill state" patch:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
>>> d=14486c82612a177cb910980c70ba900827ca0894
>>> breaking userspace.
>>
>> This has been rolled back by:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
>> Other userspace broke (systemd) so Johannes rolled this back by
>> default.

I see, but this change is not in:

kernel-headers-5.11.11-300.fc34.x86_64

Meaning that basically all of Fedora 34 has been built with the "bad"
headers.

>> Userspace that is interested in the new byte will read 9 bytes.
> 
> Which, unfortunately, doesn't address *this* particular case, because it
> uses gio and that will fill the buffer with arbitrary size?
> 
> When you (Hans) say you saw in strace a read of size 8, did you mean the
> size passed to it, or the return size? I guess it must be the return
> size, and the size passed to it was way larger.

No for some reason, for some later read calls gio was actually passing
8 as size to the read() syscall. And g-s-d was compiled with headers
where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
out of the 16 bytes it got to fill a single rfkill_event which is fine,
except that it uses the remaining 7 bytes as the first 7 bytes of the
next rfkill_event which it processes making that next event be completely
bogus.

I do believe this really is a g-s-d bug though, it should not have
been using a "buffered" gio-channel on a /dev/foo node; and so far it
only got away with this by the rfkill_event size being a nice power of 2
value.

As I mentioned in an email to Benjamin, g-s-d should really switch
to making direct read() calls on the fd circumventing the gio-channel
read code all together:

"Right, notice I just realized that even after my fix there still is an issue,
when running code compiled against new kernel headers gsd-rfkill will now
always expect 9 bytes per event. But when running on an older kernel that
will not be true.

To fix this the code should probably just directly call read() itself
on the fd (only using g_io_channel for the polling) and then accept anything
between 8 bytes and sizeof(struct rfkill_event) as valid return value from the
read() call..."

Notice that the problem which I described there will go away when compiled
against even newer kernel-headers where sizeof(struct rfkill_event) is
back to 8 again.

So question, for code like g-s-d, which does not care about the new
hard_block_reasons field. What would be the preferred / max compatibility
way to do the reads. Also keeping in mind that there are "bad" kernel
headers out there where sizeof(struct rfkill_event) == 9 ?

I think that this would be best:

	ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
	if (ret == RFKILL_EVENT_SIZE_V1) {
		/* Valid event process it */
	}

This should produce the same code regardless of the kernel-headers version
and should work on both old and new kernels, correct ?

> The commit Emmanuel linked to fixes cases such as systemd that were just
> completely garbage (reading with one size, and then checking they got
> another), but it wouldn't fix this case.
> 
> Unfortunately, as you also said, it does seem a bit late now - it's been
> released in various kernels since 5.10, and while the default rollback
> will improve the situation somewhat, read(..., size>8) will still return
> 9 bytes rather than 8 as it used to. Switching that *also* back *should*
> be safe, but who knows what other bugs were introduced in the meantime?
> 
> I certainly don't really have a major objection to rolling that also
> back, but would it really help that much at this point? I guess it could
> be going into 5.10/5.11 stable kernels though.

I don't think that rolling back the new extended-event support altogether
will help. Since this has been out there for 2 released kernel versions
now, I believe the best way to fix this is to fix userspace; and to fix
userspace in such a way (at least g-s-d) that this problem cannot
happen again.

Regards,

Hans