Search Linux Wireless

Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 4/14/21 11:52 AM, Benjamin Berg wrote:
> Hi,
> 
> On Wed, 2021-04-14 at 10:17 +0200, Hans de Goede wrote:
>> Hi,
>>
>> Adding Benjamin Berg who is one of the gnome-settings-daemon
>> maintainers to the Cc.
>>
>> On 4/14/21 9:07 AM, Johannes Berg wrote:
>>> On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've been debugging a userspace rfkill issue today which boils
>>>>> down
>>>>> to the
>>>>> "rfkill: add a reason to the HW rfkill state" patch:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
>>>>> d=14486c82612a177cb910980c70ba900827ca0894
>>>>> breaking userspace.
>>>>
>>>> This has been rolled back by:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
>>>> Other userspace broke (systemd) so Johannes rolled this back by
>>>> default.
>>
>> I see, but this change is not in:
>>
>> kernel-headers-5.11.11-300.fc34.x86_64
>>
>> Meaning that basically all of Fedora 34 has been built with the "bad"
>> headers.
>>
>>>> Userspace that is interested in the new byte will read 9 bytes.
>>>
>>> Which, unfortunately, doesn't address *this* particular case, because
>>> it
>>> uses gio and that will fill the buffer with arbitrary size?
>>>
>>> When you (Hans) say you saw in strace a read of size 8, did you mean
>>> the
>>> size passed to it, or the return size? I guess it must be the return
>>> size, and the size passed to it was way larger.
>>
>> No for some reason, for some later read calls gio was actually passing
>> 8 as size to the read() syscall. And g-s-d was compiled with headers
>> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
>> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
>> out of the 16 bytes it got to fill a single rfkill_event which is fine,
>> except that it uses the remaining 7 bytes as the first 7 bytes of the
>> next rfkill_event which it processes making that next event be
>> completely bogus.
>>
>> I do believe this really is a g-s-d bug though, it should not have
>> been using a "buffered" gio-channel on a /dev/foo node; and so far it
>> only got away with this by the rfkill_event size being a nice power of
>> 2 value.
>>
>> As I mentioned in an email to Benjamin, g-s-d should really switch
>> to making direct read() calls on the fd circumventing the gio-channel
>> read code all together:
>>
>> "Right, notice I just realized that even after my fix there still is an
>> issue, when running code compiled against new kernel headers gsd-rfkill
>> will now always expect 9 bytes per event. But when running on an older
>> kernel that will not be true.
> 
> This confuses me. i.e. if g-s-d is compiled against headers where the
> struct is 9 bytes long, then we should just be getting a short read of
> 8 bytes. And that should be the same, whether we use plain read() or
> g_io_channel_read_chars().
> 
> I was under the impression that with
>   https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/232
> merged, the read should be fine on the GNOME side.
> 
> That said, the g-s-d event handler is checking the size of the read
> against the struct size. This is obviously wrong, it should only check
> that the read was successful (or check for >= V1 size).

Right this is what I was referring to a g-s-d compiled against the
new headers with a struct size of 9 will read 8 bytes on an old kernel
and that will fail the len check, so it won't work.

But this will actually never happen as I just noticed that g-s-d
uses a private rfkill.h copy with the old 8 bytes struct definition.

So with the buffered-io disabled everything should work fine,
see my other email in this thread.

We should probably still fix / clean the code a bit though, as
you are working on in:
https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/234

Regards,

Hans




> 
> Benjamin
> 
>> To fix this the code should probably just directly call read() itself
>> on the fd (only using g_io_channel for the polling) and then accept
>> anything
>> between 8 bytes and sizeof(struct rfkill_event) as valid return value
>> from the
>> read() call..."
>>
>> Notice that the problem which I described there will go away when
>> compiled
>> against even newer kernel-headers where sizeof(struct rfkill_event) is
>> back to 8 again.
>>
>> So question, for code like g-s-d, which does not care about the new
>> hard_block_reasons field. What would be the preferred / max
>> compatibility
>> way to do the reads. Also keeping in mind that there are "bad" kernel
>> headers out there where sizeof(struct rfkill_event) == 9 ?
>>
>> I think that this would be best:
>>
>>         ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
>>         if (ret == RFKILL_EVENT_SIZE_V1) {
>>                 /* Valid event process it */
>>         }
>>
>> This should produce the same code regardless of the kernel-headers
>> version
>> and should work on both old and new kernels, correct ?
>>
>>> The commit Emmanuel linked to fixes cases such as systemd that were
>>> just
>>> completely garbage (reading with one size, and then checking they got
>>> another), but it wouldn't fix this case.
>>>
>>> Unfortunately, as you also said, it does seem a bit late now - it's
>>> been
>>> released in various kernels since 5.10, and while the default
>>> rollback
>>> will improve the situation somewhat, read(..., size>8) will still
>>> return
>>> 9 bytes rather than 8 as it used to. Switching that *also* back
>>> *should*
>>> be safe, but who knows what other bugs were introduced in the
>>> meantime?
>>>
>>> I certainly don't really have a major objection to rolling that also
>>> back, but would it really help that much at this point? I guess it
>>> could
>>> be going into 5.10/5.11 stable kernels though.
>>
>> I don't think that rolling back the new extended-event support
>> altogether
>> will help. Since this has been out there for 2 released kernel versions
>> now, I believe the best way to fix this is to fix userspace; and to fix
>> userspace in such a way (at least g-s-d) that this problem cannot
>> happen again.
>>
>> Regards,
>>
>> Hans
>>
> 




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux