Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hans de Goede <hdegoede@xxxxxxxxxx> · Wed, 14 Apr 2021 12:22:29 +0200

Hi,

On 4/14/21 11:47 AM, Johannes Berg wrote:
> Hi Hans,
> 
>> Adding Benjamin Berg who is one of the gnome-settings-daemon
>> maintainers to the Cc.
> 
> :)
> As you might imagine, I've talked to him ;-)
>>
>> kernel-headers-5.11.11-300.fc34.x86_64
> 
> Right, not yet.
> 
>> Meaning that basically all of Fedora 34 has been built with the "bad"
>> headers.
> 
> Yay. But still, I think in this case it wouldn't help.
> 
>>>> Userspace that is interested in the new byte will read 9 bytes.
>>>
>>> Which, unfortunately, doesn't address *this* particular case, because it
>>> uses gio and that will fill the buffer with arbitrary size?
>>>
>>> When you (Hans) say you saw in strace a read of size 8, did you mean the
>>> size passed to it, or the return size? I guess it must be the return
>>> size, and the size passed to it was way larger.
>>
>> No for some reason, for some later read calls gio was actually passing
>> 8 as size to the read() syscall. 
>>
> 
> Wait, that's weird? The (latest) gio code just calls
> g_io_channel_fill_buffer() and that doesn't care about the size you
> passed to g_io_channel_read_chars(), in fact it isn't even passed down,
> as you might expect for stream reads?

Ok, so I took another look at the strace and you are right.

There are 2 things happening here:

1. g-s-d has a private rfkill.h userspace-api copy with the old 8 byte
   rfkill_event definition, which I missed.

2. g-s-d mixes read() and g_io_channel_read_chars() usage, using
   read() to get the initial state and then later on using
   g_io_channel_read_chars() on events. The initial state read()s
   are the read(fd, buf, 8) calls which I saw in the strace.

So g-s-d was using / assuming a 8 byte struct size all the time and
my off-by-one by using buffered io theory was right, but works
differently then I expected. As you (Johannes) said, the problem is
g-s-d making read(fd, buf, 1024) calls, reading 9 bytes per event
and then returning 8 to the g_io_channel_read_chars() saving the
left over byte and pre-pending it to the next event, ruining
the next event(s).

Actually the code for getting the initial state seems to do the same
sizeof / RFKILL_EVENT_SIZE_V1 mix/match as systemd is doing
(probably copy and pasted from one to the other), but it is "saved"
somewhat by having its own rfkill.h copy:

                len = read(fd, &event, sizeof(event));
                if (len < 0) {
                        if (errno == EAGAIN)
                                break;
                        g_debug ("Reading of RFKILL events failed");
                        break;
                }

                if (len != RFKILL_EVENT_SIZE_V1) {
                        g_warning ("Wrong size of RFKILL event\n");
                        continue;
                }

And then later on, when poll says there is new data:

                status = g_io_channel_read_chars (source,
                                                  (char *) &event,
                                                  sizeof(event),
                                                  &read,
                                                  NULL);

                while (status == G_IO_STATUS_NORMAL && read == sizeof(event)) {

So with my fix to turn off buffering all should be well now, but we
(g-s-d folks) should consider doing a follow-up patch to

plugins/rfkill/rfkill-glib.c

Replacing all the sizeof(event) occurences with RFKILL_EVENT_SIZE_V1 for
future proofing against rfkill.h changes (e.g. if someone syncs
the private copy with the kernel) although this should be safe with
the recent kernel-header change...

Benjamin, what is your take on doing a  s/sizeof(event)/RFKILL_EVENT_SIZE_V1/ ?

At a minimum that would be good to do because it fixes this weirdness:

                len = read(fd, &event, sizeof(event));
                if (len < 0) {
                        if (errno == EAGAIN)
                                break;
                        g_debug ("Reading of RFKILL events failed");
                        break;
                }

                if (len != RFKILL_EVENT_SIZE_V1) {
                        g_warning ("Wrong size of RFKILL event\n");
                        continue;
                }

Where the len param passed to read() and the one checked for are
not necessarily the same.

Regards,

Hans

> 
>> And g-s-d was compiled with headers
>> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
>> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
>> out of the 16 bytes it got to fill a single rfkill_event which is fine,
>> except that it uses the remaining 7 bytes as the first 7 bytes of the
>> next rfkill_event which it processes making that next event be completely
>> bogus.
> 
> I understand the issue. I just didn't expect it would get solved when
> the headers contain a struct rkfill_event of size 8 again, since the
> kernel would still return 9 bytes. In effect, you'd still get the same
> issue, because the read in g_io_channel_fill_buffer() is 1024
> (G_IO_NICE_BUF_SIZE).
> 
>>
>> I do believe this really is a g-s-d bug though, it should not have
>> been using a "buffered" gio-channel on a /dev/foo node; and so far it
>> only got away with this by the rfkill_event size being a nice power of 2
>> value.
> 
> I agree, but ... kernel regressions and all that, right?
> 
>> As I mentioned in an email to Benjamin, g-s-d should really switch
>> to making direct read() calls on the fd circumventing the gio-channel
>> read code all together:
>>
>> "Right, notice I just realized that even after my fix there still is an issue,
>> when running code compiled against new kernel headers gsd-rfkill will now
>> always expect 9 bytes per event. But when running on an older kernel that
>> will not be true.
>>
>> To fix this the code should probably just directly call read() itself
>> on the fd (only using g_io_channel for the polling) and then accept anything
>> between 8 bytes and sizeof(struct rfkill_event) as valid return value from the
>> read() call..."
> 
> I don't think this is related to gio, you can still use
> g_io_channel_read_chars() since that just does the read for you:
> 
>   if (!channel->use_buffer)
>     {
>       gsize tmp_bytes;
> 
>       g_assert (!channel->read_buf || channel->read_buf->len == 0);
> 
>       status = channel->funcs->io_read (channel, buf, count, &tmp_bytes, error);
> 
>       if (bytes_read)
>         *bytes_read = tmp_bytes;
> 
>       return status;
>     }
> 
> 
> but checking that you actually got the *exact* size that you wanted is
> indeed wrong (as well).
> 
> Sounds like I made the wrong call here - I was discussing this with
> Benjamin a while ago and decided *not* to add an ioctl to opt in to the
> new event type ... sounds like I should have.
> 
> 
> Notice that the problem which I described there will go away when compiled
> against even newer kernel-headers where sizeof(struct rfkill_event) is
> back to 8 again.
> 
> Yes.
> 
> So question, for code like g-s-d, which does not care about the new
> hard_block_reasons field. What would be the preferred / max compatibility
> way to do the reads. Also keeping in mind that there are "bad" kernel
> headers out there where sizeof(struct rfkill_event) == 9 ?
> 
> I think that this would be best:
> 
> 	ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
> 	if (ret == RFKILL_EVENT_SIZE_V1) {
> 		/* Valid event process it */
> 	}
> 
> This should produce the same code regardless of the kernel-headers version
> and should work on both old and new kernels, correct ?
> 
> 
> I believe so, yes.
> 
> I don't think that rolling back the new extended-event support altogether
> will help. Since this has been out there for 2 released kernel versions
> now, I believe the best way to fix this is to fix userspace; and to fix
> userspace in such a way (at least g-s-d) that this problem cannot
> happen again.
> 
> Fair enough.
> 
> Thanks,
> johannes
>