Hi, On 4/14/21 11:47 AM, Johannes Berg wrote: > Hi Hans, > >> Adding Benjamin Berg who is one of the gnome-settings-daemon >> maintainers to the Cc. > > :) > As you might imagine, I've talked to him ;-) >> >> kernel-headers-5.11.11-300.fc34.x86_64 > > Right, not yet. > >> Meaning that basically all of Fedora 34 has been built with the "bad" >> headers. > > Yay. But still, I think in this case it wouldn't help. > >>>> Userspace that is interested in the new byte will read 9 bytes. >>> >>> Which, unfortunately, doesn't address *this* particular case, because it >>> uses gio and that will fill the buffer with arbitrary size? >>> >>> When you (Hans) say you saw in strace a read of size 8, did you mean the >>> size passed to it, or the return size? I guess it must be the return >>> size, and the size passed to it was way larger. >> >> No for some reason, for some later read calls gio was actually passing >> 8 as size to the read() syscall. >> > > Wait, that's weird? The (latest) gio code just calls > g_io_channel_fill_buffer() and that doesn't care about the size you > passed to g_io_channel_read_chars(), in fact it isn't even passed down, > as you might expect for stream reads? Ok, so I took another look at the strace and you are right. There are 2 things happening here: 1. g-s-d has a private rfkill.h userspace-api copy with the old 8 byte rfkill_event definition, which I missed. 2. g-s-d mixes read() and g_io_channel_read_chars() usage, using read() to get the initial state and then later on using g_io_channel_read_chars() on events. The initial state read()s are the read(fd, buf, 8) calls which I saw in the strace. So g-s-d was using / assuming a 8 byte struct size all the time and my off-by-one by using buffered io theory was right, but works differently then I expected. As you (Johannes) said, the problem is g-s-d making read(fd, buf, 1024) calls, reading 9 bytes per event and then returning 8 to the g_io_channel_read_chars() saving the left over byte and pre-pending it to the next event, ruining the next event(s). Actually the code for getting the initial state seems to do the same sizeof / RFKILL_EVENT_SIZE_V1 mix/match as systemd is doing (probably copy and pasted from one to the other), but it is "saved" somewhat by having its own rfkill.h copy: len = read(fd, &event, sizeof(event)); if (len < 0) { if (errno == EAGAIN) break; g_debug ("Reading of RFKILL events failed"); break; } if (len != RFKILL_EVENT_SIZE_V1) { g_warning ("Wrong size of RFKILL event\n"); continue; } And then later on, when poll says there is new data: status = g_io_channel_read_chars (source, (char *) &event, sizeof(event), &read, NULL); while (status == G_IO_STATUS_NORMAL && read == sizeof(event)) { So with my fix to turn off buffering all should be well now, but we (g-s-d folks) should consider doing a follow-up patch to plugins/rfkill/rfkill-glib.c Replacing all the sizeof(event) occurences with RFKILL_EVENT_SIZE_V1 for future proofing against rfkill.h changes (e.g. if someone syncs the private copy with the kernel) although this should be safe with the recent kernel-header change... Benjamin, what is your take on doing a s/sizeof(event)/RFKILL_EVENT_SIZE_V1/ ? At a minimum that would be good to do because it fixes this weirdness: len = read(fd, &event, sizeof(event)); if (len < 0) { if (errno == EAGAIN) break; g_debug ("Reading of RFKILL events failed"); break; } if (len != RFKILL_EVENT_SIZE_V1) { g_warning ("Wrong size of RFKILL event\n"); continue; } Where the len param passed to read() and the one checked for are not necessarily the same. Regards, Hans > >> And g-s-d was compiled with headers >> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d >> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read >> out of the 16 bytes it got to fill a single rfkill_event which is fine, >> except that it uses the remaining 7 bytes as the first 7 bytes of the >> next rfkill_event which it processes making that next event be completely >> bogus. > > I understand the issue. I just didn't expect it would get solved when > the headers contain a struct rkfill_event of size 8 again, since the > kernel would still return 9 bytes. In effect, you'd still get the same > issue, because the read in g_io_channel_fill_buffer() is 1024 > (G_IO_NICE_BUF_SIZE). > >> >> I do believe this really is a g-s-d bug though, it should not have >> been using a "buffered" gio-channel on a /dev/foo node; and so far it >> only got away with this by the rfkill_event size being a nice power of 2 >> value. > > I agree, but ... kernel regressions and all that, right? > >> As I mentioned in an email to Benjamin, g-s-d should really switch >> to making direct read() calls on the fd circumventing the gio-channel >> read code all together: >> >> "Right, notice I just realized that even after my fix there still is an issue, >> when running code compiled against new kernel headers gsd-rfkill will now >> always expect 9 bytes per event. But when running on an older kernel that >> will not be true. >> >> To fix this the code should probably just directly call read() itself >> on the fd (only using g_io_channel for the polling) and then accept anything >> between 8 bytes and sizeof(struct rfkill_event) as valid return value from the >> read() call..." > > I don't think this is related to gio, you can still use > g_io_channel_read_chars() since that just does the read for you: > > if (!channel->use_buffer) > { > gsize tmp_bytes; > > g_assert (!channel->read_buf || channel->read_buf->len == 0); > > status = channel->funcs->io_read (channel, buf, count, &tmp_bytes, error); > > if (bytes_read) > *bytes_read = tmp_bytes; > > return status; > } > > > but checking that you actually got the *exact* size that you wanted is > indeed wrong (as well). > > Sounds like I made the wrong call here - I was discussing this with > Benjamin a while ago and decided *not* to add an ioctl to opt in to the > new event type ... sounds like I should have. > > > Notice that the problem which I described there will go away when compiled > against even newer kernel-headers where sizeof(struct rfkill_event) is > back to 8 again. > > Yes. > > So question, for code like g-s-d, which does not care about the new > hard_block_reasons field. What would be the preferred / max compatibility > way to do the reads. Also keeping in mind that there are "bad" kernel > headers out there where sizeof(struct rfkill_event) == 9 ? > > I think that this would be best: > > ret = read(fd, &event, RFKILL_EVENT_SIZE_V1); > if (ret == RFKILL_EVENT_SIZE_V1) { > /* Valid event process it */ > } > > This should produce the same code regardless of the kernel-headers version > and should work on both old and new kernels, correct ? > > > I believe so, yes. > > I don't think that rolling back the new extended-event support altogether > will help. Since this has been out there for 2 released kernel versions > now, I believe the best way to fix this is to fix userspace; and to fix > userspace in such a way (at least g-s-d) that this problem cannot > happen again. > > Fair enough. > > Thanks, > johannes >