Hi, On Wed, 2021-04-14 at 10:17 +0200, Hans de Goede wrote: > Hi, > > Adding Benjamin Berg who is one of the gnome-settings-daemon > maintainers to the Cc. > > On 4/14/21 9:07 AM, Johannes Berg wrote: > > On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote: > > > > > > > > Hi, > > > > > > > > I've been debugging a userspace rfkill issue today which boils > > > > down > > > > to the > > > > "rfkill: add a reason to the HW rfkill state" patch: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i > > > > d=14486c82612a177cb910980c70ba900827ca0894 > > > > breaking userspace. > > > > > > This has been rolled back by: > > > https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6 > > > Other userspace broke (systemd) so Johannes rolled this back by > > > default. > > I see, but this change is not in: > > kernel-headers-5.11.11-300.fc34.x86_64 > > Meaning that basically all of Fedora 34 has been built with the "bad" > headers. > > > > Userspace that is interested in the new byte will read 9 bytes. > > > > Which, unfortunately, doesn't address *this* particular case, because > > it > > uses gio and that will fill the buffer with arbitrary size? > > > > When you (Hans) say you saw in strace a read of size 8, did you mean > > the > > size passed to it, or the return size? I guess it must be the return > > size, and the size passed to it was way larger. > > No for some reason, for some later read calls gio was actually passing > 8 as size to the read() syscall. And g-s-d was compiled with headers > where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d > would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read > out of the 16 bytes it got to fill a single rfkill_event which is fine, > except that it uses the remaining 7 bytes as the first 7 bytes of the > next rfkill_event which it processes making that next event be > completely bogus. > > I do believe this really is a g-s-d bug though, it should not have > been using a "buffered" gio-channel on a /dev/foo node; and so far it > only got away with this by the rfkill_event size being a nice power of > 2 value. > > As I mentioned in an email to Benjamin, g-s-d should really switch > to making direct read() calls on the fd circumventing the gio-channel > read code all together: > > "Right, notice I just realized that even after my fix there still is an > issue, when running code compiled against new kernel headers gsd-rfkill > will now always expect 9 bytes per event. But when running on an older > kernel that will not be true. This confuses me. i.e. if g-s-d is compiled against headers where the struct is 9 bytes long, then we should just be getting a short read of 8 bytes. And that should be the same, whether we use plain read() or g_io_channel_read_chars(). I was under the impression that with https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/232 merged, the read should be fine on the GNOME side. That said, the g-s-d event handler is checking the size of the read against the struct size. This is obviously wrong, it should only check that the read was successful (or check for >= V1 size). Benjamin > To fix this the code should probably just directly call read() itself > on the fd (only using g_io_channel for the polling) and then accept > anything > between 8 bytes and sizeof(struct rfkill_event) as valid return value > from the > read() call..." > > Notice that the problem which I described there will go away when > compiled > against even newer kernel-headers where sizeof(struct rfkill_event) is > back to 8 again. > > So question, for code like g-s-d, which does not care about the new > hard_block_reasons field. What would be the preferred / max > compatibility > way to do the reads. Also keeping in mind that there are "bad" kernel > headers out there where sizeof(struct rfkill_event) == 9 ? > > I think that this would be best: > > ret = read(fd, &event, RFKILL_EVENT_SIZE_V1); > if (ret == RFKILL_EVENT_SIZE_V1) { > /* Valid event process it */ > } > > This should produce the same code regardless of the kernel-headers > version > and should work on both old and new kernels, correct ? > > > The commit Emmanuel linked to fixes cases such as systemd that were > > just > > completely garbage (reading with one size, and then checking they got > > another), but it wouldn't fix this case. > > > > Unfortunately, as you also said, it does seem a bit late now - it's > > been > > released in various kernels since 5.10, and while the default > > rollback > > will improve the situation somewhat, read(..., size>8) will still > > return > > 9 bytes rather than 8 as it used to. Switching that *also* back > > *should* > > be safe, but who knows what other bugs were introduced in the > > meantime? > > > > I certainly don't really have a major objection to rolling that also > > back, but would it really help that much at this point? I guess it > > could > > be going into 5.10/5.11 stable kernels though. > > I don't think that rolling back the new extended-event support > altogether > will help. Since this has been out there for 2 released kernel versions > now, I believe the best way to fix this is to fix userspace; and to fix > userspace in such a way (at least g-s-d) that this problem cannot > happen again. > > Regards, > > Hans >
Attachment:
signature.asc
Description: This is a digitally signed message part