On Wed, Sep 16, 2020 at 11:29:00AM +0200, Bartosz Golaszewski wrote:
> On Wed, Sep 16, 2020 at 2:27 AM Kent Gibson <warthog618@xxxxxxxxx> wrote:
> >
> > On Tue, Sep 15, 2020 at 10:34:31AM +0300, Maxim Devaev wrote:
> > > > The bug was introduced in libgpiod v1.5 so, depending on your
> > > > circumstances, I would revert to an earlier libgpiod or apply my patch.
> > > > ...
> > >
> > > Is this behavior documented somewhere? It's a complete surprise to me
> > > that this is how it works. I expected to lose the old events. It seems
> > > to me that for software that catches edges, the loss of new events is a
> > > serious problem, since it can lead to a desynchronization between the
> > > physical state of the pin and the user's information about it. For
> > > example, if event 16 was falling and event 17 was rising, and the
> > > signal stopped changing and remains at 1, the kernel will tell us that
> > > it was only falling (i.e. 0), while the real state will be 1.
> > >
> > > If we lose events in any case, then in my opinion it is much more
> > > important to keep the current state, not the past. I can't think of a
> > > case where the loss of old events can lead to problems, but a
> > > desynchronization of the current state actually means that the
> > > software can make the wrong decision in its logic based on the
> > > driver's lies. Yes, this would be a breaking change, but it seems to
> > > me that it is the current behavior that is incorrect. Don't get me
> > > wrong, I don't insist on it. If this decision was made for certain
> > > reasons, I would like to understand where I am wrong.
> > >
> > I agree - it makes more sense to discard the older events.
> > The existing behaviour pre-dates me, so I'm not sure if it is
> > intentional, and if so, what the rationale for it is.
> >
> While it predates me too (Linus: any particular reason to do it like
> this?) I think that requesting events from user-space is a contract
> where the user-space program commits to reading the events fast enough
> to avoid this kind of overflow. In V2 we can adjust the size of the
> queue to make it bigger if the process isn't capable of consuming the
> data as it comes.
>

For sure, but if there is an overflow for whatever reason - maybe they
need to debounce ;-) - then it would be preferable for the final event
to correspond to the current state.

> > And I'm still trying to think of a case where it would be harmful to
> > change this behaviour - what could it break?
> >
> Well, I wouldn't change it in V1, but since V2 is a new thing I think
> it should be relatively straightforward, right? If we see the kfifo is
> full, we should simply consume the oldest event on the kernel side,
> drop it and add in the new one. Maybe worth considering for v9? I
> don't see any cons to this, and the behavior is quite reasonable.
>

It is pretty straightforward - my current patch for this looks like:

@@ -537,9 +537,15 @@ static irqreturn_t edge_irq_thread(int irq, void *p)
 	le.seqno = (lr->num_lines == 1) ? le.line_seqno : line->req_seqno;
 	le.offset = gpio_chip_hwgpio(line->desc);
 
-	ret = kfifo_in_spinlocked_noirqsave(&lr->events, &le,
-					    1, &lr->wait.lock);
-	if (ret)
+	overflow = false;
+	spin_lock(&lr->wait.lock);
+	if (kfifo_is_full(&lr->events)) {
+		overflow = true;
+		kfifo_skip(&lr->events);
+	}
+	kfifo_in(&lr->events, &le, 1);
+	spin_unlock(&lr->wait.lock);
+	if (!overflow)
 		wake_up_poll(&lr->wait, EPOLLIN);

I'll incorporate that into v9.

Cheers,
Kent.
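
P.S. For anyone wanting to detect the overflow from user-space: line_seqno
is incremented for every edge, whether or not the event survives the kfifo,
so a gap in line_seqno between consecutive reads tells you events were
discarded - and with the patch above, the event in hand is the most recent
one, so it still reflects the current line state. Below is a rough,
untested sketch against the proposed v2 uAPI; the chip path, line offset
and consumer name are just placeholders, and error handling is trimmed:

/* Watch one line for edges and report any gap in line_seqno,
 * which indicates that events were discarded due to overflow.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/gpio.h>

int main(void)
{
	struct gpio_v2_line_request req;
	struct gpio_v2_line_event le;
	__u32 prev = 0;
	int cfd;

	memset(&req, 0, sizeof(req));
	req.num_lines = 1;
	req.offsets[0] = 4;			/* placeholder offset */
	strcpy(req.consumer, "seqno-check");	/* placeholder name */
	req.config.flags = GPIO_V2_LINE_FLAG_INPUT |
			   GPIO_V2_LINE_FLAG_EDGE_RISING |
			   GPIO_V2_LINE_FLAG_EDGE_FALLING;

	cfd = open("/dev/gpiochip0", O_RDONLY);	/* placeholder chip */
	if (cfd < 0 || ioctl(cfd, GPIO_V2_GET_LINE_IOCTL, &req) < 0)
		return 1;

	for (;;) {
		/* each read returns one struct gpio_v2_line_event
		 * from the line request fd filled in by the ioctl */
		if (read(req.fd, &le, sizeof(le)) != sizeof(le))
			break;
		/* line_seqno increments per edge even when the kfifo
		 * overflows, so a gap here means events were dropped */
		if (prev && le.line_seqno != prev + 1)
			fprintf(stderr, "dropped %u event(s)\n",
				le.line_seqno - prev - 1);
		prev = le.line_seqno;
	}
	return 0;
}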