Re: [PATCH v1] wifi: mac80211: Initialize EWMA fail avg to 1

Johannes Berg <johannes@xxxxxxxxxxxxxxxx> · Fri, 21 Apr 2023 11:35:03 +0200

On Thu, 2023-04-20 at 15:15 +0200, Felix Fietkau wrote:
> 
> > On 20. Apr 2023, at 15:00, Benjamin Beichler <Benjamin.Beichler@xxxxxxxxxxxxxx> wrote:
> > 
> > Am 20.04.2023 um 14:22 schrieb Felix Fietkau:
> > > > On 20.04.23 12:27, Johannes Berg wrote:
> > > > On Thu, 2023-04-20 at 11:30 +0200, Benjamin Beichler wrote:
> > > > > > To me, basically, I see two ways to solve this:
> > > > > > 
> > > > > > 1) we have DECLARE_EWMA_ZERO_VALID() or something like that which
> > > > > >     *doesn't* treat 0 as an uninitialized value, and either has a
> > > > > >     separate "not initialized yet" bit (but that's iffy storage wise),
> > > > > >     or simply has another argument to _init() for the initial value or
> > > > > >     so.
> > > > > > 
> > > > > > 2) you don't just don't use 0 and 100 but say 1 and 100, that results in
> > > > > >     basically the same behaviour, but avoids the special 0.
> > > > > > 
> > > > > > johannes
> > > > > 
> > > > > I also ran into that problem in the past, and reviewing it again with a
> > > > > college, I think, this is a real bug in the EWMA implementation. I try
> > > > > to provide a proper patch in the next days, but actually the EWMA
> > > > > handles the internal value zero, always like in the initialization,
> > > > > which is wrong, e.g., for positive/negative averaged values.
> > > > 
> > > > Yes, it's always wrong as long as you feed it something zero, or values
> > > > with different sign.
> > > > 
> > > > For a lot of use cases, however, that doesn't matter. Originally, it was
> > > > used e.g. for signal strength averaging, average packet lengths, etc.
> > > > where it really doesn't matter since you can never use 0 or values that
> > > > have different sign.
> > > > 
> > > > > A quick research shows, this bug is since the first implementation of
> > > > > the ewma in the code ...
> > > > > 
> > > > 
> > > > Yeah, I'm aware of that, I was around for it ;-)
> > > > 
> > > > But see above, I'm not sure I'd even call it a bug, at least not
> > > > originally with the users that we had intended.
> > > > 
> > > > Hence I don't know if it's really good to fix this in general - for many
> > > > of these cases zero can still be treated specially (and like I mentioned
> > > > in my previous email, we can even here avoid 0), and then we don't spend
> > > > an extra byte (or likely 4) to hold a "first time" flag.
> > > > 
> > > > Dunno. Maybe it's not worth thinking about the extra memory space vs.
> > > > the extra maintenance cost. But maybe at least on 64-bit we could steal
> > > > a bit from the unsigned long? Not sure what all the users are..
> > > We don't actually need a full bit. We can just add 1 to the internal
> > > value for initialized values. How about this (completely untested):
> > > https://nbd.name/p/69b00c5b
> > > 
> > Nice hack, but even a bit more dirty than before :-D
> 
> I disagree. With my “hack” at least the implementation will do the
> right thing without the API user having to worry about 0 as a value.
> It could use some comments to clarify the details, but I do think it
> makes things easier.
> 

To me, the first question is if there are potentially any users that are
_relying_ on the current behaviour. This seems unlikely though, looking
at the ~30 users, most sound like signal/rssi, packet sizes, etc.

So let's say with the bug found here that prompted this patch, chances
are that there aren't any users that really want 0 to be special. I also
can't even really think of a reason for wanting that.

So then let's say we want to fix the existing code. I can think of these
possible ways:

 * splitting off a bit for initialized from the unsigned long
   (which at least for 64-bit should be OK since presumably most code
    using this will run on 32-bit systems too)
 * adding another value for it, e.g. making it u32 and adding a bool for
   "first value"
 * biasing the value, like Felix proposes, could be by 1 or -1 for
   example

All of these have a memory cost, of course, though the first two are
data and the second code, so for things like stations the code exists
only once and the data multiple times. On 64-bit we can probably make
the first two not have a data memory cost though.

As for biasing the value, couldn't that lead to a similar problem? It's
clearly less likely that the end of the range is reached rather than
zero, but still?

johannes