On Fri, Nov 1, 2024 at 3:59 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, 31 Oct 2024 08:37:06 +0200 Alexandru Ardelean <aardelean@xxxxxxxxxxxx> wrote: > > > A bug was found in the find_closest() (find_closest_descending() is also > > affected after some testing), where for certain values with small > > progressions, the rounding (done by averaging 2 values) causes an incorrect > > index to be returned. > > Please help us understand the userspace-visible effects of this bug. > > Do you believe the bug is sufficiently serious to justify backporting > these fixes into earlier kernel versions? If so, are you able to help > us identify a suitable Fixes: target? Oh right. Apologies. I keep forgetting the Fixes tag. Added below. I can also do a V2. Please advise on what's preferred. I'll also admit that my attempt at explaining the bug (in a general way) looks a bit wonky. I'll try to re-formulate the bug description for a V2. ------------------------------------------------------------------------------------------- For this reply, maybe I'll try a more "timeline" approach (for explaining the bug). I'll apologize for the amount of text I posted here, but this bug is one-of-those-LOTR-anthologies-trying-to-explain The bug was found while testing the 'drivers/iio/adc/ad7606.c' driver; particularly the oversampling setting (of the driver). Taking as reference the oversampling table (from the driver): static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine [1] $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine $ echo 5 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the bug goes away. i.e. one gets the values as one would expect. The bug no longer happens as in the case of [1], as the difference between 2 values (in the array) increases enough, such that the averaging (of 2 values) no longer causes issues. Initially, the assumption was that this bug happens only for searching exact values when an array is monotonic with a progression of 1. (One could argue that find_closest() is not intended for arrays of 1,2,3,4,...N). While trying to create a fix for this issue, I started writing a kunit test. That produced some other quirks, but not as bad as [1]. For example, this array from 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. const int ina226_avg_tab[] = { 1, 4, 16, 64, 128, 256, 512, 1024 }; While running the kunit test (for 'ina226_avg_tab'): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value 1 * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array. And from here-on the current find_closest() works fine (one gets what one would expect). ------------------------------------------------------------------------------------------- The above is one way of trying to explain the bug. The other way I am trying to explain this bug is through the kunit in the 2nd patch of this series. But to answer one question above: this bug exists since the first introduction of "find_closest()". Circa kernel version 4.1 And it only affects several corner cases (of arrays). Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro") > > Thanks.