Hi Felix,

On Thursday, January 26, 2017 11:26:03 AM CET Felix Fietkau wrote:
> On 2017-01-26 11:15, Simon Wunderlich wrote:
> > Hey,
> >
> > On Thursday, January 26, 2017 11:02:53 AM CET Felix Fietkau wrote:
> >> On 2017-01-26 10:50, Simon Wunderlich wrote:
> >> > Hey Felix,
> >> >
> >> > On Wednesday, January 25, 2017 5:36:53 PM CET Felix Fietkau wrote:
> >> >> Various chips occasionally run into a state where the tx path still
> >> >> appears to be working normally, but the rx path is deaf.
> >> >>
> >> >> There is no known register signature to check for this state
> >> >> explicitly,
> >> >> so use the lack of rx interrupts as an indicator.
> >> >>
> >> >> This detection is prone to false positives, since a device could also
> >> >> simply be in an environment where there are no frames on the air.
> >> >> However, in this case doing a reset should be harmless since it's
> >> >> obviously not interrupting any real activity. To avoid confusion, call
> >> >> the reset counters in this case "Rx path inactive" instead of
> >> >> something
> >> >> like "Rx path deaf", since it may not be an indication of a real
> >> >> hardware failure.
> >> >>
> >> >> Signed-off-by: Felix Fietkau <nbd@xxxxxxxx>
> >> >
> >> > As we observed in the field, it may happen that there are still RX
> >> > interrupts triggered, but just a very low number - in which case I
> >> > believe your version wouldn't fix the problem. Therefore we had a
> >> > threshold in our original patch [1].
> >>
> >> It seems that you were seeing something different than what I was seeing
> >> in my tests. Though it could be that my issues were actually caused by
> >> something else. I had queued up these changes a while back before I
> >> finally found and fixed the IRQ issue.
> >
> > What we found a good threshold was to check for less than 1 RX interrupt
> > per second, and check the mean average (about) every 30 seconds. If there
> > is any other AP or a station connected, it will not reset the chip, and
> > also there will be no reset on short outages.
>
> But if there's less than 1 Rx interrupt per second, then my patch should
> also trigger, right?

Yes, that function you hooked in is called once a second. However, this will
likely lead to one reset per second on a "lonely" access point, which could
create problems for clients connecting for the first time, or for power-saving
clients who don't talk much. It's not so unlikely that an AP will not hear
anything for a full second, and the reset puts it out of operation for some
time, too. (Not sure how much beacons etc. are affected, for example.)

If you can check only every 30 seconds on the average, you would reduce this
problem.

Cheers,
     Simon
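
To make the averaged check concrete, here is a minimal sketch of the idea: count rx interrupts, get called once a second, but only decide once a full 30-second window has elapsed. All names here (struct rx_watchdog, rx_watchdog_note_irq(), rx_watchdog_tick() and the two constants) are hypothetical and only for illustration; this is not the code from either patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and values; not the actual ath9k code. */
#define RX_CHECK_WINDOW_SECS 30	/* evaluate about every 30 seconds */
#define RX_MIN_IRQS_PER_SEC   1	/* below this average, treat rx as inactive */

struct rx_watchdog {
	uint32_t irqs_in_window;	/* rx interrupts seen in this window */
	uint32_t secs_in_window;	/* one-second ticks elapsed in this window */
};

/* Call from the rx interrupt path. */
static void rx_watchdog_note_irq(struct rx_watchdog *wd)
{
	wd->irqs_in_window++;
}

/*
 * Call once per second. Returns true only when a full window has
 * elapsed and the average rx interrupt rate over that window was
 * below RX_MIN_IRQS_PER_SEC. The counters restart after each window.
 */
static bool rx_watchdog_tick(struct rx_watchdog *wd)
{
	bool inactive;

	if (++wd->secs_in_window < RX_CHECK_WINDOW_SECS)
		return false;

	inactive = wd->irqs_in_window <
		   RX_MIN_IRQS_PER_SEC * RX_CHECK_WINDOW_SECS;

	wd->irqs_in_window = 0;
	wd->secs_in_window = 0;
	return inactive;
}
```

With this shape, a single quiet second never causes a reset on its own; the caller would only reset the chip when rx_watchdog_tick() returns true, which can happen at most once per window.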