Hi Wey

> > activity is too short a time for the device. Moreover, we have an
> > unlikely but possible situation where the device is fully functional,
> > but read_ptr wraps by accident to q->last_read_ptr on every check.
> >
> > I think a better solution would be something like what rt2x00 or
> > net/sched/sch_generic.c does (rt2x00 is easier to understand). It is
> > based on a time stamp. When we get a tx complete notification from
> > hardware (and increase read_ptr), record the time stamp. In the
> > watchdog, which ticks periodically, check whether the queue is not
> > empty and the current time is later than time_stamp + time_out; if
> > so, the firmware is hung. A smaller watchdog tick gives more precise
> > hang detection (with the disadvantage of more cpu usage).
>
> I don't really like the current "monitor" approach either. Some
> thoughts about the design you propose:
>
> 1. "time_out" is something that needs to be defined, and it has a
> similar problem to what we have today, since different devices behave
> differently. For example, in the WiFi/BT combo case, the queue might
> not move for a while if the BT traffic load is high.

Sure. However, the new watchdog could be more precise. Currently, if the
hang happens just after a watchdog tick, we detect it after about 2 ticks,
i.e. 10s; if it happens just before the tick, we detect it within 1 tick,
i.e. 5s. That gives 100% inaccuracy. The new design can be much more
precise.

> 2. I don't really see much "cpu usage" impact if we have a reasonable
> watchdog timer. But it is all relative.

Ohh, I was talking about cpu usage in the new design I described.

> By saying that, I think using a timestamp might give a cleaner design,
> but it still has similar issues.

Ok, if Intel has no plan to change the monitor recovery and has nothing
against my watchdog approach, I'm going to cook up some patches.

Stanislaw
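
To make the time-stamp idea above concrete, here is a minimal userspace
sketch of the check, not a patch. All names (struct txq, txq_enqueue,
txq_tx_complete, txq_is_stuck) are hypothetical and are not taken from
iwlwifi or rt2x00. Time is passed in as plain milliseconds. Restarting the
clock when a frame is queued on an empty queue is my own assumption, added
so that idle time is not counted as a stall.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TX queue state; field names are illustrative only. */
struct txq {
	uint32_t read_ptr;       /* next entry to reclaim */
	uint32_t write_ptr;      /* next entry to fill */
	uint64_t time_stamp_ms;  /* last time the queue made progress */
};

void txq_init(struct txq *q, uint64_t now_ms)
{
	q->read_ptr = 0;
	q->write_ptr = 0;
	q->time_stamp_ms = now_ms;
}

/*
 * Driver queues a frame. Assumption: if the queue was empty, restart
 * the clock so that time spent idle does not look like a hang.
 */
void txq_enqueue(struct txq *q, uint64_t now_ms)
{
	if (q->read_ptr == q->write_ptr)
		q->time_stamp_ms = now_ms;
	q->write_ptr++;
}

/* TX complete notification: advance read_ptr and mark the time stamp. */
void txq_tx_complete(struct txq *q, uint64_t now_ms)
{
	q->read_ptr++;
	q->time_stamp_ms = now_ms;
}

/*
 * Called from the periodic watchdog tick. The queue is considered stuck
 * when it is not empty and no progress was seen for more than time_out_ms.
 */
bool txq_is_stuck(const struct txq *q, uint64_t now_ms, uint64_t time_out_ms)
{
	if (q->read_ptr == q->write_ptr)
		return false;	/* nothing pending, nothing can hang */
	return now_ms - q->time_stamp_ms > time_out_ms;
}
```

With this scheme a hang is reported on the first tick after time_out has
expired, so the detection error is bounded by the tick period rather than
by a whole monitoring interval. In the kernel the comparison would
presumably use jiffies and time_after() instead of raw millisecond
subtraction, and time_out would still need per-device tuning, as Wey
points out for the WiFi/BT combo case.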