Hi, On Thu, May 12, 2022 at 10:33 AM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote: > > Hi Alexander, > > alex.aring@xxxxxxxxx wrote on Sun, 1 May 2022 20:21:18 -0400: > > > Hi, > > > > On Thu, Apr 28, 2022 at 3:58 AM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote: > > > > > > Hi Alexander, > > > > > > alex.aring@xxxxxxxxx wrote on Wed, 27 Apr 2022 14:01:25 -0400: > > > > > > > Hi, > > > > > > > > On Wed, Apr 27, 2022 at 12:47 PM Miquel Raynal > > > > <miquel.raynal@xxxxxxxxxxx> wrote: > > > > > > > > > > We should never start a transmission after the queue has been stopped. > > > > > > > > > > But because it might work we don't kill the function here but rather > > > > > warn loudly the user that something is wrong. > > > > > > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx> > > > > > --- > > > > > > [...] > > > > > > > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c > > > > > index a8a83f0167bf..021dddfea542 100644 > > > > > --- a/net/mac802154/tx.c > > > > > +++ b/net/mac802154/tx.c > > > > > @@ -124,6 +124,8 @@ bool ieee802154_queue_is_held(struct ieee802154_local *local) > > > > > static netdev_tx_t > > > > > ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb) > > > > > { > > > > > + WARN_ON_ONCE(ieee802154_queue_is_stopped(local)); > > > > > + > > > > > return ieee802154_tx(local, skb); > > > > > } > > > > > > > > > > diff --git a/net/mac802154/util.c b/net/mac802154/util.c > > > > > index 847e0864b575..cfd17a7db532 100644 > > > > > --- a/net/mac802154/util.c > > > > > +++ b/net/mac802154/util.c > > > > > @@ -44,6 +44,24 @@ void ieee802154_stop_queue(struct ieee802154_local *local) > > > > > rcu_read_unlock(); > > > > > } > > > > > > > > > > +bool ieee802154_queue_is_stopped(struct ieee802154_local *local) > > > > > +{ > > > > > + struct ieee802154_sub_if_data *sdata; > > > > > + bool stopped = true; > > > > > + > > > > > + rcu_read_lock(); > > > > > + list_for_each_entry_rcu(sdata, &local->interfaces, list) { > > > > > + if (!sdata->dev) > > > > > + continue; > > > > > + > > > > > + if (!netif_queue_stopped(sdata->dev)) > > > > > + stopped = false; > > > > > + } > > > > > + rcu_read_unlock(); > > > > > + > > > > > + return stopped; > > > > > +} > > > > > > > > sorry this makes no sense, you using net core functionality to check > > > > if a queue is stopped in a net core netif callback. Whereas the sense > > > > here for checking if the queue is really stopped is when 802.15.4 > > > > thinks the queue is stopped vs net core netif callback running. It > > > > means for MLME-ops there are points we want to make sure that net core > > > > is not handling any xmit and we should check this point and not > > > > introducing net core functionality checks. > > > > > > I think I've mixed two things, your remark makes complete sense. I > > > should instead here just check a 802.15.4 internal variable. > > > > > > > I am thinking about this patch series... and I think it still has bugs > > or at least it's easy to have bugs when the context is not right > > prepared to call a synchronized transmission. We leave here the netdev > > state machine world for transmit vs e.g. start/stop netif callback... > > We have a warning here if there is a core netif xmit callback running > > when 802.15.4 thinks it shouldn't (because we take control of it) but > > I also think about a kind of the other way around. A warning if > > 802.15.4 transmits something but the netdev core logic "thinks" it > > shouldn't. > > > > That requires some checks (probably from netcore functionality) to > > check if we call a 802.15.4 sync xmit but netif core already called > > stop() callback. The last stop() callback - means the driver_ops > > stop() callback was called, we have some "open_count" counter there > > which MUST be incremented before doing any looping of one or several > > sync transmissions. All I can say is if we call xmit() but the driver > > is in stop() state... it will break things. > > > > My concern is also here that e.g. calling netif down or device > > suspend() are only two examples I have in my mind right now. I don't > > know all cases which can occur, that's why we should introduce another > > WARN_ON_ONCE() for the case that 802.15.4 transmits something but we > > are in a state where we can't transmit something according to netif > > state (driver ops called stop()). > > > > Can you add such a check as well? > > That is a good idea, I have added such a check: if the interface is > supposed to be down I'll warn and return because I don't think there is > much we can do in this situation besides avoiding trying to transmit > anything. > ok... > > And please keep in mind to increment > > the open count when implementing MLME-ops (or at least handle it > > somehow), otherwise I guess it's easy to hit the warning. If another > > user reports warnings and tells us what they did we might know more > > other "cases" to fix. > > I don't think incrementing the open_count counter is the right solution > here just because the stop call is not supposed to fail and has no > straightforward ways to be deferred. In particular, just keeping the > open_count incremented will just avoid the actual driver stop operation > to be executed and the core will not notice it. > the stop callback can sleep, it's the job of the driver to synchronize it somehow with the transceiver state. > I came out with another solution: acquiring the rtnl when performing a > MLME Tx operation to serialize these operations. We can easily have a > version which just checks the rtnl was acquired as well for situations > when the MLME operations are called by eg. the nl layer (and thus, with > the rtnl lock taken automatically). The rtnl lock needs definitely to be held during such operation. - Alex