Hi Alexander,

alex.aring@xxxxxxxxx wrote on Sun, 20 Feb 2022 18:49:06 -0500:

> Hi,
>
> On Mon, Feb 7, 2022 at 9:48 AM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
> >
> > Right now we are able to stop a queue but we have no indication if a
> > transmission is ongoing or not.
> >
> > Thanks to recent additions, we can track the number of ongoing
> > transmissions so we know if the last transmission is over. Adding on top
> > of it an internal wait queue also allows to be woken up asynchronously
> > when this happens. If, beforehands, we marked the queue to be held and
> > stopped it, we end up flushing and stopping the tx queue.
> >
> > Thanks to this feature, we will soon be able to introduce a synchronous
> > transmit API.
> >
> > Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> > ---
> >  include/net/cfg802154.h      |  1 +
> >  net/ieee802154/core.c        |  1 +
> >  net/mac802154/cfg.c          |  5 ++---
> >  net/mac802154/ieee802154_i.h |  1 +
> >  net/mac802154/tx.c           | 11 ++++++++++-
> >  net/mac802154/util.c         |  3 ++-
> >  6 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
> > index 043d8e4359e7..0d385a214da3 100644
> > --- a/include/net/cfg802154.h
> > +++ b/include/net/cfg802154.h
> > @@ -217,6 +217,7 @@ struct wpan_phy {
> >         /* Transmission monitoring and control */
> >         atomic_t ongoing_txs;
> >         atomic_t hold_txs;
> > +       wait_queue_head_t sync_txq;
> >
> >         char priv[] __aligned(NETDEV_ALIGN);
> >  };
> > diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
> > index de259b5170ab..0953cacafbff 100644
> > --- a/net/ieee802154/core.c
> > +++ b/net/ieee802154/core.c
> > @@ -129,6 +129,7 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
> >         wpan_phy_net_set(&rdev->wpan_phy, &init_net);
> >
> >         init_waitqueue_head(&rdev->dev_wait);
> > +       init_waitqueue_head(&rdev->wpan_phy.sync_txq);
> >
> >         return &rdev->wpan_phy;
> >  }
> > diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
> > index e8aabf215286..da94aaa32fcb 100644
> > --- a/net/mac802154/cfg.c
> > +++ b/net/mac802154/cfg.c
> > @@ -46,8 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
> >         if (!local->open_count)
> >                 goto suspend;
> >
> > -       atomic_inc(&wpan_phy->hold_txs);
> > -       ieee802154_stop_queue(&local->hw);
> > +       ieee802154_sync_and_stop_tx(local);
> >         synchronize_net();
> >
> >         /* stop hardware - this must stop RX */
> > @@ -73,7 +72,7 @@ static int ieee802154_resume(struct wpan_phy *wpan_phy)
> >                 return ret;
> >
> >  wake_up:
> > -       if (!atomic_dec_and_test(&wpan_phy->hold_txs))
> > +       if (!atomic_read(&wpan_phy->hold_txs))
> >                 ieee802154_wake_queue(&local->hw);
> >         local->suspended = false;
> >         return 0;
> > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > index 56fcd7ef5b6f..295c9ce091e1 100644
> > --- a/net/mac802154/ieee802154_i.h
> > +++ b/net/mac802154/ieee802154_i.h
> > @@ -122,6 +122,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> >
> >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > +void ieee802154_sync_and_stop_tx(struct ieee802154_local *local);
> >  netdev_tx_t
> >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> >  netdev_tx_t
> > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > index abd9a057521e..06ae2e6cea43 100644
> > --- a/net/mac802154/tx.c
> > +++ b/net/mac802154/tx.c
> > @@ -47,7 +47,8 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
> >         ieee802154_wake_queue(&local->hw);
> >
> >         kfree_skb(skb);
> > -       atomic_dec(&local->phy->ongoing_txs);
> > +       if (!atomic_dec_and_test(&local->phy->ongoing_txs))
> > +               wake_up(&local->phy->sync_txq);
> >         netdev_dbg(dev, "transmission failed\n");
> >  }
> >
> > @@ -117,6 +118,14 @@ ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> >         return ieee802154_tx(local, skb);
> >  }
> >
> > +void ieee802154_sync_and_stop_tx(struct ieee802154_local *local)
> > +{
> > +       atomic_inc(&local->phy->hold_txs);
> > +       ieee802154_stop_queue(&local->hw);
> > +       wait_event(local->phy->sync_txq, !atomic_read(&local->phy->ongoing_txs));
> > +       atomic_dec(&local->phy->hold_txs);
>
> In my opinion this _still_ races as I mentioned earlier. You need to
> be sure that if you do ieee802154_stop_queue() that no ieee802154_tx()
> or hot_tx() is running at this time. Look into the function I
> mentioned earlier, netif_tx_disable().

I think I now get the problem, but I am having trouble understanding
the logic in netif_tx_disable(), or rather, which part of it I should
adapt to our situation (I pasted my reading of it at the bottom of this
mail).

I understand that we must make sure the following sequence cannot
happen:
- ieee802154_subif_start_xmit() is called
- ieee802154_subif_start_xmit() is called again
- ieee802154_tx() gets executed once and stops the queue
- ongoing_txs gets incremented once
- the first transfer finishes and ongoing_txs gets decremented
- the current series then considers the tx queue empty, while the
  second transfer requested earlier has not been processed yet and
  will be attempted shortly.

I don't see a clean solution to this yet. Is your suggestion to use the
netdev tx_global_lock? If so, how? It is still not clear to me how we
should tackle this issue.

In the meantime, I believe the first half of the series is now big
enough to be sent on its own, given the number of additional commits
that have popped up following your last review :)

Thanks,
Miquèl
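
P.S.: to make sure we are looking at the same thing, here is the core
of netif_tx_disable() as I read it in include/linux/netdevice.h
(reproduced from memory, so please double-check it against your tree;
the declarations of "cpu" and "i" are part of the real helper). If I
understand it correctly, the point is that each queue is stopped while
holding that queue's xmit lock, so once the loop is done no
ndo_start_xmit() call can still be in flight:

        /* Roughly the body of netif_tx_disable(dev), from memory */
        unsigned int i;
        int cpu;

        local_bh_disable();
        cpu = smp_processor_id();
        spin_lock(&dev->tx_global_lock);
        for (i = 0; i < dev->num_tx_queues; i++) {
                struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

                /* Taking the per-queue xmit lock excludes any concurrent
                 * transmit; stopping the queue under it closes the race
                 * window between "queue stopped" and "xmit still running".
                 */
                __netif_tx_lock(txq, cpu);
                netif_tx_stop_queue(txq);
                __netif_tx_unlock(txq);
        }
        spin_unlock(&dev->tx_global_lock);
        local_bh_enable();

If that is the pattern you have in mind, I guess the open question is
how to transpose the "stop under the xmit lock" part to our own
ieee802154_sync_and_stop_tx() helper.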