Hi,

On Wed, May 18, 2022 at 12:12 PM Miquel Raynal
<miquel.raynal@xxxxxxxxxxx> wrote:
>
> Hi Alexander,
>
> > > > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > > > +{
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	/* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > > > +	 * MLME transmissions.
> > > > > > > > > > +	 */
> > > > > > > > > > +	rtnl_lock();
> > > > > > > > >
> > > > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > > > earlier than that over the whole MLME op. MLME can trigger more than
> > > > > > > >
> > > > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > > > long... so I think it is fine that some netdev call will interfere
> > > > > > > > with this transmission.
> > > > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > > > >
> > > > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > > > that... in my opinion it is not right to allow other messages as long as
> > > > > > > > > an MLME op is going on? I am not sure what the standard says about this,
> > > > > > > > > but I think it should be stopped the whole time? All those sequence
> > > > > > > >
> > > > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > > > (in my opinion).
> > > > > > >
> > > > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > > > >
> > > > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > > > well.
> > > > > > >
> > > > > > > Would something like this fit?
> > > > > >
> > > > > > I think so, I've heard for some transceiver types a scan operation can
> > > > > > take hours... but I guess whoever triggers that scan in such an
> > > > > > environment knows that it has some "side-effects"...
> > > > >
> > > > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > > > packets to be dropped (other than beacons, ofc), so users must be
> > > > > aware of this limitation.
> > > >
> > > > I think there is a real problem about how the user can synchronize the
> > > > start of a scan and be sure that at this point everything was
> > > > transmitted, we might need to really "flush" the queue. Your naming
> > > > "flush" is also wrong, it will flush the framebuffer(s) of the
> > > > transceivers but not the netdev queue... and we probably should flush
> > > > the netdev queue before starting mlme-op... this is something to add
> > > > in the mlme_op_pre() function.
> > >
> > > Is it even possible? This requires waiting for the netdev queue to be
> > > empty before stopping it, but if users constantly flood the transceiver
> > > with data packets this might "never" happen.
> > >
> >
> > Nothing is impossible, just maybe nobody thought about that. Sure
> > putting more into the queue should be forbidden but what's inside
> > should be "flushed". Currently we make a hard cut, there is no way
> > that the user knows what's sent or not BUT that is the case for
> > xmit_do() anyway, it's not reliable... people need to have the right
> > upper layer protocol. However I think we could run into problems,
> > especially if we have features like waiting for the socket error queue
> > to know if e.g. an ack was received or not.
>
> Looking at net/core/dev.c I don't see the issue anymore, let me try to
> explain: as far as I understand the net device queue is a very
> conceptual "queue" which only has a reality if the underlying layer
> really implements the concept of a queue. To be more precise, at the
> netdev level itself, there is a HARD_TX_LOCK() call which serializes
> the ->ndo_start_xmit() calls, but whatever entered the
> ->ndo_start_xmit() hook _will_ be handled by the lower layer and is not
> in any "waiting" state at the net core level.
>
> In practice, the IEEE 802.15.4 core treats all packets immediately and
> does not really bother "queuing" them as if there were a "waiting"
> state. So all messages that the userspace expected to be sent (which
> did not return NETDEV_TX_BUSY) at the moment where we decide to stop
> data transmissions will be processed.
>
> If several frames had to be transmitted to the IEEE 802.15.4 core and
> they all passed the netdev "queuing" mechanism, then they will be
> forwarded to the transceivers thanks to the wait_event(!ongoing_txs) and
> only after that do we declare the queue sync'ed.
>
> For me there is no hard cut.

In my opinion, in the case of a wpan interface there is definitely
queue handling right above xmit_do(), and it is in a "works for now"
state. Your queue flush function will not flush any queue; as I said,
it is flushing the transceiver's framebuffer at the starting point of
the xmit_do() call, and you should change your comments/function names
to describe this behaviour.

- Alex