Re: [PATCH wpan-next v2 09/11] net: mac802154: Introduce a synchronous API for MLME commands

Alexander Aring <aahringo@xxxxxxxxxx> · Wed, 18 May 2022 21:51:36 -0400

Hi,

On Wed, May 18, 2022 at 12:12 PM Miquel Raynal
<miquel.raynal@xxxxxxxxxxx> wrote:
>
> Hi Alexander,
>
> > > > > > > > > > +int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > > > > > > > > > +{
> > > > > > > > > > +       int ret;
> > > > > > > > > > +
> > > > > > > > > > +       /* Avoid possible calls to ->ndo_stop() when we asynchronously perform
> > > > > > > > > > +        * MLME transmissions.
> > > > > > > > > > +        */
> > > > > > > > > > +       rtnl_lock();
> > > > > > > > >
> > > > > > > > > I think we should make an ASSERT_RTNL() here, the lock needs to be
> > > > > > > > > earlier than that over the whole MLME op. MLME can trigger more than
> > > > > > > >
> > > > > > > > not over the whole MLME_op, that's terrible to hold the rtnl lock so
> > > > > > > > long... so I think this is fine that some netdev call will interfere
> > > > > > > > with this transmission.
> > > > > > > > So forget about the ASSERT_RTNL() here, it's fine (I hope).
> > > > > > > >
> > > > > > > > > one message, the whole sync_hold/release queue should be earlier than
> > > > > > > > > > that... in my opinion is it not right to allow other messages while
> > > > > > > > > > an MLME op is going on? I am not sure what the standard says about this,
> > > > > > > > > but I think it should be stopped the whole time? All those sequence
> > > > > > > >
> > > > > > > > Whereas the stop of the netdev queue makes sense for the whole mlme-op
> > > > > > > > (in my opinion).
> > > > > > >
> > > > > > > I might still implement an MLME pre/post helper and do the queue
> > > > > > > hold/release calls there, while only taking the rtnl from the _tx.
> > > > > > >
> > > > > > > And I might create an mlme_tx_one() which does the pre/post calls as
> > > > > > > well.
> > > > > > >
> > > > > > > Would something like this fit?
> > > > > >
> > > > > > I think so, I've heard for some transceiver types a scan operation can
> > > > > > take hours... but I guess whoever triggers that scan in such an
> > > > > > environment knows that it has some "side-effects"...
> > > > >
> > > > > Yeah, a scan requires the data queue to be stopped and all incoming
> > > > > > packets to be dropped (other than beacons, ofc), so users must be
> > > > > aware of this limitation.
> > > >
> > > > I think there is a real problem about how the user can synchronize the
> > > > start of a scan and be sure that at this point everything was
> > > > > transmitted, we might need to really "flush" the queue. Your naming
> > > > > "flush" is also wrong, it will flush the framebuffer(s) of the
> > > > transceivers but not the netdev queue... and we probably should flush
> > > > the netdev queue before starting mlme-op... this is something to add
> > > > in the mlme_op_pre() function.
> > >
> > > Is it even possible? This requires waiting for the netdev queue to be
> > > empty before stopping it, but if users constantly flood the transceiver
> > > with data packets this might "never" happen.
> > >
> >
> > Nothing is impossible, just maybe nobody thought about that. Sure
> > putting more into the queue should be forbidden but what's inside
> > should be "flushed". Currently we make a hard cut, there is no way
> > that the user knows what's sent or not BUT that is the case for
> > xmit_do() anyway, it's not reliable... people need to have the right
> > upper layer protocol. However, I think we could run into problems,
> > especially if we have features like waiting for the socket error queue to
> > know if e.g. an ack was received or not.
>
> Looking at net/core/dev.c I don't see the issue anymore, let me try to
> explain: as far as I understand the net device queue is a very
> conceptual "queue" which only has a reality if the underlying layer
> really implements the concept of a queue. To be more precise, at the
> netdev level itself, there is a HARD_TX_LOCK() call which serializes
> the ->ndo_start_xmit() calls, but whatever entered the
> ->ndo_start_xmit() hook _will_ be handled by the lower layer and is not
> in any "waiting" state at the net core level.
>
> In practice, the IEEE 802.15.4 core treats all packets immediately and
> does not really bother "queuing" them as if there were a "waiting"
> state. So all messages that userspace expected to be sent (which
> did not return NETDEV_TX_BUSY) at the moment where we decide to stop
> data transmissions will be processed.
>
> If several frames had to be transmitted to the IEEE 802.15.4 core and
> they all passed the netdev "queuing" mechanism, then they will be
> forwarded to the transceivers thanks to the wait_event(!ongoing_txs) and
> only after we declare the queue sync'ed.
>
> For me there is no hard cut.

In my opinion, a wpan interface definitely has queue handling right
above xmit_do(), and it is in a "works for now" state. Your queue
flush function will not flush any queue; as I said, it flushes the
transceiver's framebuffer at the starting point of the xmit_do() call,
and you should change your comments/function names to describe this
behaviour.
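
To make the naming point concrete, here is a rough userspace model of
the hold/sync idea. It is only a sketch and not the mac802154 code: the
struct, the model_*() names and the counters are all made up for
illustration, and the loop in model_sync_queue() stands in for what
would be a sleep in wait_event(!ongoing_txs) in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PHY, NOT the real struct ieee802154_local. */
struct model_phy {
	bool queue_stopped;	/* models a stopped netdev queue */
	int ongoing_txs;	/* frames handed to the transceiver, not done yet */
	int completed_txs;	/* frames the transceiver finished sending */
};

/* Models ->ndo_start_xmit(): refuses new frames once the queue is stopped. */
static bool model_xmit(struct model_phy *phy)
{
	if (phy->queue_stopped)
		return false;	/* a real driver would return NETDEV_TX_BUSY */
	phy->ongoing_txs++;
	return true;
}

/* Models the transceiver reporting that one frame is done. */
static void model_xmit_done(struct model_phy *phy)
{
	assert(phy->ongoing_txs > 0);
	phy->ongoing_txs--;
	phy->completed_txs++;
}

/* Stop accepting new data frames; frames already in flight are untouched. */
static void model_hold_queue(struct model_phy *phy)
{
	phy->queue_stopped = true;
}

/*
 * "Sync", not "flush": nothing is dropped and no netdev queue is emptied.
 * We only let the frames already passed to the transceiver complete.
 */
static void model_sync_queue(struct model_phy *phy)
{
	while (phy->ongoing_txs > 0)
		model_xmit_done(phy);
}

/* Allow data frames again once the MLME operation is over. */
static void model_release_queue(struct model_phy *phy)
{
	phy->queue_stopped = false;
}
```

In this model an MLME operation would call model_hold_queue(), then
model_sync_queue(), do its work, and finally model_release_queue().
Frames accepted before the hold are completed by the sync step, and
anything submitted in between is refused instead of being queued.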

- Alex