Re: [PATCH BlueZ v3 5/5] mesh: Add "node is busy" check for Leave() & Attach()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Michał, Inga,


On Mon, 2020-06-15 at 22:25 +0000, Stotland, Inga wrote:
> Hi Michal,
> 
> On Mon, 2020-06-15 at 22:32 +0200, michal.lowas-rzechonek@xxxxxxxxxxx
> wrote:
> > Inga, Brian,
> > 
> > On 06/15, Gix, Brian wrote:
> > > > With this patch, this call sequence fails, because now we're supposed to
> > > > send a *reply* to JoinComplete first, and only then call Attach()?
> > > 
> > > A couple months ago, we made the decision (with your input, I believe)
> > > that we needed to use JoinComplete on every node creation (Join,
> > > Import, Create), to ensure that the App has the token (...)
> > 
> > Yes, I remember that discussion. The rationale was ability to catch
> > bugs in the application, and get rid of created, but effectively
> > unusable nodes.
> > 
> > > This creates (...) the unfortunate situation revealed in one of your
> > > test-suite cases where if Leave() was called before returning the
> > > JoinComplete() call, we Seg-Faulted.
> > 
> > Indeed, although I don't think it's necessary to introduce a "busy"
> > state...
> > 
> > In case of Leave(), this call removes the node anyway, so what I think
> > would suffice is to check whether the node still exists when
> > JoinComplete reply arrives, to avoid freeing the node twice (causing
> > SEGV).
> > 
> > void node_finalize_new_node(struct mesh_node *node, struct mesh_io *io)
> > {
> > 	if (!node)
> > 		return;
> > 
> > 	if (!l_queue_find(nodes, match_simple, L_UINT_TO_PTR(node)))
> > 		return;
> > 
> 
> I am afraid that this kind of check may lead to a race condition (rare, but possible) when:
>  - a un-finalized node has been removed via Leave and
>  - the daemon still waiting for JoinCOmplete() reply and
>  - meanwhile another one has either been created or imported reusing the memory allocation (entirely
> possible) and has been attached
>  
> So when the original JoinComplete returns either via timeout/error/ok,we may unintentionally remove dbus
> resources of a new node that has been validated and attached.
> 
> 
> >     // ...
> > }
> > 
> > This would allow the application to call Leave *before* sending a reply
> > to JoinComplete.
> > 
> > As for Attach(), I also think it should be legal to call it before
> > replying to JoinComplete. The worst thing that can happen is that
> > application successfully attaches, then replies to JoinComplete with an
> > error. This would simply remove the node, and the application would be
> > promptly detached.
> > 
> > 
> 
> I guess we could introduce an internal timer inside the daemon to put
> Attach on hold until JoinComplete is done. If JoinComplete returns an
> error, then Attach won'r go through and would return error as well

So I have created a patch with this option, that I have sent to the reflector.

(See: [PATCH BlueZ] mesh: Add deferal of Attach() and Leave() if busy )

It maintains the *busy* check to make sure we are in a safe state to perform the Attach or Leave, and if not we
defer the call for 1 second...  With the understanding that *nobody* should be delaying the response to
JoinComplete.  This handles just the simple "Out Of Expected Order" issue, but will still respond to Attach and
Leave with the Busy error if when we reconsider the call, the node has still not been completed.

I have tested this with no memory leaks, and verified that either Attach or Leave can be called within the
JoinComplete application handler.


> 
> > > > I'm using a high-level API for D-Bus, so I don't really control when the
> > > > reply is sent, so at the moment the only way to implement this would be
> > > > by delaying Attach() by a fixed timeout - and I'm not comfortable with
> > > > that.
> > > 
> > > Yeah, I can see how this is now required...  
> > > 
> > > In the mesh-cfgclient tool (which is also built on ELL) we accomplish
> > > this by scheduling an idle_oneshot for the Attach.  
> > 
> > Unfortunately, not all main loops have API to schedule "idle" calls,
> > i.e. calls executed when the loop doesn't have anything better to do.
> > 
> > I know that both ELL and Glib do, but AFAIR Qt does not (it uses timers
> > with timeout set to 0, if I'm not mistaken), and Python's asyncio
> > doesn't either.
> > 
> > I don't think requiring a specific sequence of dbus messages is a good
> > idea :(
> > 
> > regards




[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux