> On 24 May 2017, at 12:53, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
>>
>> On 23 May 2017, at 21:42, Tejun Heo <tj@xxxxxxxxxx> wrote:
>>
>> Hello, Paolo.
>>
>> On Sat, May 20, 2017 at 09:27:33AM +0200, Paolo Valente wrote:
>>> Consider a process or a group that is moved from a given source group
>>> to a different group, or simply removed from a group (although I
>>> haven't yet succeeded in just removing a process from a group :) ). The
>>> pointer to the [b|c]fq_group contained in the schedulable entity
>>> belonging to the source group *is not* updated, in BFQ, if the entity
>>> is idle, and *is not* updated *unconditionally* in CFQ. The update
>>> will happen in bfq_get_rq_private or cfq_set_request, on the arrival
>>> of a new request. But, if the move happens right after the arrival of
>>> a request, then all the scheduler functions executed until a new
>>> request arrives for that entity will see a stale [b|c]fq_group. Much
>>
>> Limited staleness is fine. Especially in this case, it isn't too
>> weird to claim that the order between the two operations isn't clearly
>> defined.
>>
>
> ok
>
>>> worse, if a blkcg_deactivate_policy or a blkg_destroy is also
>>> executed right after the move, then both the policy data pointed to by
>>> the [b|c]fq_group and the [b|c]fq_group itself may be deallocated.
>>> So, all the functions of the scheduler invoked before the next request
>>> arrives may use dangling references!
>>
>> Hmm... but cfq_group is allocated along with blkcg and blkcg always
>> ensures that there are no blkgs left before freeing the pd area in
>> blkcg_css_offline().
>>
>
> Exactly, but even after all blkgs, as well as the cfq_group and pd, are
> gone, the children cfq_queues of the gone cfq_group continue to point
> to nonexistent objects, until new cfq_set_requests are executed for
> those cfq_queues. To try to make this statement clearer, here is the
> critical sequence for a cfq_queue, say cfqq, belonging to a cfq_group,
> say cfqg:
>
> 1 cfq_set_request for a request rq of cfqq

Sorry, this first event is irrelevant for the problem to occur. What
matters is just that some scheduler hooks are invoked *after* the
deallocation of a cfq_group, and *before* a new cfq_set_request.

Paolo

> 2 removal of (the process associated with cfqq) from cfqg
> 3 destruction of the blkg that cfqg is associated with
> 4 destruction of the blkcg the above blkg belongs to
> 5 destruction of the pd pointed to by cfqg, and of cfqg itself
> !!!-> from now on cfqq->cfqg is a dangling reference <-!!!
> 6 execution of cfq functions, different from cfq_set_request, on cfqq
> . cfq_insert, cfq_dispatch, cfq_completed_rq, ...
> 7 execution of a new cfq_set_request for cfqq
> -> now cfqq->cfqg is again a sane pointer <-
>
> Every function executed at step 6 sees a dangling reference for
> cfqq->cfqg.
>
> My fix for caching data doesn't solve this more serious problem.
>
> Where have I been mistaken?
>
> Thanks,
> Paolo
>
>> Thanks.
>>
>> --
>> tejun
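The hazard in steps 1-7 can be replayed in a small user-space model. This is a sketch under stated assumptions, not CFQ code. The names `Group`, `Queue`, `set_group`, `dispatch`, `pin_group` and `run` are all hypothetical. A `freed` flag stands in for memory returned to the allocator; in C, the step-6 access would be undefined behavior rather than a clean error. The teardown of steps 2-5 is collapsed into a single reference drop.

The model contrasts two kinds of queue. One merely caches its group pointer. The other holds its own reference on the group, which is a common way to keep a cached parent pointer valid until the child re-links. Whether and where CFQ takes such a reference is the question the thread is examining.

```python
# Hypothetical model of steps 1-7; names and structure are illustrative,
# not taken from the kernel's CFQ sources.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Hypothetical stand-in for a cfq_group."""
    name: str
    weight: int
    refcnt: int = 1       # reference owned by the blkg/blkcg side
    freed: bool = False   # models "memory returned to the allocator"

    def get(self) -> None:
        assert not self.freed, f"get() on freed group {self.name}"
        self.refcnt += 1

    def put(self) -> None:
        assert self.refcnt > 0, f"put() underflow on {self.name}"
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


@dataclass
class Queue:
    """Hypothetical stand-in for a cfq_queue caching a group pointer."""
    grp: Optional[Group] = None
    pin_group: bool = True  # does the queue hold its own ref on grp?

    def set_group(self, g: Group) -> None:
        """Models cfq_set_request (steps 1 and 7): re-link to g."""
        if self.grp is g:
            return
        if self.pin_group:
            g.get()                      # take the new reference first
        old, self.grp = self.grp, g
        if self.pin_group and old is not None:
            old.put()                    # then drop the old one

    def dispatch(self) -> int:
        """Models a step-6 hook that uses grp without re-linking."""
        assert self.grp is not None
        if self.grp.freed:
            # In C this would be a silent use-after-free, not an exception.
            raise RuntimeError(f"use-after-free of group {self.grp.name}")
        return self.grp.weight


def run(pin_group: bool) -> str:
    """Replays steps 1-7 and reports what the step-6 hook observed."""
    cg = Group("cfqg", weight=100)
    root = Group("root", weight=500)
    q = Queue(pin_group=pin_group)
    q.set_group(cg)                      # step 1
    cg.put()                             # steps 2-5: teardown drops its ref
    try:
        seen = f"weight={q.dispatch()}"  # step 6
    except RuntimeError as e:
        seen = str(e)
    q.set_group(root)                    # step 7: sane pointer again
    return seen


print(run(pin_group=False))  # use-after-free of group cfqg
print(run(pin_group=True))   # weight=100
```

With `pin_group=False`, the step-6 hook observes a freed group. With `pin_group=True`, the group survives until step 7 re-links the queue and drops the last reference to it.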