> On 23 May 2017, at 21:42, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello, Paolo.
>
> On Sat, May 20, 2017 at 09:27:33AM +0200, Paolo Valente wrote:
>> Consider a process or a group that is moved from a given source group
>> to a different group, or simply removed from a group (although I
>> didn't yet succeed in just removing a process from a group :) ). The
>> pointer to the [b|c]fq_group contained in the schedulable entity
>> belonging to the source group *is not* updated, in BFQ, if the entity
>> is idle, and *is not* updated *unconditionally* in CFQ. The update
>> will happen in bfq_get_rq_private or cfq_set_request, on the arrival
>> of a new request. But, if the move happens right after the arrival of
>> a request, then all the scheduler functions executed until a new
>> request arrives for that entity will see a stale [b|c]fq_group. Much
>
> Limited staleness is fine. Especially in this case, it isn't too
> weird to claim that the order between the two operations isn't clearly
> defined.
>

OK.

>> worse, if also a blkcg_deactivate_policy or a blkg_destroy are
>> executed right after the move, then both the policy data pointed by
>> the [b|c]fq_group and the [b|c]fq_group itself may be deallocated.
>> So, all the functions of the scheduler invoked before next request
>> arrival may use dangling references!
>
> Hmm... but cfq_group is allocated along with blkcg and blkcg always
> ensures that there are no blkg left before freeing the pd area in
> blkcg_css_offline().
>

Exactly, but even after all blkgs, as well as the cfq_group and pd, are
gone, the child cfq_queues of the freed cfq_group continue to point to
nonexistent objects until new cfq_set_requests are executed for those
cfq_queues.
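
As a rough user-space illustration of the stale-pointer pattern described
above, here is a minimal sketch. All names in it (model_group, model_queue,
and the model_* functions) are hypothetical stand-ins, not kernel code, and
the group is only flagged as freed rather than actually freed, so that the
check stays well defined:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a cfq_group; not the kernel struct. */
struct model_group {
    bool freed; /* set instead of calling free(), so later checks are safe */
};

/* Hypothetical stand-in for a cfq_queue caching its group (cfqq->cfqg). */
struct model_queue {
    struct model_group *grp;
};

/* Models cfq_set_request: the only point where the cached pointer is refreshed. */
static void model_set_request(struct model_queue *q, struct model_group *cur)
{
    q->grp = cur;
}

/* Models destruction of the group; the queue is not notified. */
static void model_destroy_group(struct model_group *g)
{
    g->freed = true;
}

/* Models any other scheduler function: it trusts the cached pointer. */
static bool model_hook_sees_freed_group(const struct model_queue *q)
{
    assert(q->grp != NULL);
    return q->grp->freed;
}

/* Runs the sequence; returns 0 if the stale window appears as expected. */
int model_demo(void)
{
    struct model_group old_grp = { .freed = false };
    struct model_group new_grp = { .freed = false };
    struct model_queue q = { .grp = NULL };

    model_set_request(&q, &old_grp);   /* a request arrives */
    model_destroy_group(&old_grp);     /* the group goes away afterwards */
    bool stale_before = model_hook_sees_freed_group(&q);

    model_set_request(&q, &new_grp);   /* the next request refreshes the pointer */
    bool stale_after = model_hook_sees_freed_group(&q);

    printf("stale before refresh: %d, after refresh: %d\n",
           stale_before, stale_after);
    return (stale_before && !stale_after) ? 0 : 1;
}
```

Calling model_demo() from a main function prints "stale before refresh: 1,
after refresh: 0". In the real scheduler the window would involve a pointer
to freed memory rather than a flag, which is why the model only marks the
group instead of releasing it.
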
To try to make this statement clearer, here is the critical sequence
for a cfq_queue, say cfqq, belonging to a cfq_group, say cfqg:

1 cfq_set_request for a request rq of cfqq
2 removal of (the process associated with cfqq) from cfqg
3 destruction of the blkg that cfqg is associated with
4 destruction of the blkcg the above blkg belongs to
5 destruction of the pd pointed to by cfqg, and of cfqg itself
!!!-> from now on cfqq->cfqg is a dangling reference <-!!!
6 execution of cfq functions, different from cfq_set_request, on cfqq
  . cfq_insert, cfq_dispatch, cfq_completed_rq, ...
7 execution of a new cfq_set_request for cfqq
-> now cfqq->cfqg is again a sane pointer <-

Every function executed at step 6 sees a dangling reference for
cfqq->cfqg.

My fix for caching data doesn't solve this more serious problem.

Where am I mistaken?

Thanks,
Paolo

> Thanks.
>
> --
> tejun