> Il giorno 24 mag 2017, alle ore 17:47, Tejun Heo <tj@xxxxxxxxxx> ha scritto: > > Hello, > > On Wed, May 24, 2017 at 05:43:18PM +0100, Paolo Valente wrote: >>> so none of the above objects can be destroyed before the request is >>> done. >> >> ... the issue seems just to move to a more subtle position: cfq is ok, >> because it protects itself with rq lock, but blk-mq schedulers don't. >> So, the race that leads to the (real) crashes reported by people may >> actually be: > > Oh, I was just thinking about !mq paths the whole time. > >> 1 blkg_lookup executed on a blkg being destroyed: the scheduler gets a >> copy of the content of the blkg, but the rcu mechanism doesn't prevent >> destruction from going on >> 2 blkg_get gets executed on the copy of the original blkg > > So, we can't do that. We should look up and bump the ref and use the > original copy. We probably should switch blkgs to use percpu-refs. > Ok. So, just to better understand: as of now, i.e., before you make the changes you are proposing, the address returned by blkg_lookup can be used safely only if one both invokes blkg_lookup and gets a reference, while holding the rq lock all the time. But then, before the changes you propose, what is the remaining role of rcu protection here? Are there places where the value returned by blkg_lookup is actually used safely without getting a reference the returned blkg? Anyway, I'm willing to help with your proposal, if you think I can be of any help at some point. In this respect, consider that I'm not an expert of percpu-refs either. Finally, I guess that the general fix you have in mind may not be ready shortly. So, I'll proceed with my temporary fix for the moment. In particular, I will 1) fix the typo reported by Jens; 2) add a note stating that this is a temporary fix; 3) if needed, modify commit log and comments in the diffs, to better describe the general problem, and the actual critical race. Thanks, Paolo > Thanks. > > -- > tejun