On Mon, 2017-05-15 at 10:16 +0200, Martin Wilck wrote: > On Fri, 2017-05-12 at 16:24 +0000, Bart Van Assche wrote: > > Allowing races like the one this patch tries to address to exist > > makes the ALUA code harder to maintain than necessary. Have you > > considered to make alua_bus_detach() wait until ALUA work has > > finished by using e.g. cancel_work_sync() or rcu_synchronize()? > > to be honest, no, I didn't consider this yet. The current kernel > crashes with BUG() if an ALUA device is detached at an inopportune > point in time (not just theoretically, we actually observed this). The > goal of my patch was to fix this with minimum risk to introduce other > problems. The addition in patch 4/4 was an attempt to address the > concern you had expressed in your review of the v1 patch. > > I'm not opposed to try to find a better solution, but could we maybe > get the fix for the BUG() (i.e. patch 3/4) applied in the first place? > AFAICS it would not conflict with a solution like the one you > suggested. Hello Martin, Sorry but I don't think it's a good idea to merge patch 3/4 in the upstream kernel. Even with that patch applied there is nothing that prevents that h->handler_data would be freed while alua_rtpg() is in progress and hence that h->sdev is a completely random pointer if alua_rtpg() is executed concurrently with alua_bus_detach(). Please do not try to paper over race conditions but fix these properly. Bart.