On Thu, May 20, 2021 at 10:48:57AM +0200, Halil Pasic wrote:
> On Wed, 19 May 2021 21:08:15 -0400
> Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
>
> > >
> > > This is nonsense too:
> > >
> > > 	if (vcpu->kvm->arch.crypto.pqap_hook) {
> > > 		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> > > 			return -EOPNOTSUPP;
> > > 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> > >
> > > It should have a lock around it of some kind, not a
> > > try_module_get. module_get is not a lock.
> >
> > As I said earlier, I don't know why the author did this.
>
> Please have a look at these links from the archive to get some
> perspective:
> https://lkml.org/lkml/2020/12/4/994
> https://lkml.org/lkml/2020/12/3/987
> https://www.lkml.org/lkml/2019/3/1/260
>
> We can ask the original author, but I don't think we have to. BTW the
> patch that introduced it has your r-b.
>
> > My best guess
> > is that he wanted to ensure that the module was still loaded; otherwise,
> > the data structures contained therein - for example, the pqap_hook
> > and matrix_mdev that contains it - would be gonzo.
>
> More precisely prevent the module from unloading while we execute code
> from it. As I've pointed out in a previous email the module may be gone
> by the time we call try_module_get().

No, this is a common misconception.

The module_get prevents the module from even being attempted to be
unloaded. Code should acquire this if it has done something that would
cause a module remove function to hang indefinitely, such as a design
that waits for a user FD to close.

This provides a good user experience but should generally not be
required for correctness.

All code passing function pointers across subsystems should always
fully fence those function pointers during removal. This means the
removal interacts with some kind of locking that guarantees nothing is
currently calling, or ever will call again, those function pointers.

This is not just to protect the function pointer code itself; the lock
should also protect the data that the function pointer almost always
accesses.

This is the bug here: ap is accessing the matrix_dev data from a
function pointer without any locking or serialization against
kfree(matrix_dev).

Fencing to guarantee the hook isn't running and won't run again also
serves as a strong enough serialization to allow the kfree().

The basic logic is that a module removal cannot complete until all its
function pointers have been removed from everywhere and all the locks
that protect those removals are satisfied.

Jason