On Thu, Apr 08, 2021 at 10:55:53AM +0200, Greg KH wrote: > So to add crazy complexity to the kernel, I agree that this can be tricky. However, driver developers are going to open code this either way. The problem with that as well, and one of my own reasons for striving for at least *trying* for a generic solution, was that I am aware that driver developers may be trying a busy solution rather than the try method. The busy approach means you could also end up in a situation where userspace can prevent full undoing / removal of a driver. The try method is best effort in the sense that if the driver is going it won't be allowed. So a sensible consideration I think is we at least choose one of these: a) We allow driver developers to open code it on drivers which need it on each and every single sysfs knob on the driver where its needed, and accept folks might do it wrong b) Provide a macro which is opt-in, ie, not doing it for all attributes, but perhaps one which the driver author *is* aware to try / put of the driver method. c) Document this race and other races / failings so that driver authors who do care about module removal are aware and know what to do. In this thread two races were exposed on syfs: * deadlock when a sysfs attribute uses a lock which is also used on module __exit * possible race against the device's private data, and this is type specific. I think many people probably missed the last hunks of my proposed patch which added dev_type_get() which were not part of the deadlock fix. At least I showed how attributes for all block devices have an exposed race, which can crash if removal of a block device with del_gendisk() happens while a sysfs attribute is about to be used. I don't think either race is trivial for a driver developer to assume a solution for. Most focus on this thread was about the first race, the seconod however is also very real, and likely can cause more issues on rolling backs on error codes unrelated to rmmod... > for an operation that can only > be triggered manually by a root user, is not worth it in my opinion, as > the maintainer of that code the complexity was asked to be made to. Three things: 1) Many driver maintainers *do* care that rmmod works well. To the point that if it does not, they feel ashamed. Reason for this is that in some subsystems this *is* a typical test case. So although rmmod may be a vodoo thing for core, many driver developers do want this to work well. As someone who also works on many test cases to expose odd issues in the kernel unrelated to module removal, I can also say that module removal does also expose other possible races which would otherwise be difficult to trigger. So it is also a helfup aid as a developer working on differen areas of the kernel. 2) Folks have pointed out that this is not just about rmmod, rolling back removal of sysfs attributes due to error code paths is another real scenario to consider. I don't think we have rules to tell userspace to not muck with sysfs files after they are exposed. In fact those uevents we send to userspace allow userspace to know exactly when to muck with them. 3) Sadly, some sysfs attributes also purposely do things which *also* mimic what is typically done on module removal, such as removal of an interface, or block device. That was the case with zram, it allows remvoal / reset of a device... Yes these are odd things to do, but we allow for it. And sysfs users *do* need to be aware of these corner cases if they want to support them. There **may** be some severity to some of the issues mentioned above, to allow really bad things to be done in userspace even without module removal... but I didn't have time yet to expose a pattern with coccinelle yet to see how commonplace some of these issue are. I was focusing at first more for a generic solution if possible, as I thought that would be better first evaluated, and to let others slowly become aware of the issue. Luis