On Mon 20-01-20 10:14:44, David Hildenbrand wrote: > On 20.01.20 08:48, Michal Hocko wrote: > > On Fri 17-01-20 08:57:51, Dan Williams wrote: > > [...] > >> Unless the user is willing to hold the device_hotplug_lock over the > >> evaluation then the result is unreliable. > > > > Do we want to hold the device_hotplug_lock from this user readable file > > in the first place? My book says that this just waits to become a > > problem. > > It was the "big hammer" solution for this RFC. > > I think we could do with a try_lock() on the device_lock() paired with a > device->removed flag. The latter is helpful for properly catching zombie > devices on the onlining/offlining path either way (and on my todo list). try_lock would be more considerate. It would at least make any potential hammering a bit harder. > > Really, the interface is flawed and should have never been merged in the > > first place. We cannot simply remove it altogether I am afraid so let's > > at least remove the bogus code and pretend that the world is a better > > place where everything is removable except the reality sucks... > > As I expressed already, the interface works as designed/documented and > has been used like that for years. It seems we do differ in the usefulness though. Using a crappy interface for years doesn't make it less crappy. I do realize we cannot remove the interface but we can remove issues with the implementation and I dare to say that most existing users wouldn't really notice. > I tend to agree that it never should have been merged like that. > > We have (at least) two places that are racy (with concurrent memory > hotplug): > > 1. /sys/.../memoryX/removable > - a) make it always return yes and make the interface useless > - b) add proper locking and keep it running as is (e.g., so David can > identify offlineable memory blocks :) ). > > 2. /sys/.../memoryX/valid_zones > - a) always return "none" if the memory is online > - b) add proper locking and keep it running as is > - c) cache the result ("zone") when a block is onlined (e.g., in > mem->zone. If it is NULL, either mixed zones or unknown) > > At least 2. already scream for a proper device_lock() locking as the > mem->state is not stable across the function call. > > 1a and 2a are the easiest solutions but remove all ways to identify if a > memory block could theoretically be offlined - without trying > (especially, also to identify the MOVABLE zone). > > I tend to prefer 1b) and 2c), paired with proper device_lock() locking. > We don't affect existing use cases but are able to simplify the code + > fix the races. > > What's your opinion? Any alternatives? 1a) and 2c) if you ask me. -- Michal Hocko SUSE Labs