>>> Really, the interface is flawed and should have never been merged in the >>> first place. We cannot simply remove it altogether I am afraid so let's >>> at least remove the bogus code and pretend that the world is a better >>> place where everything is removable except the reality sucks... >> >> As I expressed already, the interface works as designed/documented and >> has been used like that for years. > > It seems we do differ in the usefulness though. Using a crappy interface > for years doesn't make it less crappy. I do realize we cannot remove the > interface but we can remove issues with the implementation and I dare to > say that most existing users wouldn't really notice. Well, at least powerpc-utils (why this interface was introduced) will notice a) performance wise and b) because more logging output will be generated (obviously non-offlineable blocks will be tried to offline). However, it should not break, because we could have had races before/false positives. > >> I tend to agree that it never should have been merged like that. >> >> We have (at least) two places that are racy (with concurrent memory >> hotplug): >> >> 1. /sys/.../memoryX/removable >> - a) make it always return yes and make the interface useless >> - b) add proper locking and keep it running as is (e.g., so David can >> identify offlineable memory blocks :) ). >> >> 2. /sys/.../memoryX/valid_zones >> - a) always return "none" if the memory is online >> - b) add proper locking and keep it running as is >> - c) cache the result ("zone") when a block is onlined (e.g., in >> mem->zone. If it is NULL, either mixed zones or unknown) >> >> At least 2. already scream for a proper device_lock() locking as the >> mem->state is not stable across the function call. >> >> 1a and 2a are the easiest solutions but remove all ways to identify if a >> memory block could theoretically be offlined - without trying >> (especially, also to identify the MOVABLE zone). >> >> I tend to prefer 1b) and 2c), paired with proper device_lock() locking. >> We don't affect existing use cases but are able to simplify the code + >> fix the races. >> >> What's your opinion? Any alternatives? > > 1a) and 2c) if you ask me. > I'll look into that all, just might take a little (busy with a lot of stuff). But after all, it does not seem to be urgent. 1a) will be easy, I'll post a patch soon that we can let rest in -next for a bit to see if people start to scream out loud. -- Thanks, David / dhildenb