Hi Andrew,

On 01/10/2013 07:11 AM, Andrew Morton wrote:
> On Wed, 9 Jan 2013 17:32:26 +0800 Tang Chen <tangchen@xxxxxxxxxxxxxx> wrote:
>
>> We remove the memory like this:
>> 1. lock memory hotplug
>> 2. offline a memory block
>> 3. unlock memory hotplug
>> 4. repeat 1-3 to offline all memory blocks
>> 5. lock memory hotplug
>> 6. remove memory(TODO)
>> 7. unlock memory hotplug
>>
>> All memory blocks must be offlined before removing memory. But we don't hold
>> the lock in the whole operation. So we should check whether all memory blocks
>> are offlined before step6. Otherwise, kernel maybe panicked.
>
> Well, the obvious question is: why don't we hold lock_memory_hotplug() for
> all of steps 1-4? Please send the reasons for this in a form which I can
> paste into the changelog.

In changelog form:

Offlining a memory block and removing a memory device can be two different operations. Users can offline some memory blocks without removing the memory device. For this reason, the kernel already takes lock_memory_hotplug() inside __offline_pages(). To reuse that code for memory hot-remove, we repeat steps 1-3 for each memory block, locking and unlocking memory hotplug each time, instead of holding the memory hotplug lock across the whole operation.
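
To illustrate the sequence, here is a minimal user-space sketch, not kernel code. It models the hotplug lock with a pthread mutex and shows why the "are all blocks offline?" check has to be repeated once the lock is retaken for the removal step. All names in it (struct mem_device, offline_block(), remove_device(), hot_remove()) are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum blk_state { BLK_ONLINE, BLK_OFFLINE };

struct mem_device {
    pthread_mutex_t hotplug_lock;  /* stands in for lock_memory_hotplug() */
    enum blk_state *blocks;        /* one entry per memory block */
    size_t nr_blocks;
    bool removed;
};

/* Steps 1-3: take the lock, offline a single block, drop the lock. */
static void offline_block(struct mem_device *dev, size_t i)
{
    pthread_mutex_lock(&dev->hotplug_lock);
    dev->blocks[i] = BLK_OFFLINE;
    pthread_mutex_unlock(&dev->hotplug_lock);
}

/*
 * Steps 5-7: retake the lock and remove the device. The lock was dropped
 * between blocks, so another thread may have onlined one again; re-check
 * every block before removing. Returns 0 on success, -1 if any block is
 * still online.
 */
static int remove_device(struct mem_device *dev)
{
    int ret = 0;

    pthread_mutex_lock(&dev->hotplug_lock);
    for (size_t i = 0; i < dev->nr_blocks; i++) {
        if (dev->blocks[i] != BLK_OFFLINE) {
            ret = -1;
            break;
        }
    }
    if (ret == 0)
        dev->removed = true;
    pthread_mutex_unlock(&dev->hotplug_lock);
    return ret;
}

/* Steps 1-7 in order: offline each block, then check and remove. */
static int hot_remove(struct mem_device *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < dev->nr_blocks; i++)
        offline_block(dev, i);  /* step 4: repeat 1-3 for every block */
    return remove_device(dev);
}
```

In this model, a block that is onlined again after the loop but before remove_device() makes the removal fail cleanly instead of proceeding on a device that is still in use.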

> Actually, I wonder if doing this would fix a race in the current
> remove_memory() repeat: loop. That code does a find_memory_block_hinted()
> followed by offline_memory_block(), but afaict find_memory_block_hinted()
> only does a get_device(). Is the get_device() sufficiently strong to
> prevent problems if another thread concurrently offlines or otherwise
> alters this memory_block's state?

I think we already have memory_block->state_mutex to protect against concurrent changes to a memory_block's state. The find_memory_block_hinted() call here only finds the memory_block corresponding to the memory section we are dealing with.

Thanks. :)
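
The difference between pinning a block and serializing its state changes can be sketched in user space as follows. This is only a model under stated assumptions, not the kernel implementation: the reference count plays the role of get_device()/put_device(), the pthread mutex plays the role of memory_block->state_mutex, and every name (struct mem_block_model, mb_get(), mb_put(), mb_change_state()) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum mb_state { MB_ONLINE, MB_OFFLINE };

struct mem_block_model {
    atomic_int refcount;          /* models get_device()/put_device() */
    pthread_mutex_t state_mutex;  /* models memory_block->state_mutex */
    enum mb_state state;
};

/* Pinning keeps the object alive; it does not serialize state changes. */
static void mb_get(struct mem_block_model *mb)
{
    atomic_fetch_add(&mb->refcount, 1);
}

static void mb_put(struct mem_block_model *mb)
{
    int old = atomic_fetch_sub(&mb->refcount, 1);
    assert(old > 0);  /* a put without a matching get is a bug */
}

/*
 * State transitions are serialized by the per-block mutex. If another
 * thread already moved the block out of the expected state, the caller
 * gets -EINVAL instead of silently overwriting the state.
 */
static int mb_change_state(struct mem_block_model *mb,
                           enum mb_state from, enum mb_state to)
{
    int ret = 0;

    pthread_mutex_lock(&mb->state_mutex);
    if (mb->state != from)
        ret = -EINVAL;
    else
        mb->state = to;
    pthread_mutex_unlock(&mb->state_mutex);
    return ret;
}
```

In this model, a second attempt to offline an already-offline block fails with -EINVAL while the reference taken by mb_get() stays valid until mb_put().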