On 09.04.20 04:59, piliu wrote:
>
>
> On 04/08/2020 10:46 AM, Baoquan He wrote:
>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>
>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>> blocks as removable"), the user space interface to compute whether a memory
>>> block can be offlined (exposed via
>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>
>> Pingfan, can you have a look at this change on PPC? Please feel free to
>> give comments if any concern, or offer ack if it's OK to you.
>>
>>>
>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>> we'll start by:
>>> 1. Testing if it contains any holes, and reject if so
>>> 2. Testing if pages belong to different zones, and reject if so
>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>
>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>> it can easily result in false positives/negatives. Let's stop manually
>>> checking is_mem_section_removable(), and let device_offline() handle it
>>> completely instead. We can remove the racy is_mem_section_removable()
>>> implementation next.
>>>
>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>> zone lock when isolating), but maybe we should optimize that
>>> implementation instead if this ever becomes a real problem (after all,
>>> memory unplug is already an expensive operation). We started using
>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>> Implement memory hotplug remove in the kernel"), with the initial
>>> hotremove support of lmbs.
>>>
>>> Cc: Nathan Fontenot <nfont@xxxxxxxxxxxxxxxxxx>
>>> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
>>> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>>> Cc: Paul Mackerras <paulus@xxxxxxxxx>
>>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: Oscar Salvador <osalvador@xxxxxxx>
>>> Cc: Baoquan He <bhe@xxxxxxxxxx>
>>> Cc: Wei Yang <richard.weiyang@xxxxxxxxx>
>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>> ---
>>>  .../platforms/pseries/hotplug-memory.c | 26 +++----------------
>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> index b2cde1732301..5ace2f9a277e 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>
>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>  {
>>> -	int i, scns_per_block;
>>> -	bool rc = true;
>>> -	unsigned long pfn, block_sz;
>>> -	u64 phys_addr;
>>> -
>>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>  		return false;
>>>
>>> -	block_sz = memory_block_size_bytes();
>>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>> -	phys_addr = lmb->base_addr;
>>> -
>>>  #ifdef CONFIG_FA_DUMP
>>>  	/*
>>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>>  	 * and memory that is reserved for capturing old kernel memory.
>>>  	 */
>>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>  		return false;
>>>  #endif
>>> -
>>> -	for (i = 0; i < scns_per_block; i++) {
>>> -		pfn = PFN_DOWN(phys_addr);
>>> -		if (!pfn_in_present_section(pfn)) {
>>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -			continue;
>>> -		}
>>> -
>>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -	}
>>> -
>>> -	return rc;
>>> +	/* device_offline() will determine if we can actually remove this lmb */
>>> +	return true;
> So I think here swaps the check and do sequence. At least it breaks
> dlpar_memory_remove_by_count(). It is doable to remove
> is_mem_section_removable(), but here should be more effort to re-arrange
> the code.
>

Thanks Pingfan,

1. "swaps the check and do sequence":

Partially. Any caller of dlpar_remove_lmb() already has to deal with
false positives. device_offline() can easily fail after
dlpar_remove_lmb() == true. It's inherently racy.

2. "breaks dlpar_memory_remove_by_count()"

Can you elaborate why it "breaks" it? It will simply try to
offline+remove lmbs, detect that it wasn't able to offline+remove as
much as it wanted (which could happen before as well easily), and re-add
the already offlined+removed ones.

3. "more effort to re-arrange the code"

What would be your suggestion? We would rip out that racy check if we
can remove as much memory as requested in dlpar_memory_remove_by_count()
and simply always try to remove + recover.

--
Thanks,

David / dhildenb
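For illustration only, the "try to offline+remove, then re-add on shortfall" flow from point 2 can be modelled in a few lines of userspace C. This is a rough sketch and not the pseries code: struct sim_lmb, sim_remove_lmb(), sim_add_lmb() and sim_remove_by_count() are hypothetical names, and the offline_ok flag merely stands in for whatever device_offline() would decide at the time of the attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an lmb; this is NOT the kernel's struct drmem_lmb. */
struct sim_lmb {
	bool assigned;   /* models the DRCONF_MEM_ASSIGNED check */
	bool offline_ok; /* models whether the offline attempt would succeed */
	bool removed;    /* true once the lmb has been offlined+removed */
	bool reserved;   /* true if removed by the request currently in flight */
};

/* Simulated offline+remove of one lmb; there is no up-front "removable" check. */
static bool sim_remove_lmb(struct sim_lmb *lmb)
{
	if (!lmb->assigned || lmb->removed || !lmb->offline_ok)
		return false;
	lmb->removed = true;
	return true;
}

/* Simulated re-add, used only to roll back a partial removal. */
static void sim_add_lmb(struct sim_lmb *lmb)
{
	lmb->removed = false;
}

/*
 * Try to remove `want` lmbs out of `n`. All-or-nothing: returns 0 if
 * `want` lmbs were removed, or -1 after re-adding the ones this
 * request had already removed.
 */
static int sim_remove_by_count(struct sim_lmb *lmbs, size_t n, size_t want)
{
	size_t i, done = 0;
	bool ok;

	assert(lmbs != NULL || n == 0);

	for (i = 0; i < n && done < want; i++) {
		if (!sim_remove_lmb(&lmbs[i]))
			continue; /* offline failed: move on to the next lmb */
		lmbs[i].reserved = true;
		done++;
	}

	ok = (done == want);
	for (i = 0; i < n; i++) {
		if (!lmbs[i].reserved)
			continue;
		if (!ok)
			sim_add_lmb(&lmbs[i]); /* recover: undo this request's removals */
		lmbs[i].reserved = false;
	}

	return ok ? 0 : -1;
}
```

In this model, with three assigned lmbs of which the middle one cannot be offlined, a request for two lmbs succeeds, while a request for three fails and leaves all three present again. A failed offline attempt is handled the same way whether or not a racy pre-check was done first.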