On Tue, Sep 17, 2019 at 10:06:11AM +0530, Anshuman Khandual wrote:
> On 09/13/2019 03:39 PM, Catalin Marinas wrote:
> > On Fri, Sep 13, 2019 at 11:28:01AM +0530, Anshuman Khandual wrote:
> >> The problem (race) is not because of the inability to deal with partially
> >> filled table. We can handle that correctly as explained below [1]. The
> >> problem is with inadequate kernel page table locking during vmalloc()
> >> which might be accessing intermediate kernel page table pointers which is
> >> being freed with free_empty_tables() concurrently. Hence we cannot free
> >> any page table page which can ever have entries from vmalloc() range.
> >
> > The way you deal with the partially filled table in this patch is to
> > avoid freeing if there is a non-empty entry (!p*d_none()). This is what
> > causes the race with vmalloc. If you simply avoid freeing a pmd page,
> > for example, if the range floor/ceiling is not aligned to PUD_SIZE,
> > irrespective of whether the other entries are empty or not, you
> > shouldn't have this problem. You do free the pte page if the range is
[...]
> > We may have some pgtable pages not freed at both ends of the range
> > (maximum 6 in total) but I don't really see this an issue. They could be
> > reused if something else gets mapped in that range.
>
> I assume that the number 6 for maximum page possibility came from
>
> (floor edge + ceiling edge) * (PTE table + PMD table + PUD table)

Yes.

> >> Though not completely sure, whether I really understood the suggestion above
> >> with respect to the floor-ceiling mechanism as in free_pgd_range(). Are you
> >> suggesting that we should only attempt to free up those vmemmap range page
> >> table pages which *definitely* could never overlap with vmalloc by working
> >> on a modified (i.e cut down with floor-ceiling while avoiding vmalloc range
> >> at each level) vmemmap range instead ?
> >
> > You can ignore the overlap check altogether, only free the page tables
> > with floor/ceiling set to the start/size passed to arch_remove_memory()
> > and vmemmap_free().
>
> Wondering if it will be better to use [VMEMMAP_START - VMEMMAP_END] and
> [PAGE_OFFSET - PAGE_END] as floor/ceiling respectively with vmemmap_free()
> and arch_remove_memory(). Not only it is safe to free all page table pages
> which span over these maximum possible mapping range but also it reduces
> the risk for alignment related wastage.

That's indeed better. You pass the floor/ceiling as the enclosing range
and start/end as the actual range to unmap. We avoid the potential
"leak" around the edges when falling within the floor/ceiling range (I
think that's close to what free_pgd_range() does).

> >> This can be one restrictive version of the function
> >> free_empty_tables() called in case there is an overlap. So we will
> >> maintain two versions for free_empty_tables(). Please correct me if
> >> any the above assumptions or understanding is wrong.
> >
> > I'd rather have a single version of free_empty_tables(). As I said
> > above, the only downside is that a partially filled pgtable page would
> > not be freed even though the other entries are empty.
>
> Sure. Also practically the limitation will be applicable only for vmemmap
> mapping but not for linear mappings where the chances of overlap might be
> negligible as it covers half kernel virtual address space.

If you have a common set of functions, it doesn't hurt to pass the
correct floor/ceiling in both cases.

-- 
Catalin
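
To illustrate the floor/ceiling idea discussed above, here is a rough
userspace sketch, not kernel code. The helper name tables_within_bounds()
and the SKETCH_* sizes (4K pages, 2M per PTE table, 1G per PMD table) are
made up for the example. It only models the bounds check loosely after
what free_pgd_range() appears to do; it does not touch real page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes for illustration: one PTE table maps 2M, one PMD table maps 1G. */
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21)
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)

/*
 * Return true if every table page of coverage `span` that is touched by
 * [start, end) lies entirely inside [floor, ceiling). Only in that case
 * would it be safe to free the table page without looking at its other
 * entries. `span` must be a power of two. A ceiling of 0 wraps around and
 * is treated as "top of the address space".
 */
static bool tables_within_bounds(uint64_t start, uint64_t end,
                                 uint64_t floor, uint64_t ceiling,
                                 uint64_t span)
{
    assert(span != 0 && (span & (span - 1)) == 0);
    assert(start < end);

    uint64_t lo = start & ~(span - 1);        /* first byte covered by the tables */
    uint64_t last = (end - 1) | (span - 1);   /* last byte covered by the tables */

    return lo >= floor && last <= ceiling - 1;
}
```

With a 2M range starting at 1G + 2M, using floor/ceiling equal to
start/end keeps the PTE table freeable but not the PMD table, since the
1G span of the PMD table extends below the floor. Passing an enclosing
range (say 0 to 4G) as floor/ceiling makes the PMD table freeable too,
which is the reduced edge "leak" mentioned above.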