On 05/19/23 at 01:22pm, Thomas Gleixner wrote:
> On Wed, May 17 2023 at 18:52, Baoquan He wrote:
> > On 05/17/23 at 11:38am, Thomas Gleixner wrote:
> >> On Tue, May 16 2023 at 21:03, Thomas Gleixner wrote:
> >> >
> >> > Aside of that, if I read the code correctly then if there is an unmap
> >> > via vb_free() which does not cover the whole vmap block then vb->dirty
> >> > is set and every _vm_unmap_aliases() invocation flushes that dirty range
> >> > over and over until that vmap block is completely freed, no?
> >>
> >> Something like the below would cure that.
> >>
> >> While it prevents that this is flushed forever it does not cure the
> >> eventually overly broad flush when the block is completely dirty and
> >> purged:
> >>
> >> Assume a block with 1024 pages, where 1022 pages are already freed and
> >> TLB flushed. Now the last 2 pages are freed and the block is purged,
> >> which results in a flush of 1024 pages where 1022 are already done,
> >> right?
> >
> > This is a good idea. I was thinking about how to reply to your last mail
> > and how to fix this. However, your cure code may not work well. Please
> > see the inline comment below.
>
> See below.
>
> > One vmap block has 64 pages.
> >     #define VMAP_MAX_ALLOC		BITS_PER_LONG	/* 256K with 4K pages */
>
> No, VMAP_MAX_ALLOC is the allocation limit for a single vb_alloc().
>
> On 64bit it has at least 128 pages, but can have up to 1024:
>
>     #define VMAP_BBMAP_BITS_MAX	1024	/* 4MB with 4K pages */
>     #define VMAP_BBMAP_BITS_MIN	(VMAP_MAX_ALLOC*2)
>
> and then some magic happens to calculate the actual size
>
>     #define VMAP_BBMAP_BITS \
>             VMAP_MIN(VMAP_BBMAP_BITS_MAX, \
>                     VMAP_MAX(VMAP_BBMAP_BITS_MIN, \
>                             VMALLOC_PAGES / roundup_pow_of_two(NR_CPUS) / 16))
>
> which is in a range of (2*BITS_PER_LONG) ... 1024.
>
> The actual vmap block size is:
>
>     #define VMAP_BLOCK_SIZE	(VMAP_BBMAP_BITS * PAGE_SIZE)

You are right, it's 1024. I was dizzy at that time.

> Which is then obviously something between 512k and 4MB on 64bit and
> between 256k and 4MB on 32bit.
>
> >> @@ -2240,13 +2240,17 @@ static void _vm_unmap_aliases(unsigned l
> >>  	rcu_read_lock();
> >>  	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> >>  		spin_lock(&vb->lock);
> >> -		if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> >> +		if (vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> >>  			unsigned long va_start = vb->va->va_start;
> >>  			unsigned long s, e;
> >
> > When vb_free() is invoked, it can leave a vmap_block in the two kinds of
> > state shown below. Your code works well for the 2nd case; for the 1st
> > one, it may not. And the 2nd one is the stuff that we reclaim and put
> > onto the purge list in purge_fragmented_blocks_allcpus().
> >
> > 1)
> >    |-----|------------|-----------|-------|
> >    |dirty|still mapped|   dirty   | free  |
> >
> > 2)
> >    |------------------------------|-------|
> >    |            dirty             | free  |
>
> You sure? The first one is put into the purge list too.

No way. You didn't copy the essential code here. The key line is the
calculation of vb->dirty. ->dirty_min and ->dirty_max only provide a
loose value for calculating the flush range. Counting one page more or
less into ->dirty_min or ->dirty_max doesn't matter much; it just makes
the flush do some meaningless work. Counting ->dirty wrongly, however,
causes a serious problem: if you put case 1 onto the purge list, freeing
it later will fail because you can't find it in the vmap_area_root tree.
Please check vfree() and remove_vm_area().
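To make the two cases concrete, here is a minimal user-space model of
the counters. This is just a sketch: the struct name, page counts and
the main() harness are made up; only the purge-eligibility condition is
the real one, quoted again further below. Case 1 still has mapped
pages, so free + dirty stays below VMAP_BBMAP_BITS and the block never
qualifies; case 2 passes the test.

    #include <stdio.h>
    #include <stdbool.h>

    #define VMAP_BBMAP_BITS 1024UL	/* fixed here for the example */

    struct vb_counts {
            unsigned long free;	/* pages never handed out */
            unsigned long dirty;	/* pages freed via vb_free(), flush pending */
            /* whatever is not counted above is still mapped */
    };

    /* the purge-eligibility test, same condition as quoted below */
    static bool purgeable(const struct vb_counts *vb)
    {
            return vb->free + vb->dirty == VMAP_BBMAP_BITS &&
                   vb->dirty != VMAP_BBMAP_BITS;
    }

    int main(void)
    {
            /* case 1: dirty | still mapped | dirty | free */
            struct vb_counts case1 = { .free = 256, .dirty = 512 };

            /* case 2: dirty | free -- nothing mapped any more */
            struct vb_counts case2 = { .free = 256, .dirty = 768 };

            printf("case1 purgeable: %d\n", purgeable(&case1));	/* 0 */
            printf("case2 purgeable: %d\n", purgeable(&case2));	/* 1 */
            return 0;
    }

Only case 2 satisfies the test, which is why case 1 must stay off the
purge list: its vmap_area can only be unlinked once the whole block is
gone.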
For reference, the key calculation in vb_free() is:

        /* Expand dirty range */
        vb->dirty_min = min(vb->dirty_min, offset);
        vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));

        vb->dirty += 1UL << order;

Please note the judgement of the 2nd case below. It means the block has
only free and dirty regions and dirty hasn't reached VMAP_BBMAP_BITS;
that is case 2:

        (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS)

By the way, I made an RFC patchset based on your patch and on your
earlier mail in which you raised some questions. I will add it here;
please help check whether it's worth posting for discussion and review.

>         /* Expand dirty range */
>         vb->dirty_min = min(vb->dirty_min, offset);
>         vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>
>                   pages    bits     dirty_min        dirty_max
>  vb_alloc(A)        2      0 - 1    VMAP_BBMAP_BITS  0
>  vb_alloc(B)        4      2 - 5
>  vb_alloc(C)        2      6 - 7
>
> So you get three variants:
>
> 1) Flush after freeing A
>
>  vb_free(A)         2      0 - 1    0                1
>  Flush                              VMAP_BBMAP_BITS  0   <- correct
>  vb_free(C)         2      6 - 7    6                7
>  Flush                              VMAP_BBMAP_BITS  0   <- correct
>
> 2) No flush between freeing A and C
>
>  vb_free(A)         2      0 - 1    0                1
>  vb_free(C)         2      6 - 7    0                7
>  Flush                              VMAP_BBMAP_BITS  0   <- overbroad flush
>
> 3) No flush between freeing A, C, B
>
>  vb_free(A)         2      0 - 1    0                1
>  vb_free(C)         2      6 - 7    0                7
>  vb_free(B)         4      2 - 5    0                7
>  Flush                              VMAP_BBMAP_BITS  0   <- correct
>
> So my quick hack makes it correct for #1 and #3 and prevents repeated
> flushes of already flushed areas.
>
> To prevent #2 you need a bitmap which keeps track of the flushed areas.

I made a draft patchset based on your earlier mail,

> Thanks,
>
>         tglx
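As a postscript on the bitmap idea above: one way to read the
suggestion is a per-block bitmap marking page slots that are dirty but
not yet flushed, so a flush walks only the set runs. Below is a purely
illustrative user-space sketch; none of these names exist in
mm/vmalloc.c, and the open-coded bit loops stand in for the kernel's
bitmap_set()/find_next_bit() helpers:

    #include <limits.h>
    #include <stdio.h>

    #define VMAP_BBMAP_BITS	1024UL
    #define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
    #define WORDS		(VMAP_BBMAP_BITS / BITS_PER_WORD)

    struct vb_model {
            unsigned long pending[WORDS];	/* bit set: dirty, not yet flushed */
    };

    static void set_pending(struct vb_model *vb, unsigned long bit)
    {
            vb->pending[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
    }

    static int test_pending(const struct vb_model *vb, unsigned long bit)
    {
            return !!(vb->pending[bit / BITS_PER_WORD] &
                      (1UL << (bit % BITS_PER_WORD)));
    }

    /* vb_free() would mark the freed slots instead of widening min/max */
    static void model_vb_free(struct vb_model *vb, unsigned long offset,
                              unsigned long npages)
    {
            for (unsigned long i = offset; i < offset + npages; i++)
                    set_pending(vb, i);
    }

    /* flush only maximal runs of pending slots, then clear the bitmap */
    static void model_flush(struct vb_model *vb)
    {
            for (unsigned long i = 0; i < VMAP_BBMAP_BITS; ) {
                    if (!test_pending(vb, i)) {
                            i++;
                            continue;
                    }
                    unsigned long start = i;
                    while (i < VMAP_BBMAP_BITS && test_pending(vb, i))
                            i++;
                    printf("flush page slots %lu..%lu\n", start, i - 1);
            }
            for (unsigned long w = 0; w < WORDS; w++)
                    vb->pending[w] = 0;
    }

    int main(void)
    {
            struct vb_model vb = { { 0 } };

            model_vb_free(&vb, 0, 2);	/* vb_free(A), pages 0-1 */
            model_vb_free(&vb, 6, 2);	/* vb_free(C), pages 6-7 */
            model_flush(&vb);		/* flushes 0..1 and 6..7 only */
            return 0;
    }

With per-slot tracking, variant #2 flushes exactly pages 0-1 and 6-7
and leaves B's still-mapped pages alone, and a later purge finds
nothing left to re-flush. The cost is the extra per-block storage and
possibly several small flushes instead of one wide one.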