2015-03-17 17:22 GMT+09:00 Roman Peniaev <r.peniaev@xxxxxxxxx>: > On Tue, Mar 17, 2015 at 4:29 PM, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote: >> On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote: >>> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote: >>> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote: >>> >> If suitable block can't be found, new block is allocated and put into a head >>> >> of a free list, so on next iteration this new block will be found first. >>> >> >>> >> That's bad, because old blocks in a free list will not get a chance to be fully >>> >> used, thus fragmentation will grow. >>> >> >>> >> Let's consider this simple example: >>> >> >>> >> #1 We have one block in a free list which is partially used, and where only >>> >> one page is free: >>> >> >>> >> HEAD |xxxxxxxxx-| TAIL >>> >> ^ >>> >> free space for 1 page, order 0 >>> >> >>> >> #2 New allocation request of order 1 (2 pages) comes, new block is allocated >>> >> since we do not have free space to complete this request. New block is put >>> >> into a head of a free list: >>> >> >>> >> HEAD |----------|xxxxxxxxx-| TAIL >>> >> >>> >> #3 Two pages were occupied in a new found block: >>> >> >>> >> HEAD |xx--------|xxxxxxxxx-| TAIL >>> >> ^ >>> >> two pages mapped here >>> >> >>> >> #4 New allocation request of order 0 (1 page) comes. Block, which was created >>> >> on #2 step, is located at the beginning of a free list, so it will be found >>> >> first: >>> >> >>> >> HEAD |xxX-------|xxxxxxxxx-| TAIL >>> >> ^ ^ >>> >> page mapped here, but better to use this hole >>> >> >>> >> It is obvious, that it is better to complete request of #4 step using the old >>> >> block, where free space is left, because in other case fragmentation will be >>> >> highly increased. >>> >> >>> >> But fragmentation is not only the case. The most worst thing is that I can >>> >> easily create scenario, when the whole vmalloc space is exhausted by blocks, >>> >> which are not used, but already dirty and have several free pages. >>> >> >>> >> Let's consider this function which execution should be pinned to one CPU: >>> >> >>> >> ------------------------------------------------------------------------------ >>> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */ >>> >> static void exhaust_virtual_space(struct page *pages[256], int iters) >>> >> { >>> >> /* Firstly we have to map a big chunk, e.g. 16 pages. >>> >> * Then we have to occupy the remaining space with smaller >>> >> * chunks, i.e. 8 pages. At the end small hole should remain. >>> >> * So at the end of our allocation sequence block looks like >>> >> * this: >>> >> * XX big chunk >>> >> * |XXxxxxxxx-| x small chunk >>> >> * - hole, which is enough for a small chunk, >>> >> * but not for a big chunk >>> >> */ >>> >> unsigned big_allocs = 1; >>> >> /* -1 for hole, which should be left at the end of each block >>> >> * to keep it partially used, with some free space available */ >>> >> unsigned small_allocs = (256 - 16) / 8 - 1; >>> >> void *vaddrs[big_allocs + small_allocs]; >>> >> >>> >> while (iters--) { >>> >> int i = 0, j; >>> >> >>> >> /* Map big chunk */ >>> >> vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL); >>> >> >>> >> /* Map small chunks */ >>> >> for (j = 0; j < small_allocs; j++) >>> >> vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1, >>> >> PAGE_KERNEL); >>> >> >>> >> /* Unmap everything */ >>> >> while (i--) >>> >> vm_unmap_ram(vaddrs[i], (i ? 8 : 16)); >>> >> } >>> >> } >>> >> ------------------------------------------------------------------------------ >>> >> >>> >> On every iteration new block (1MB of vm area in my case) will be allocated and >>> >> then will be occupied, without attempt to resolve small allocation request >>> >> using previously allocated blocks in a free list. >>> >> >>> >> In current patch I simply put newly allocated block to the tail of a free list, >>> >> thus reduce fragmentation, giving a chance to resolve allocation request using >>> >> older blocks with possible holes left. >>> > >>> > Hello, >>> > >>> > I think that if you put newly allocated block to the tail of a free >>> > list, below example would results in enormous performance degradation. >>> > >>> > new block: 1MB (256 pages) >>> > >>> > while (iters--) { >>> > vm_map_ram(3 or something else not dividable for 256) * 85 >>> > vm_unmap_ram(3) * 85 >>> > } >>> > >>> > On every iteration, it needs newly allocated block and it is put to the >>> > tail of a free list so finding it consumes large amount of time. >>> > >>> > Is there any other solution to prevent your problem? >>> >>> Hello. >>> >>> My second patch fixes this problem. >>> I occupy the block on allocation and avoid jumping to the search loop. >> >> I'm not sure that this fixes above case. >> 'vm_map_ram (3) * 85' means 85 times vm_map_ram() calls. >> >> First vm_map_ram(3) caller could get benefit from your second patch. >> But, second caller and the other callers in each iteration could not >> get benefit and should iterate whole list to find suitable free block, >> because this free block is put to the tail of the list. Am I missing >> something? > > You are missing the fact that we occupy blocks in 2^n. > So in your example 4 page slots will be occupied (order is 2), not 3. Ah... Okay. I understand now. > The maximum size of allocation is 32 pages for 32-bit system > (if you try to map more, original alloc_vmap_area will be called). > > So the maximum order is 5. That means that worst case, before we make > the decision > to allocate new block, is to iterate 6 blocks: > > HEAD > 1st block - has 1 page slot free (order 0) > 2nd block - has 2 page slots free (order 1) > 3rd block - has 4 page slots free (order 2) > 4th block - has 8 page slots free (order 3) > 5th block - has 16 page slots free (order 4) > 6th block - has 32 page slots free (order 5) > TAIL > > So the worst scenario is that each CPU queue can have 6 blocks in a free list. Okay. > This can happen only and only if you allocate blocks increasing the order. > (as I did in the function written in the comment of the first patch) > This is weird and rare case, but still it is possible. > Afterwards you will get 6 blocks in a list. > > All further requests should be placed in a newly allocated block or > some free slots > should be found in a free list. Seems it does not look dramatically awful. I think so, too. Acked-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> Thanks. -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html