Uladzislau Rezki <urezki@xxxxxxxxx> writes:

> Hello, Daniel
>
>>
>> @@ -1294,14 +1299,19 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>          spin_lock(&free_vmap_area_lock);
>>          llist_for_each_entry_safe(va, n_va, valist, purge_list) {
>>                  unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
>> +                unsigned long orig_start = va->va_start;
>> +                unsigned long orig_end = va->va_end;
>>
>>                  /*
>>                   * Finally insert or merge lazily-freed area. It is
>>                   * detached and there is no need to "unlink" it from
>>                   * anything.
>>                   */
>> -                merge_or_add_vmap_area(va,
>> -                        &free_vmap_area_root, &free_vmap_area_list);
>> +                va = merge_or_add_vmap_area(va, &free_vmap_area_root,
>> +                                            &free_vmap_area_list);
>> +
>> +                kasan_release_vmalloc(orig_start, orig_end,
>> +                                      va->va_start, va->va_end);
>>
> I have some questions here. I have not analyzed kasan_release_vmalloc()
> logic in detail, sorry for that if I miss something. __purge_vmap_area_lazy()
> deals with big address space, so not only vmalloc addresses it frees here,
> basically it can be any, starting from 1 until ULONG_MAX, whereas vmalloc
> space spans from VMALLOC_START - VMALLOC_END:
>
> 1) Should it be checked that vmalloc only address is freed or you handle
> it somewhere else?
>
> if (is_vmalloc_addr(va->va_start))
>     kasan_release_vmalloc(...)

kasan_release_vmalloc() only frees the shadow of the region from
orig_start to orig_end, plus possibly one page on either side, so it will
never attempt to free an enormous area. It also does nothing if it is
called for a region that has no shadow backing installed.

Having said that, there should be a test on orig_start, and I've added one
in v11. Good catch.

> 2) Have you run any benchmarking just to see how much overhead it adds?
> I am asking, because probably it makes sense to add those figures to the
> backlog (commit message).
> For example you can run:
>
> <snip>
> sudo ./test_vmalloc.sh performance
> and
> sudo ./test_vmalloc.sh sequential_test_order=1
> <snip>

I have now done that. Testing with test_vmalloc.sh on an x86 VM with
2 vCPUs shows that:

 - Turning on KASAN with inline instrumentation, without this feature,
   introduces a 4.1x-4.2x slowdown in vmalloc operations.

 - Turning on this feature introduces the following slowdowns over KASAN:
     * ~1.76x slower single-threaded (test_vmalloc.sh performance)
     * ~2.18x slower when both CPUs are performing operations
       simultaneously (test_vmalloc.sh sequential_test_order=1)

This is unfortunate, but given that this is a debug-only feature, it is
not the end of the world.

The full figures are below. "KASAN original" is KASAN without this
feature and "KASAN vmalloc" is KASAN with it; each "x baseline" column is
relative to No KASAN, and "x KASAN" is KASAN vmalloc relative to KASAN
original.

Performance

Test                             No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN
fix_size_alloc_test               1697913        14229459        8.38       22981983       13.54     1.62
full_fit_alloc_test               1841601        15152633        8.23       17902922        9.72     1.18
long_busy_list_alloc_test        17874082        58856758        3.29      103925371        5.81     1.77
random_size_alloc_test            9356047        29544085        3.16       57871338        6.19     1.96
fix_align_alloc_test              3188968        19821620        6.22       37979436       11.91     1.92
random_size_align_alloc_te        3033507        17584339        5.80       32588942       10.74     1.85
align_shift_alloc_test                325            1154        3.55           7263       22.35     6.29
pcpu_alloc_test                    231952          278181        1.20         318977        1.38     1.15
Total Cycles                 235852824254    985040965542        4.18  1733258779416        7.35     1.76

Sequential, 2 cpus

Test                             No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN
fix_size_alloc_test               2505806        17989253        7.18       39651038       15.82     2.20
full_fit_alloc_test               3579676        18829862        5.26       21142645        5.91     1.12
long_busy_list_alloc_test        21594983        74766736        3.46      140701363        6.52     1.88
random_size_alloc_test           10884695        34282077        3.15       91945108        8.45     2.68
fix_align_alloc_test              4133226        26304745        6.36       76163270       18.43     2.90
random_size_align_alloc_te        4261175        22927883        5.38       55236058       12.96     2.41
align_shift_alloc_test                948            4827        5.09           4144        4.37     0.86
pcpu_alloc_test                    371789          307654        0.83         374412        1.01     1.22
Total Cycles                  99965417402    412710461642        4.13   897968646378        8.98     2.18
Sequential, 2 cpus (second set of results)

Test                             No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN
fix_size_alloc_test               2502718        17921542        7.16       39893515       15.94     2.23
full_fit_alloc_test               3547996        18675007        5.26       21330495        6.01     1.14
long_busy_list_alloc_test        21522579        74610739        3.47      139822907        6.50     1.87
random_size_alloc_test           10881507        34317349        3.15       91110531        8.37     2.65
fix_align_alloc_test              4119755        26180887        6.35       75818927       18.40     2.90
random_size_align_alloc_te        4297708        23058344        5.37       55969004       13.02     2.43
align_shift_alloc_test                956            5574        5.83           4591        4.80     0.82
pcpu_alloc_test                    306340          347014        1.13         571289        1.86     1.65
Total Cycles                  99642832084    412084074628        4.14   896497227762        9.00     2.18

Regards,
Daniel

> Thanks!
>
> --
> Vlad Rezki
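The answer to question 1 rests on how much shadow kasan_release_vmalloc() can touch. The following is a userspace sketch of that range arithmetic, not the kernel implementation. It assumes 4 KiB pages and generic KASAN's 1:8 shadow scale (so one shadow page describes 32 KiB), and it assumes that a partially covered boundary shadow page may only be released when the merged free region (va->va_start to va->va_end after merge_or_add_vmap_area()) covers everything that page describes. The names releasable_shadow_pages, align_up, align_down and SHADOW_PAGE_SPAN are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of which shadow pages may be released
 * when a lazily-freed vmap area [start, end) has been merged into a
 * free region [free_start, free_end).
 *
 * Assumptions (not taken from the patch): 4 KiB pages and generic
 * KASAN's 1:8 shadow scale, so one shadow page describes 32 KiB.
 */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_SHADOW_SCALE 8UL
#define SHADOW_PAGE_SPAN    (SKETCH_PAGE_SIZE * SKETCH_SHADOW_SCALE)

static unsigned long align_up(unsigned long x, unsigned long a)
{
        return (x + a - 1) / a * a;     /* ignores overflow near ULONG_MAX */
}

static unsigned long align_down(unsigned long x, unsigned long a)
{
        return x / a * a;
}

/*
 * Returns how many shadow pages could be freed. A shadow page is only
 * released if every byte it describes lies inside the free region, so
 * at most one extra shadow page is taken on either side of the pages
 * that the area itself covers completely.
 */
static unsigned long releasable_shadow_pages(unsigned long start,
                                             unsigned long end,
                                             unsigned long free_start,
                                             unsigned long free_end)
{
        unsigned long region_start, region_end;

        /* The freed area must lie inside the merged free region. */
        assert(free_start <= start && start <= end && end <= free_end);

        /* Shadow pages whose whole span lies inside the area itself. */
        region_start = align_up(start, SHADOW_PAGE_SPAN);
        region_end = align_down(end, SHADOW_PAGE_SPAN);

        /* Partially covered page on the left: free it only if the rest
         * of its span is also free. */
        if (start != region_start &&
            align_up(free_start, SHADOW_PAGE_SPAN) < region_start)
                region_start -= SHADOW_PAGE_SPAN;

        /* Same rule for the partially covered page on the right. */
        if (end != region_end &&
            align_down(free_end, SHADOW_PAGE_SPAN) > region_end)
                region_end += SHADOW_PAGE_SPAN;

        if (region_end <= region_start)
                return 0;
        return (region_end - region_start) / SHADOW_PAGE_SPAN;
}
```

Under these assumptions, an area spanning 0x9000-0x11000 that merges into a free region 0x0-0x20000 lets two shadow pages go, while the same area with no free neighbours releases none, because both shadow pages it touches also describe memory outside the area.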
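The headline slowdowns quoted in the summary are ratios of the Total Cycles rows. As a quick cross-check, this small C sketch recomputes them from the raw counts and rounds to two decimals, as the tables do; the helper names slowdown and round2 are made up for illustration.

```c
#include <assert.h>

/* Ratio of two cycle counts, as used in the "x baseline"/"x KASAN" columns. */
static double slowdown(unsigned long long cycles, unsigned long long base)
{
        assert(base != 0);
        return (double)cycles / (double)base;
}

/* Round a positive ratio to two decimal places, as printed in the tables. */
static double round2(double ratio)
{
        return (double)(long long)(ratio * 100.0 + 0.5) / 100.0;
}

/* Total Cycles values copied from the Performance and Sequential tables. */
static const unsigned long long perf_no_kasan      = 235852824254ULL;
static const unsigned long long perf_kasan_orig    = 985040965542ULL;
static const unsigned long long perf_kasan_vmalloc = 1733258779416ULL;
static const unsigned long long seq_kasan_orig     = 412710461642ULL;
static const unsigned long long seq_kasan_vmalloc  = 897968646378ULL;
```

For example, round2(slowdown(perf_kasan_vmalloc, perf_kasan_orig)) gives 1.76, the single-threaded figure, and the same calculation on the first Sequential rows gives 2.18.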