Re: [PATCH v2 1/3] kasan: support backing vmalloc space with real shadow memory

Daniel Axtens <dja@xxxxxxxxxx> · Wed, 31 Jul 2019 16:34:42 +1000

Daniel Axtens <dja@xxxxxxxxxx> writes:

> Hi Mark,
>
> Thanks for your email - I'm very new to mm stuff and the feedback is
> very helpful.
>
>>> +#ifndef CONFIG_KASAN_VMALLOC
>>>  int kasan_module_alloc(void *addr, size_t size)
>>>  {
>>>  	void *ret;
>>> @@ -603,6 +604,7 @@ void kasan_free_shadow(const struct vm_struct *vm)
>>>  	if (vm->flags & VM_KASAN)
>>>  		vfree(kasan_mem_to_shadow(vm->addr));
>>>  }
>>> +#endif
>>
>> IIUC we can drop MODULE_ALIGN back to PAGE_SIZE in this case, too.
>
> Yes, done.
>
>>>  core_initcall(kasan_memhotplug_init);
>>>  #endif
>>> +
>>> +#ifdef CONFIG_KASAN_VMALLOC
>>> +void kasan_cover_vmalloc(unsigned long requested_size, struct vm_struct *area)
>>
>> Nit: I think it would be more consistent to call this
>> kasan_populate_vmalloc().
>>
>
> Absolutely. I didn't love the name but just didn't 'click' that populate
> would be a better verb.
>
>>> +{
>>> +	unsigned long shadow_alloc_start, shadow_alloc_end;
>>> +	unsigned long addr;
>>> +	unsigned long backing;
>>> +	pgd_t *pgdp;
>>> +	p4d_t *p4dp;
>>> +	pud_t *pudp;
>>> +	pmd_t *pmdp;
>>> +	pte_t *ptep;
>>> +	pte_t backing_pte;
>>
>> Nit: I think it would be preferable to use 'page' rather than 'backing',
>> and 'pte' rather than 'backing_pte', since there's no otehr namespace to
>> collide with here. Otherwise, using 'shadow' rather than 'backing' would
>> be consistent with the existing kasan code.
>
> Not a problem, done.
>
>>> +	addr = shadow_alloc_start;
>>> +	do {
>>> +		pgdp = pgd_offset_k(addr);
>>> +		p4dp = p4d_alloc(&init_mm, pgdp, addr);
>>> +		pudp = pud_alloc(&init_mm, p4dp, addr);
>>> +		pmdp = pmd_alloc(&init_mm, pudp, addr);
>>> +		ptep = pte_alloc_kernel(pmdp, addr);
>>> +
>>> +		/*
>>> +		 * we can validly get here if pte is not none: it means we
>>> +		 * allocated this page earlier to use part of it for another
>>> +		 * allocation
>>> +		 */
>>> +		if (pte_none(*ptep)) {
>>> +			backing = __get_free_page(GFP_KERNEL);
>>> +			backing_pte = pfn_pte(PFN_DOWN(__pa(backing)),
>>> +					      PAGE_KERNEL);
>>> +			set_pte_at(&init_mm, addr, ptep, backing_pte);
>>> +		}
>>
>> Does anything prevent two threads from racing to allocate the same
>> shadow page?
>>
>> AFAICT it's possible for two threads to get down to the ptep, then both
>> see pte_none(*ptep)), then both try to allocate the same page.
>>
>> I suspect we have to take init_mm::page_table_lock when plumbing this
>> in, similarly to __pte_alloc().
>
> Good catch. I think you're right, I'll add the lock.
>
>>> +	} while (addr += PAGE_SIZE, addr != shadow_alloc_end);
>>> +
>>> +	kasan_unpoison_shadow(area->addr, requested_size);
>>> +	requested_size = round_up(requested_size, KASAN_SHADOW_SCALE_SIZE);
>>> +	kasan_poison_shadow(area->addr + requested_size,
>>> +			    area->size - requested_size,
>>> +			    KASAN_VMALLOC_INVALID);
>>
>> IIUC, this could leave the final portion of an allocated page
>> unpoisoned.
>>
>> I think it might make more sense to poison each page when it's
>> allocated, then plumb it into the page tables, then unpoison the object.
>>
>> That way, we can rely on any shadow allocated by another thread having
>> been initialized to KASAN_VMALLOC_INVALID, and only need mutual
>> exclusion when allocating the shadow, rather than when poisoning
>> objects.

I've come a bit unstuck on this one. If a vmalloc address range is
reused, we can end up with the following sequence:

 - p = vmalloc(PAGE_SIZE) allocates ffffc90000000000

 - kasan_populate_shadow allocates a shadow page, fills it with
   KASAN_VMALLOC_INVALID, and then unpoisions
   PAGE_SIZE >> KASAN_SHADOW_SHIFT_SIZE bytes

 - vfree(p)

 - p = vmalloc(3000) also allocates ffffc90000000000 because of address
   reuse in vmalloc.

 - Now kasan_populate_shadow doesn't allocate a page, so does no
   poisioning.

 - kasan_populate_shadow unpoisions 3000 >> KASAN_SHADOW_SHIFT_SIZE
   bytes, but the PAGE_SIZE-3000 extra bytes are still unpoisioned, so
   accesses that are out-of-bounds for the 3000 byte allocation are
   missed.

So I think we do need to poision the shadow of the [requested_size,
area->size) region each time. However, I don't think we need mutual
exclusion to be able to do this safely. I think the safety is guaranteed
by vmalloc not giving the same page to multiple allocations. Because no
two threads are going to get overlapping vmalloc/vmap allocations, their
shadow ranges are not going to overlap, and so they're not going to
trample over each other.

I think it's probably still worth poisioning the pages on allocation:
for one thing, you are right that part of the shadow page will not be
poisioned otherwise, and secondly it means you migh get a kasan splat
before you get a page-not-present fault if you access beyond an
allocation, at least if the shadow happens to fall helpfully within an
already-allocated page.

v3 to come soon.

Regards,
Daniel

>
> Yes, that makes sense, will do.
>
> Thanks again,
> Daniel