On 22/09/2020 17:33, Christoph Hellwig wrote:
On Tue, Sep 22, 2020 at 05:13:45PM +0100, Tvrtko Ursulin wrote:
void *shmem_pin_map(struct file *file)
{
- const size_t n_pte = shmem_npte(file);
- pte_t *stack[32], **ptes, **mem;
Chris can comment how much he'd miss the 32 page stack shortcut.
I'd like to see a profile that claim that kmalloc matters in a
path that does a vmap and reads pages through the page cache.
Especially when the kmalloc saves doing another page cache lookup
on the free side.
Only reason I can come up with now is if mapping side is on a latency
sensitive path, while un-mapping is lazy/delayed so can be more costly.
Then fast map and extra cost on unmap may make sense.
It more applies to the other i915 patch, which implements a much more
used API, but whether or not we can demonstrate any difference in the
perf profiles I couldn't tell you without trying to collect some.
Is there something in vmap() preventing us from freeing the pages array
here? I can't spot anything that is holding on to the pointer. Or it was
just a sketch before you realized we could walk the vm_area?
Also, I may be totally misunderstanding something, but I think you need to
assign area->pages manually so shmem_unpin_map can access it below.
We need area->pages to hold the pages for the free side. That being
said the patch I posted is broken because it never assigned to that.
As said it was a sketch. This is the patch I just rebooted into on
my Laptop:
http://git.infradead.org/users/hch/misc.git/commitdiff/048522dfa26b6667adfb0371ff530dc263abe829
it needs extra prep patches from the series:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/alloc_vm_area
mapping_clear_unevictable(file->f_mapping);
- __shmem_unpin_map(file, ptr, shmem_npte(file));
+ for (i = 0; i < shmem_npages(file); i++)
+ put_page(area->pages[i]);
+ kvfree(area->pages);
+ vunmap(ptr);
Is the verdict from mm experts that we can't use vfree due __free_pages vs
put_page differences?
Switched to vfree now.
Could we get from ptes to pages, so that we don't have to keep the
area->pages array allocated for the duration of the pin?
We could do vmalloc_to_page, but that is fairly expensive (not as bad
as reading from the page cache..). Are you really worried about the
allocation?
Not so much given how we don't even use shmem_pin_map outside selftests.
If we start using it I expect it will be for tiny objects anyway. Only
if they end up being pinned for the lifetime of the driver, it may be a
pointless waste of memory compared to the downsides of vmalloc_to_page.
But we can revisit this particular edge case optimization if the need
arises.
I'll look at your other i915 patch tomorrow.
Regards,
Tvrtko
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel