On Wed, Oct 24, 2018 at 12:41 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 23, 2018 at 05:13:40PM -0400, Marc Orr wrote:
> > +     struct vcpu_vmx *vmx = __vmalloc_node_range(
> > +                     sizeof(struct vcpu_vmx),
> > +                     __alignof__(struct vcpu_vmx),
> > +                     VMALLOC_START,
> > +                     VMALLOC_END,
> > +                     GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO | __GFP_ACCOUNT,
> > +                     PAGE_KERNEL,
> > +                     0,
> > +                     NUMA_NO_NODE,
> > +                     __builtin_return_address(0));
>
> I don't understand why you need to expose the lowest-level
> __vmalloc_node_range to do what you need to do.
>
> For example, __vmalloc_node would be easier for you to use while giving you
> all the flexibility you think you want.
>
> In fact, I don't think you need all the flexibility you're using.
> vmalloc is always going to give you a page-aligned address, so
> __alignof__(struct foo) isn't going to do anything worthwhile.
>
> VMALLOC_START, VMALLOC_END, PAGE_KERNEL, 0, NUMA_NO_NODE, and
> __builtin_return_address(0) are all the defaults.  So all you actually
> need are these GFP flags.  __GFP_HIGHMEM isn't needed; vmalloc can
> always allocate from highmem unless you're doing a DMA alloc.  So
> it's just __GFP_ACCOUNT that's not supported by regular vzalloc().
>
> I see __vmalloc_node_flags_caller is already non-static, so that would
> be where I went and your call becomes simply:
>
>         vmx = __vmalloc_node_flags_caller(sizeof(struct vcpu_vmx),
>                         NUMA_NO_NODE,
>                         GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT,
>                         __builtin_return_address(0));
>
> I suspect a better option might be to add a vzalloc_account() call
> and then your code becomes:
>
>         vmx = vzalloc_account(sizeof(struct vcpu_vmx));
>
> while vmalloc gains:
>
> void *vmalloc_account(unsigned long size)
> {
>         return __vmalloc_node_flags(size, NUMA_NO_NODE,
>                         GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT);
> }
> EXPORT_SYMBOL(vmalloc_account);

I 100% agree with this review! I only need the __GFP_ACCOUNT flag.
Actually, the first version of this patch that I developed downstream
resembled what you're suggesting here. But I've never touched the mm
subsystem before, and we were not confident about how to shape the patch
for upstreaming, so we ended up with this version, which makes minimal
changes to the mm subsystem. Anyway, let me refactor the patch exactly
as you've suggested in your review and send out a new version.
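
To make sure I understand the shape you're proposing, here is a rough
userspace sketch of the idea: one flexible low-level allocator, plus a
thin wrapper that pins every default and adds only zeroing and
accounting. Every mock_/MOCK_ name below is hypothetical and exists only
for illustration; none of them is a kernel API, and the byte counter is
just a stand-in for whatever __GFP_ACCOUNT really charges.

```c
/* Userspace sketch only: all mock_ and MOCK_ names are hypothetical. */
#include <stddef.h>
#include <stdlib.h>

#define MOCK_GFP_KERNEL  0x1u
#define MOCK_GFP_ZERO    0x2u
#define MOCK_GFP_ACCOUNT 0x4u
#define MOCK_NO_NODE     (-1)

/* Bytes charged so far; a stand-in for real memory accounting. */
static size_t mock_accounted_bytes;

/* Stand-in for a flexible low-level allocator with several knobs. */
static void *mock_alloc_node_flags(size_t size, int node, unsigned int flags)
{
	void *p;

	(void)node; /* NUMA placement is not modelled in this sketch */
	p = (flags & MOCK_GFP_ZERO) ? calloc(1, size) : malloc(size);
	if (p && (flags & MOCK_GFP_ACCOUNT))
		mock_accounted_bytes += size;
	return p;
}

/* Thin wrapper: callers pass only a size, the flags are fixed here. */
static void *mock_vzalloc_account(size_t size)
{
	return mock_alloc_node_flags(size, MOCK_NO_NODE,
				     MOCK_GFP_KERNEL | MOCK_GFP_ZERO |
				     MOCK_GFP_ACCOUNT);
}
```

With a wrapper like this, the call site shrinks to a single argument,
e.g. mock_vzalloc_account(sizeof(struct foo)), and the flag choice lives
in one place instead of being repeated by each caller.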