On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> -		     loff_t max_size, enum kernel_read_file_id id)
> -{
> -	loff_t i_size, pos;
> +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> +		      loff_t pos, loff_t max_size,
> +		      enum kernel_pread_opt opt,
> +		      enum kernel_read_file_id id)
> +{
> +	loff_t alloc_size;
> +	loff_t buf_pos;
> +	loff_t read_end;
> +	loff_t i_size;
>  	ssize_t bytes = 0;
>  	int ret;
>

Look, it's not your fault, but this is a great example of how we end
up with atrocious interfaces.  Someone comes along and implements a
simple DWIM interface that solves their problem.  Then somebody else
adds a slight variant that solves their problem, and so on and so on,
and we end up with this bonkers API where the arguments literally
change meaning depending on other arguments.

> @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>  		ret = -EINVAL;
>  		goto out;
>  	}
> -	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> +
> +	/* Default read to end of file */
> +	read_end = i_size;
> +
> +	/* Allow reading partial portion of file */
> +	if ((opt == KERNEL_PREAD_PART) &&
> +	    (i_size > (pos + max_size)))
> +		read_end = pos + max_size;
> +
> +	alloc_size = read_end - pos;
> +	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
>  		ret = -EFBIG;
>  		goto out;

... like that.

I think what we actually want is:

	ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
	void vunmap_file_range(struct file *, void *buf);

If end > i_size, limit the allocation to i_size.  Returns the number
of bytes allocated, or a negative errno.  Writes the pointer allocated
to *bufp.  Internally, it should use the page cache to read in the pages
(taking appropriate reference counts).  Then it maps them using vmap()
instead of copying them to a private vmalloc() array.

kernel_read_file() can be converted to use this API.  The users will
need to be changed to call kernel_read_end(struct file *file, void *buf)
instead of vfree() so it can call allow_write_access() for them.

vmap_file_range() has a lot of potential uses.  I'm surprised we don't
have it already, to be honest.