On Mon, 2020-06-08 at 06:16 -0700, Matthew Wilcox wrote:
> On Mon, Jun 08, 2020 at 09:03:21AM -0400, Mimi Zohar wrote:
> > On Sat, 2020-06-06 at 08:52 -0700, Matthew Wilcox wrote:
> > > On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> > > > -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > > > -		     loff_t max_size, enum kernel_read_file_id id)
> > > > -{
> > > > -	loff_t i_size, pos;
> > > > +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> > > > +		      loff_t pos, loff_t max_size,
> > > > +		      enum kernel_pread_opt opt,
> > > > +		      enum kernel_read_file_id id)
> > > > +{
> > > > +	loff_t alloc_size;
> > > > +	loff_t buf_pos;
> > > > +	loff_t read_end;
> > > > +	loff_t i_size;
> > > >  	ssize_t bytes = 0;
> > > >  	int ret;
> > > >
> > >
> > > Look, it's not your fault, but this is a great example of how we end
> > > up with atrocious interfaces. Someone comes along and implements a
> > > simple DWIM interface that solves their problem. Then somebody else
> > > adds a slight variant that solves their problem, and so on and so on,
> > > and we end up with this bonkers API where the arguments literally change
> > > meaning depending on other arguments.
> > >
> > > > @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > > >  		ret = -EINVAL;
> > > >  		goto out;
> > > >  	}
> > > > -	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> > > > +
> > > > +	/* Default read to end of file */
> > > > +	read_end = i_size;
> > > > +
> > > > +	/* Allow reading partial portion of file */
> > > > +	if ((opt == KERNEL_PREAD_PART) &&
> > > > +	    (i_size > (pos + max_size)))
> > > > +		read_end = pos + max_size;
> > > > +
> > > > +	alloc_size = read_end - pos;
> > > > +	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
> > > >  		ret = -EFBIG;
> > > >  		goto out;
> > >
> > > ... like that.
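To make the "arguments change meaning" point concrete, here is a userspace sketch of only the size arithmetic from the quoted hunk. It is not the patch itself: pread_alloc_size(), PREAD_WHOLE and PREAD_PART are hypothetical names (only KERNEL_PREAD_PART appears in the hunk), long long stands in for loff_t, and the hunk's elided -EINVAL condition is replaced by an assumed precondition on pos.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum kernel_pread_opt; only the PART
 * value is visible in the quoted hunk. */
enum pread_opt { PREAD_WHOLE, PREAD_PART };

/*
 * Restates the size checks from the quoted hunk in userspace
 * (long long stands in for loff_t). Returns the number of bytes the
 * patch would allocate, or -EFBIG.
 */
static long long pread_alloc_size(long long i_size, long long pos,
				  long long max_size, enum pread_opt opt)
{
	/* Assumed precondition; the hunk's own -EINVAL test is not shown. */
	assert(pos >= 0 && pos <= i_size);

	/* Default read to end of file */
	long long read_end = i_size;

	/* PART mode: max_size is the number of bytes to read from pos */
	if (opt == PREAD_PART && i_size > pos + max_size)
		read_end = pos + max_size;

	long long alloc_size = read_end - pos;

	/* WHOLE mode: the same max_size is a cap, and exceeding it fails */
	if ((unsigned long long)i_size > SIZE_MAX ||
	    (max_size > 0 && alloc_size > max_size))
		return -EFBIG;
	return alloc_size;
}
```

With i_size = 4096, pos = 0 and max_size = 1024, the whole-file mode returns -EFBIG while the partial mode returns 1024: one argument is a rejection threshold in one mode and a read length in the other.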
> > >
> > > I think what we actually want is:
> > >
> > > ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
> > > void vunmap_file_range(struct file *, void *buf);
> > >
> > > If end > i_size, limit the allocation to i_size. Returns the number
> > > of bytes allocated, or a negative errno. Writes the pointer allocated
> > > to *bufp. Internally, it should use the page cache to read in the pages
> > > (taking appropriate reference counts). Then it maps them using vmap()
> > > instead of copying them to a private vmalloc() array.
> > >
> > > kernel_read_file() can be converted to use this API. The users will
> > > need to be changed to call kernel_read_end(struct file *file, void *buf)
> > > instead of vfree() so it can call allow_write_access() for them.
> > >
> > > vmap_file_range() has a lot of potential uses. I'm surprised we don't
> > > have it already, to be honest.
> >
> > Prior to kernel_read_file() the same or very similar code existed in
> > multiple places in the kernel. The kernel_read_file() API
> > consolidated the existing code adding the pre and post security hooks.
> >
> > With this new design of not using a private vmalloc, will the file
> > data be accessible prior to the post security hooks? From an IMA
> > perspective, the hooks are used for measuring and/or verifying the
> > integrity of the file.
>
> File data is already accessible prior to the post security hooks.
> Look how kernel_read_file works:
>
> 	ret = deny_write_access(file);
> 	ret = security_kernel_read_file(file, id);
> 	*buf = vmalloc(i_size);
> 	bytes = kernel_read(file, *buf + pos, i_size - pos, &pos);
> 	ret = security_kernel_post_read_file(file, *buf, i_size, id);
>
> kernel_read() will read the data into the page cache and then copy it
> into the vmalloc'd buffer. There's nothing here to prevent read accesses
> to the file.

The post security hook needs access to the file data in order to
calculate the file hash.
The question is whether, prior to returning from kernel_read_file(), the
caller can access the file data.

Mimi
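For comparison, the contract proposed for vmap_file_range() (clamp end to i_size, return the byte count or a negative errno, hand back the pointer through *bufp) can be approximated in userspace with POSIX mmap(2), which likewise maps page-cache pages rather than copying them. This is only an analogue, not the kernel API: map_file_range() and unmap_file_range() are hypothetical names, the unmap side takes a length because munmap() needs one, and returning -EINVAL for an empty or out-of-range start is an assumption, since the proposal only specifies the clamping.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the proposed vmap_file_range():
 * map [start, end) of fd read-only, clamping end to the file size.
 * Returns the number of bytes mapped or a negative errno, and stores a
 * pointer to the byte at offset start in *bufp.
 */
static ssize_t map_file_range(int fd, off_t start, off_t end, void **bufp)
{
	struct stat st;
	long page = sysconf(_SC_PAGESIZE);

	if (fstat(fd, &st) < 0)
		return -errno;
	if (end > st.st_size)
		end = st.st_size;	/* "limit the allocation to i_size" */
	/* Assumption: empty, inverted or past-EOF ranges are rejected. */
	if (start < 0 || start >= end)
		return -EINVAL;

	/* mmap() offsets must be page aligned, so map from the page start. */
	off_t delta = start % page;
	size_t len = (size_t)(end - start + delta);
	char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, start - delta);
	if (base == MAP_FAILED)
		return -errno;

	*bufp = base + delta;
	return (ssize_t)(end - start);
}

/* Hypothetical counterpart of vunmap_file_range(): recover the
 * page-aligned base from buf and unmap the whole mapping. */
static void unmap_file_range(void *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	uintptr_t addr = (uintptr_t)buf;
	uintptr_t delta = addr % (uintptr_t)page;

	munmap((void *)(addr - delta), len + delta);
}
```

Nothing in this sketch blocks writers, so a concurrent write to the file would be visible through the shared mapping. In kernel_read_file() that is the job of deny_write_access(), which is why the proposed kernel_read_end() has to call allow_write_access().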