On Thu, 21 Jul 2011 15:09:08 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > On Tue, 19 Jul 2011 00:35:37 +0930 > Christopher Yeoh <cyeoh@xxxxxxxxxxx> wrote: > > > +/* > > + * process_vm_rw_pages - read/write pages from task specified > > + * @task: task to read/write from > > + * @mm: mm for task > > + * @process_pages: struct pages area that can store at least > > + * nr_pages_to_copy struct page pointers > > + * @pa: address of page in task to start copying from/to > > + * @start_offset: offset in page to start copying from/to > > + * @len: number of bytes to copy > > + * @lvec: iovec array specifying where to copy to/from > > + * @lvec_cnt: number of elements in iovec array > > + * @lvec_current: index in iovec array we are up to > > + * @lvec_offset: offset in bytes from current iovec iov_base we > > are up to > > + * @vm_write: 0 means copy from, 1 means copy to > > + * @nr_pages_to_copy: number of pages to copy > > + */ > > +static ssize_t process_vm_rw_pages(struct task_struct *task, > > + struct mm_struct *mm, > > + - *lvec_offset); > > ..... > > > > + > > + target_kaddr = kmap(process_pages[pgs_copied]) + > > start_offset; + > > + if (vm_write) > > + ret = copy_from_user(target_kaddr, > > + > > lvec[*lvec_current].iov_base > > + + *lvec_offset, > > + bytes_to_copy); > > + else > > + ret = > > copy_to_user(lvec[*lvec_current].iov_base > > + + *lvec_offset, > > + target_kaddr, > > bytes_to_copy); > > + kunmap(process_pages[pgs_copied]); > > + if (ret) { > > + pgs_copied++; > > + goto end; > > afacit this will always return -EFAULT, even after copying some > memory. > > Is this a misdesign? Would it not be better to return the number of > bytes actually copied, or -EFAULT is we weren't able to copy any? > Like read(). > > That way, userspace can at least process the data which _was_ > transferred, before retrying and then handling the fault. With the > proposed interface this is not possible, and the data is lost. Perhaps because of how I'm using the interface for MPI I was thinking that if it fails at this point due to EFAULT then its an application error and that the partial read/write wouldn't be used. But I see your point could be true for other uses of the interface and the next version will return partial read/write information all the way back to userspace. > And note that the function's kerneldoc doesn't document the return > value at all! kerneldoc sucks that way - there's no formal place in > the template, so people often ignore this important part of the > function interface. Similarly, there's no kerneldoc template for > preconditions such as irq on/off, locks which must be held, etc. > So they don't get documented. But they're part of the interface. > Ok. Will be fixed in next version > > nr_pages_to_copy); > > + start_offset = 0; > > + > > + if (rc < 0) > > + return rc; > > It's propagated here. > > (CodingStyleNit: it's conventional to put {} around the single-line > block in this case, btw). > Fixed now. > > + > > + if (nr_pages == 0) > > + return 0; > > + > > + /* For reliability don't try to kmalloc more than 2 pages > > worth */ > > + process_pages = kmalloc(min_t(size_t, > > PVM_MAX_KMALLOC_PAGES, > > + sizeof(struct pages > > *)*nr_pages), > > + GFP_KERNEL); > > You might get some speed benefit by optimising for the small copies > here. Define a local on-stack array of N page*'s and point > process_pages at that if the number of pages is <= N. Saves a > malloc/free and is more cache-friendly. But only if the result is > measurable! ok. will do some benchmarking to see if its worth it. > > + > > +static ssize_t process_vm_rw_check_iovecs(pid_t pid, > > + const struct iovec > > __user *lvec, > > + unsigned long liovcnt, > > + const struct iovec > > __user *rvec, > > + unsigned long riovcnt, > > + unsigned long flags, int > > vm_write) > > I'm allergic to functions with "check" in their name. Check for what? > > And one would expect a check_foo() to return a bool or an errno. This > one returns a ssize_t! Weird! Interface documentation, please. > Including return value semantics ;) That part was split out from the main function because the iovec checks are different for 32 bit compatibility. Have renamed: process_vm_rw -> process_vm_rw_core process_vm_rw_check_iovecs -> process_vm_rw compat_process_vm_rw_check_iovecs -> compat_process_vm_rw ...and added more doco - with return value semantics :-) Chris -- cyeoh@xxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html