On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx> wrote: > On 4/7/23 1:00 AM, Michał Mirosław wrote: > > On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum > > <usama.anjum@xxxxxxxxxxxxx> wrote: [...] > >>>> + cur->len += n_pages; > >>>> + p->found_pages += n_pages; > >>>> + > >>>> + if (p->max_pages && (p->found_pages == p->max_pages)) > >>>> + return PM_SCAN_FOUND_MAX_PAGES; > >>>> + > >>>> + return 0; > >>>> + } > >>>> + > >>>> + if (!p->vec_index || ((p->vec_index + 1) < p->vec_len)) { > >>> > >>> It looks that `if (p->vec_index < p->vec_len)` is enough here - if we > >>> have vec_len == 0 here, then we'd not fit the entry in the userspace > >>> buffer anyway. Am I missing something? > >> No. I'd explained it with diagram last time: > >> https://lore.kernel.org/all/3c8d9ea0-1382-be0c-8dd2-d490eedd3b55@xxxxxxxxxxxxx > >> > >> I'll add a concise comment here. > > > > So it seems, but I think the code changed a bit and maybe could be > > simplified now? Since p->vec_len == 0 is currently not valid, the > > field could count only the entries available in p->vec[] -- IOW: not > > include p->cur in the count. > I see. But this'll not work as we need to count p->cur to don't go above > the maximum count, p->vec_size. You can subtract 1 from p->vec_size before the page walk to account for the buffer in `cur`. [...] > >>>> +static inline int pagemap_scan_deposit(struct pagemap_scan_private *p, > >>>> + struct page_region __user *vec, > >>>> + unsigned long *vec_index) > >>> > >>> ..._deposit() is used only in single place - please inline. > >> It is already inline. > > > > Sorry. I mean: please paste the code in place of the single call. > I've made it a separate function to make the code look better in the caller > function and logically easier to understand. This function is ugly. > do_pagemap_scan() is also already very long function with lots of things > happening. If you still insist, I'll remove this function. Please do remove - it will make the copy to userspace code all neatly together. [...] > >>>> + */ > >>>> + if (is_written && PM_SCAN_OP_IS_WP(p) && > >>>> + ((end - start < HPAGE_SIZE) || > >>>> + (p->max_pages && > >>>> + (p->max_pages - p->found_pages) < n_pages))) { > >>>> + > >>>> + split_huge_pmd(vma, pmd, start); > >>>> + goto process_smaller_pages; > >>>> + } > >>>> + > >>>> + if (p->max_pages && > >>>> + p->found_pages + n_pages > p->max_pages) > >>>> + n_pages = p->max_pages - p->found_pages; > >>>> + > >>>> + ret = pagemap_scan_output(is_written, is_file, is_present, > >>>> + is_swap, p, start, n_pages); > >>>> + if (ret < 0) > >>>> + return ret; > > > > So let's simplify this: > > > > if (p->max_pages && n_pages > max_pages - found_pages) > > n_pages = max_pages - found_pages; > > > > if (is_written && DO_WP && n_pages != HPAGE_SIZE / PAGE_SIZE) { > > split_thp(); > > goto process_smaller_pages; > > } > Clever!! This looks very sleek. > > > > > BTW, THP handling could be extracted to a function that would return > > -EAGAIN if it has split the page or it wasn't a THP -- and that would > > mean `goto process_smaller_pages`. > Other functions in this file handle the THP in this same way. So it feels > like more intuitive that we follow to same pattern in this file. I'll leave it to you. Extracting THP support would avoid a goto and #ifdef inside a function, though (and make the function smaller). > >>>> + /* > >>>> + * Allocate smaller buffer to get output from inside the page walk > >>>> + * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As > >>>> + * we want to return output to user in compact form where no two > >>>> + * consecutive regions should be continuous and have the same flags. > >>>> + * So store the latest element in p.cur between different walks and > >>>> + * store the p.cur at the end of the walk to the user buffer. > >>>> + */ > >>>> + p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region), > >>>> + GFP_KERNEL); > >>>> + if (!p.vec) > >>>> + return -ENOMEM; > >>>> + > >>>> + walk_start = walk_end = start; > >>>> + while (walk_end < end && !ret) { > >>> > >>> The loop will stop if a previous iteration returned ENOSPC (and the > >>> error will be lost) - is it intended? > >> It is intentional. -ENOSPC means that the user buffer is full even though > >> there was more memory to walk over. We don't treat this error. So when > >> buffer gets full, we stop walking over further as user buffer has gotten > >> full and return as success. > > > > Thanks. What's the difference between -ENOSPC and > > PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code > > flow). > -ENOSPC --> user buffer has been filled completely > PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may > still have more space What is the difference in code behaviour when those two cases are compared? (I'd expect none.) Best Regards Michał Mirosław