On Fri, 7 Apr 2023 at 13:11, Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx> wrote: > On 4/7/23 3:14 PM, Michał Mirosław wrote: > > On Fri, 7 Apr 2023 at 12:04, Muhammad Usama Anjum > > <usama.anjum@xxxxxxxxxxxxx> wrote: > >> On 4/7/23 12:34 PM, Michał Mirosław wrote: > >>> On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum > >>> <usama.anjum@xxxxxxxxxxxxx> wrote: > >>>> On 4/7/23 1:00 AM, Michał Mirosław wrote: > >>>>> On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum > >>>>> <usama.anjum@xxxxxxxxxxxxx> wrote: > > [...] > >>>>>>>> + /* > >>>>>>>> + * Allocate smaller buffer to get output from inside the page walk > >>>>>>>> + * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As > >>>>>>>> + * we want to return output to user in compact form where no two > >>>>>>>> + * consecutive regions should be continuous and have the same flags. > >>>>>>>> + * So store the latest element in p.cur between different walks and > >>>>>>>> + * store the p.cur at the end of the walk to the user buffer. > >>>>>>>> + */ > >>>>>>>> + p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region), > >>>>>>>> + GFP_KERNEL); > >>>>>>>> + if (!p.vec) > >>>>>>>> + return -ENOMEM; > >>>>>>>> + > >>>>>>>> + walk_start = walk_end = start; > >>>>>>>> + while (walk_end < end && !ret) { > >>>>>>> > >>>>>>> The loop will stop if a previous iteration returned ENOSPC (and the > >>>>>>> error will be lost) - is it intended? > >>>>>> It is intentional. -ENOSPC means that the user buffer is full even though > >>>>>> there was more memory to walk over. We don't treat this error. So when > >>>>>> buffer gets full, we stop walking over further as user buffer has gotten > >>>>>> full and return as success. > >>>>> > >>>>> Thanks. What's the difference between -ENOSPC and > >>>>> PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code > >>>>> flow). > >>>> -ENOSPC --> user buffer has been filled completely > >>>> PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may > >>>> still have more space > >>> > >>> What is the difference in code behaviour when those two cases are > >>> compared? (I'd expect none.) > >> There is difference: > >> We add data to user buffer. If it succeeds with return code 0, we engage > >> the WP. If it succeeds with PM_SCAN_FOUND_MAX_PAGES, we still engage the > >> WP. But if we get -ENOSPC, we don't perform engage as the data wasn't added > >> to the user buffer. > > > > Thanks! I see it now. I see a few more corner cases here: > > 1. If we did engage WP but fail to copy the vector we return -EFAULT > > but the WP is already engaged. I'm not sure this is something worth > > guarding against, but documenting that would be helpful I think. > Sure. > > > 2. If uffd_wp_range() fails, but we have already processed pages > > earlier, we should treat the error like ENOSPC and back out the failed > > range (the earier changes would be lost otherwise). > Backing out is easier to do for hugepages. But for normal pages, we'll have > to write some code to find where the current data was added (in cur or in > vec) and back out from that. I'll have to write some more code to avoid the > side-effects as well. If I read the code correctly, the last page should always be in `cur` and on failure only a single page is needed to be backed out. Did I miss something? > But aren't we going over-engineering here? Error occurred and we are trying > to keep the previously generated correct data and returning successfully > still to the user? I don't think we should do this. An error is error. We > should return the error simply even if the memory flags would get lost. We > don't know what caused the error in uffd_wp_range(). Under normal > situation, we there shouldn't have had error. In this case it means that on (intermittent) allocation error we get inconsistent or non-deterministic results. I wouldn't want to be the one debugging this later - I'd prefer either the syscall be "exception-safe" (give consistent and predictable output) or kill the process. Best Regards Michał Mirosław