On 4/11/23 2:29 PM, Michał Mirosław wrote: > On Fri, 7 Apr 2023 at 13:11, Muhammad Usama Anjum > <usama.anjum@xxxxxxxxxxxxx> wrote: >> On 4/7/23 3:14 PM, Michał Mirosław wrote: >>> On Fri, 7 Apr 2023 at 12:04, Muhammad Usama Anjum >>> <usama.anjum@xxxxxxxxxxxxx> wrote: >>>> On 4/7/23 12:34 PM, Michał Mirosław wrote: >>>>> On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum >>>>> <usama.anjum@xxxxxxxxxxxxx> wrote: >>>>>> On 4/7/23 1:00 AM, Michał Mirosław wrote: >>>>>>> On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum >>>>>>> <usama.anjum@xxxxxxxxxxxxx> wrote: >>> [...] >>>>>>>>>> + /* >>>>>>>>>> + * Allocate smaller buffer to get output from inside the page walk >>>>>>>>>> + * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As >>>>>>>>>> + * we want to return output to user in compact form where no two >>>>>>>>>> + * consecutive regions should be continuous and have the same flags. >>>>>>>>>> + * So store the latest element in p.cur between different walks and >>>>>>>>>> + * store the p.cur at the end of the walk to the user buffer. >>>>>>>>>> + */ >>>>>>>>>> + p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region), >>>>>>>>>> + GFP_KERNEL); >>>>>>>>>> + if (!p.vec) >>>>>>>>>> + return -ENOMEM; >>>>>>>>>> + >>>>>>>>>> + walk_start = walk_end = start; >>>>>>>>>> + while (walk_end < end && !ret) { >>>>>>>>> >>>>>>>>> The loop will stop if a previous iteration returned ENOSPC (and the >>>>>>>>> error will be lost) - is it intended? >>>>>>>> It is intentional. -ENOSPC means that the user buffer is full even though >>>>>>>> there was more memory to walk over. We don't treat this error. So when >>>>>>>> buffer gets full, we stop walking over further as user buffer has gotten >>>>>>>> full and return as success. >>>>>>> >>>>>>> Thanks. What's the difference between -ENOSPC and >>>>>>> PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code >>>>>>> flow). >>>>>> -ENOSPC --> user buffer has been filled completely >>>>>> PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may >>>>>> still have more space >>>>> >>>>> What is the difference in code behaviour when those two cases are >>>>> compared? (I'd expect none.) >>>> There is difference: >>>> We add data to user buffer. If it succeeds with return code 0, we engage >>>> the WP. If it succeeds with PM_SCAN_FOUND_MAX_PAGES, we still engage the >>>> WP. But if we get -ENOSPC, we don't perform engage as the data wasn't added >>>> to the user buffer. >>> >>> Thanks! I see it now. I see a few more corner cases here: >>> 1. If we did engage WP but fail to copy the vector we return -EFAULT >>> but the WP is already engaged. I'm not sure this is something worth >>> guarding against, but documenting that would be helpful I think. >> Sure. >> >>> 2. If uffd_wp_range() fails, but we have already processed pages >>> earlier, we should treat the error like ENOSPC and back out the failed >>> range (the earier changes would be lost otherwise). >> Backing out is easier to do for hugepages. But for normal pages, we'll have >> to write some code to find where the current data was added (in cur or in >> vec) and back out from that. I'll have to write some more code to avoid the >> side-effects as well. > > If I read the code correctly, the last page should always be in `cur` > and on failure only a single page is needed to be backed out. Did I > miss something? I'm leaving using uffd_wp_range() in next revision as it is costing performance. This will not be needed in next revision. > >> But aren't we going over-engineering here? Error occurred and we are trying >> to keep the previously generated correct data and returning successfully >> still to the user? I don't think we should do this. An error is error. We >> should return the error simply even if the memory flags would get lost. We >> don't know what caused the error in uffd_wp_range(). Under normal >> situation, we there shouldn't have had error. > > In this case it means that on (intermittent) allocation error we get > inconsistent or non-deterministic results. I wouldn't want to be the > one debugging this later - I'd prefer either the syscall be > "exception-safe" (give consistent and predictable output) or kill the > process. > > Best Regards > Michał Mirosław -- BR, Muhammad Usama Anjum