On 8/11/23 6:32 PM, Michał Mirosław wrote: > On Fri, Aug 11, 2023 at 05:02:44PM +0500, Muhammad Usama Anjum wrote: >> On 8/10/23 10:32 PM, Michał Mirosław wrote: >>> On Wed, 9 Aug 2023 at 08:16, Muhammad Usama Anjum >>> <usama.anjum@xxxxxxxxxxxxx> wrote: > [...] >>>> --- a/fs/proc/task_mmu.c >>>> +++ b/fs/proc/task_mmu.c >>> [...] >>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>> +static unsigned long pagemap_thp_category(pmd_t pmd) >>>> +{ >>>> + unsigned long categories = PAGE_IS_HUGE; >>>> + >>>> + if (pmd_present(pmd)) { >>>> + categories |= PAGE_IS_PRESENT; >>>> + if (!pmd_uffd_wp(pmd)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + if (is_zero_pfn(pmd_pfn(pmd))) >>>> + categories |= PAGE_IS_PFNZERO; >>>> + } else if (is_swap_pmd(pmd)) { >>>> + categories |= PAGE_IS_SWAPPED; >>>> + if (!pmd_swp_uffd_wp(pmd)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + } >>>> + >>>> + return categories; >>>> +} >>> I guess THPs can't be file-backed currently, but can we somehow mark >>> this assumption so it can be easily found if the capability arrives? >> Yeah, THPs cannot be file backed. Lets not care for some feature which may >> not arrive in several years or eternity. > > Yes, it might not arrive. But please add at least a comment, so that it > is clearly visible that lack if PAGE_IS_FILE here is intentional. I'll add a comment. > >>>> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ >>>> + >>>> +#ifdef CONFIG_HUGETLB_PAGE >>>> +static unsigned long pagemap_hugetlb_category(pte_t pte) >>>> +{ >>>> + unsigned long categories = PAGE_IS_HUGE; >>>> + >>>> + if (pte_present(pte)) { >>>> + categories |= PAGE_IS_PRESENT; >>>> + if (!huge_pte_uffd_wp(pte)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + if (!PageAnon(pte_page(pte))) >>>> + categories |= PAGE_IS_FILE; >>>> + if (is_zero_pfn(pte_pfn(pte))) >>>> + categories |= PAGE_IS_PFNZERO; >>>> + } else if (is_swap_pte(pte)) { >>>> + categories |= PAGE_IS_SWAPPED; >>>> + if (!pte_swp_uffd_wp_any(pte)) >>>> + categories |= PAGE_IS_WRITTEN; >>>> + } >>> >>> BTW, can a HugeTLB page be file-backed and swapped out? >> Accourding to pagemap_hugetlb_range(), file-backed HugeTLB page cannot be >> swapped. > > Here too a comment that leaving out this case is intentional would be useful. Sure. > >>> [...] >>>> + walk_start = p.arg.start; >>>> + for (; walk_start < p.arg.end; walk_start = p.arg.walk_end) { > [...[ >>>> + ret = mmap_read_lock_killable(mm); >>>> + if (ret) >>>> + break; >>>> + ret = walk_page_range(mm, walk_start, p.arg.end, >>>> + &pagemap_scan_ops, &p); >>>> + mmap_read_unlock(mm); > [...] >>>> + if (ret != -ENOSPC || p.arg.vec_len - 1 == 0 || >>>> + p.found_pages == p.arg.max_pages) >>>> + break; >>> >>> The second condition is equivalent to `p.arg.vec_len == 1`, but why is >>> that an ending condition? Isn't the last entry enough to gather one >>> more range? (The walk could have returned -ENOSPC due to buffer full >>> and after flushing it could continue with the last free entry.) >> Now we are walking the entire range walk_page_range(). We don't break loop >> when we get -ENOSPC as this error may only mean that the temporary buffer >> is full. So we need check if max pages have been found or output buffer is >> full or ret is 0 or any other error. When p.arg.vec_len = 1 is end >> condition as the last entry is in cur. As we have walked over the entire >> range, cur must be full after which the walk returned. >> >> So current condition is necessary. I've double checked it. I'll change it >> to `p.arg.vec_len == 1`. > > If we have walked the whole range, then the loop will end anyway due to > `walk_start < walk_end` not held in the `for()`'s condition. Sorry, for not explaining to-the-point. Why would we walk the entire range when we should recognize that the output buffer is full and break the loop? I've test cases written for this case. If I remove `p.arg.vec_len == 1` check, there is infinite loop for walking. So we are doing correct thing here. > > [...] >>>> +/* >>>> + * struct pm_scan_arg - Pagemap ioctl argument >>>> + * @size: Size of the structure >>>> + * @flags: Flags for the IOCTL >>>> + * @start: Starting address of the region >>>> + * @end: Ending address of the region >>>> + * @walk_end Address where the scan stopped (written by kernel). >>>> + * walk_end == end informs that the scan completed on entire range. >>> >>> Can we ensure this holds also for the tagged pointers? >> No, we cannot. > > So this need explanation in the comment here. (Though I'd still like to > know how the address tags are supposed to be used from someone that > knows them.) I've looked at some documentations (presentations/talks) about tags. Tags is more like userspace feature. Kernel should just ignore them for our use case. I'll add comment. > > Best Regards > Michał Mirosław -- BR, Muhammad Usama Anjum