On 07/08/2014 12:03 PM, Naoya Horiguchi wrote: >> > The biggest question for me, though, is whether we want to start >> > designing these per-page interfaces to consider different page sizes, or >> > whether we're going to just continue to pretend that the entire world is >> > 4k pages. Using FINCORE_BMAP on 1GB hugetlbfs files would be a bit >> > silly, for instance. > I didn't answer this question, sorry. > > In my option, hugetlbfs pages should be handled as one hugepage (not as > many 4kB pages) to avoid lots of meaningless data transfer, as you pointed > out. And the current patch already works like that. Just reading the code, I don't see any way that pc_shift gets passed down in to the do_fincore() loop. I don't see it getting reflected in to 'nr' or 'nr_pages' in there, and I can't see how: jump = iter.index - fc->pgstart - nr; can possibly be right since iter.index is being kept against the offset in the userspace buffer (4k pages) and 'nr' and fc->pgstart are essentially done in the huge page size. If you had a 2-page 1GB-hpage_size() hugetlbfs file, you would only have two pages in the radix tree, and only two iterations of radix_tree_for_each_slot(). It would only set the first two bytes of a 256k BMAP buffer since only two pages were encountered in the radix tree. Or am I reading your code wrong again? -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html