I figured it out, the bug is still in the use of page_cache_next_miss(),
but my earlier suggestion that xas_next always advances the pointer is
wrong.
Mike is right in that xas_next() is effectively xas_load() for the first
call after an XA_STATE() initialiation.
However, when max_scan is set to 1, xas.xa_index has no chance to be
advanced since the loop condition while(max_scan--) terminates the loop
after 1 iteration.
Hence, after loading happens with xas_next() regardless of the checks
within the loop (!entry, or xa_is_value(entry), etc), xa.xas_index is
not advanced, and the index returned always == the index passed in to
page_cache_next_miss().
Hence, in hugetlb_fallocate(), it always appears to be a page cache
miss.
Here's code from a selftest that can be added to lib/test_xarray.c:
/* Modified from page_cache_next_miss() */
static unsigned long xa_next_empty(struct xarray *xa, unsigned long start,
unsigned long max_scan)
{
XA_STATE(xas, xa, start);
while (max_scan--) {
void *entry = xas_next(&xas);
if (!entry) {
printk("entry not present");
break;
}
if (xa_is_value(entry)) {
printk("xa_is_value instead of pointer");
}
if (xas.xa_index == 0) {
printk("wraparound");
break;
}
}
if (max_scan == -1)
printk("exceeded max_scan");
return xas.xa_index;
}
/* Replace this function in lib/test_xarray.c to run */
static noinline void check_find(struct xarray *xa)
{
unsigned long max_scan;
xa_init(&xa);
xa_store_range(&xa, 3, 5, malloc(10), GFP_KERNEL);
max_scan = 1;
for (int i = 1; i < 8; i++)
printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan,
xa_next_empty(&xa, i, max_scan));
printk("\n");
max_scan = 2;
for (int i = 1; i < 8; i++)
printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan,
xa_next_empty(&xa, i, max_scan));
}
Result:
entry not present => xa_next_empty(xa, 1, 1): 1
entry not present => xa_next_empty(xa, 2, 1): 2
exceeded max_scan => xa_next_empty(xa, 3, 1): 3
exceeded max_scan => xa_next_empty(xa, 4, 1): 4
exceeded max_scan => xa_next_empty(xa, 5, 1): 5
entry not present => xa_next_empty(xa, 6, 1): 6
entry not present => xa_next_empty(xa, 7, 1): 7
entry not present => xa_next_empty(xa, 1, 2): 1
entry not present => xa_next_empty(xa, 2, 2): 2
exceeded max_scan => xa_next_empty(xa, 3, 2): 4
exceeded max_scan => xa_next_empty(xa, 4, 2): 5
exceeded max_scan => xa_next_empty(xa, 5, 2): 6
entry not present => xa_next_empty(xa, 6, 2): 6
entry not present => xa_next_empty(xa, 7, 2): 7
Since the xarray was set up with pointers in indices 3, 4 and 5, we
expect xa_next_empty() or page_cache_next_miss() to return the next
index (4, 5 and 6 respectively), but when used with a max_scan of 1, we
just get the index passed in.
While max_scan could be increased to fix this bug, I feel that having a
separate function like filemap_has_folio() makes the intent more
explicit and is less reliant on internal values of struct xa_state.
xas.xa_index could take other values to indicate wraparound or sibling
nodes and I think it is better to use a higher level abstraction like
xa_load() (used in filemap_has_folio()). In addition xa_load() also
takes the locks it needs, which helps :).
I could refactor the other usage of page_cache_next_miss() in
hugetlbfs_pagecache_present() if you'd like!
On this note, the docstring of page_cache_next_miss() is also
inaccurate. The return value is not always outside the range specified
if max_scan is too small.