On Fri, Nov 12, 2021 at 06:33:05PM +0100, Sebastian Andrzej Siewior wrote: > I've been looking at the xas_retry() based loops under PREEMPT_RT > constraints that is xarray::xa_lock owner got preempted by a task with > higher priority and this task is performing a xas_retry() based loop. > Since the xarray::xa_lock owner got preempted it can't make any progress > until the task the higher priority completes its task. > > Based on my understanding this the XA_RETRY_ENTRY state is transient > while a node is removed from the tree. That is, it is first removed from > the tree, then set to XA_RETRY_ENTRY and then kfree()ed. Any RCU reader > that retrieved this node before it was removed, will see this flag and > will iterate the array from beginning at which point it won't see this > node again. That is correct. If you have found a case where this isn't true, it is a bug and I would welcome the report. > The XA_ZERO_ENTRY flag is different as it is not transient. It should be > the responsibility of the reader not to start iterating the tree from > the beginning because this state won't change. > Most reader simply go to the next entry and I *assume* that for instance > mapping_get_entry() or find_get_entry() in mm/filemap.c won't see here > the XA_ZERO_ENTRY. The ZERO entry is somewhat special. Readers see it as a NULL, but it occupies space in the tree unlike an actual NULL entry, which is liable to having its space reclaimed. It really exists for the purposes of the IDR which distinguishes between a NULL entry and a deleted entry. In XArray terms, it's a reserved entry (ie we've reserved the memory so that a subsequent store doesn't need to allocate memory). Readers should all be skipping over ZERO entries (as they do NULL entries), not restarting the iteration if they see one. So it shouldn't factor into your "does it make progress" analysis, because an iteration should continue, not retry. Also, I am not aware of any user of ZERO entries in the page cache. It's possible that somebody has added one without me noticing, but there wasn't one earlier.