[PATCH 0/2] Introducing the eXtensible Array (xarray)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>

I wrote the xarray to replace the radix tree with a better API based
on observing how programmers are currently using the radix tree, and
on how (and why) they aren't.  Conceptually, an xarray is an array of
2^BITS_PER_LONG pointers which is initially full of NULL pointers.

This is the result of about two weeks worth of hacking.  There is much
more to do; I think the most important thing remaining is to start
working on converting the page cache from the radix tree to the xarray.
That will tell me where the API could be better.  I'm sending it out
now for feedback; I know there are bugs and some half-finished code,
but I'm interested in comments on the design direction.

Improvements the xarray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'.  But what users really want is an automatically resizing
   array, and so it makes more sense to give users an API that is like
   an array -- 'load' and 'store'.
 - Locking is part of the API.  This simplifies a lot of users who
   formerly had to manage their own locking just for the radix tree.
   It also improves code generation as we can now tell RCU that we're
   holding a lock and it doesn't need to generate as much fencing code.
   The other advantage is that tree nodes can be moved (not yet
   implemented).
 - GFP flags are now parameters to calls which may need to allocate
   memory.  The radix tree forced users to decide what the allocation
   flags would be at creation time.  It's much clearer to specify them
   at allocation time.  I know the MM people disapprove of the radix
   tree using the top bits of the GFP flags for its own purpose, so
   they'll like this aspect.
 - Memory is not preloaded; we don't tie up dozens of pages on the
   off chance that the slab allocator fails.  Instead, we drop the lock,
   allocate a new node and retry the operation.
 - The xarray provides a conditional-replace operation.  The radix tree
   forces users to roll their own (and at least four have).
 - Iterators now take a 'max' parameter.  That simplifies many users and
   will reduce the amount of iteration done.
 - Iteration can proceed backwards.  We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.  Not fully implemented.
 - RCU-protected pointers are not exposed as part of the API.  There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.
 - Any function which wants it can now call the update_node() callback.
   There were a few places missing that I noticed as part of this rewrite.
 - Help the page cache to keep nrexceptional and nrpages up to date.
 - Exceptional entries may now be BITS_PER_LONG-1 in size, rather than the
   BITS_PER_LONG-2 that they had in the radix tree.  That gives us the
   extra bit we need to put huge page swap entries in the page cache.

The API comes in two parts, normal and advanced.  The normal API takes
care of the locking and memory allocation for you.  You can get the
value of a pointer by calling xa_load() and set the value of a pointer by
calling xa_store().  You can conditionally update the value of a pointer
by calling xa_replace().  Each pointer which isn't NULL can be tagged
with up to 3 bits of extra information, accessed through xa_get_tag(),
xa_set_tag() and xa_clear_tag().  You can copy batches of pointers out
of the array by calling xa_get_entries() or xa_get_tagged().  You can
iterate over pointers in the array by calling xa_find(), xa_next()
or xa_for_each().

The advanced API allows users to build their own operations.  You have
to take care of your own locking and handle memory allocation failures.
Most of the advanced operations are based around the xa_state which
keeps state between sub-operations.  Read the xarray.h header file for
more information on the advanced API, and see the implementation of the
normal API for examples of how to use the advanced API.

Those familiar with the radix tree may notice certain similarities between
the implementation of the xarray and the radix tree.  That's entirely
intentional, but the implementation will certainly adapt in the future.
For example, one of the impediments I see to using xarrays instead of
kvmalloced arrays is memory consumption, so I have a couple of ideas to
reduce memory usage for smaller arrays.

I have already reimplementated the IDR and the IDA based on the xarray.
They are roughly the same complexity as they were when implemented on
top of the radix tree (although much less intertwined).

When converting code from the radix tree to the xarray, the biggest thing
to bear in mind is that 'store' overwrites anything which happens to be
in the xarray.  Just like the assignment operator.  The equivalent to
the insert operation is to replace NULL with the new value.

A quick reference guide to help when converting radix tree code.
Functions which start 'xas' are XA_ADVANCED functions.

radix_tree_empty			xa_empty
__radix_tree_create			xas_create
__radix_tree_insert			xas_store
radix_tree_insert(x)			xa_replace(NULL, x)
__radix_tree_lookup			xas_load
radix_tree_lookup			xa_load
radix_tree_lookup_slot			xas_load
__radix_tree_replace			xas_store
radix_tree_iter_replace			xas_store
radix_tree_replace_slot			xas_store
__radix_tree_delete_node		xas_delete_node
radix_tree_delete_item			xa_replace
radix_tree_delete			xa_store(NULL)
radix_tree_clear_tags			xas_clear_tags
radix_tree_gang_lookup			xa_get_entries
radix_tree_gang_lookup_slot		xas_find (*1)
radix_tree_preload			(*3)
radix_tree_maybe_preload		(*3)
radix_tree_tag_set			xa_set_tag
radix_tree_tag_clear			xa_clear_tag
radix_tree_tag_get			xa_get_tag
radix_tree_iter_tag_set			xas_set_tag
radix_tree_gang_lookup_tag		xa_get_tagged
radix_tree_gang_lookup_tag_slot		xas_load (*2)
radix_tree_tagged			xa_tagged
radix_tree_preload_end			xas_preload_end
radix_tree_split_preload		(*3)
radix_tree_split			xas_split (*4)
radix_tree_join				xa_replace

(*1) All three users of radix_tree_gang_lookup_slot() are using it to
ensure that there are no entries in a given range.  
(*2) The one radix_tree_gang_lookup_tag_slot user should be using a
radix_tree_iter loop.  It can use an xas_for_each() loop, or even an
xa_for_each() loop.
(*3) I don't think we're going to need a preallocation API.  If we do
end up needing one, I have a plan that doesn't involve per-cpu
preallocation pools.
(*4) Not yet implemented


Matthew Wilcox (2):
  Add XArray
  XArray: Convert IDR and add test suite

 include/linux/idr.h                            |  136 +--
 include/linux/xarray.h                         |  764 ++++++++++++++
 lib/Makefile                                   |    3 +-
 lib/idr.c                                      |  503 +++++----
 lib/radix-tree.c                               |  169 +--
 lib/xarray.c                                   | 1305 ++++++++++++++++++++++++
 tools/include/asm/bug.h                        |    4 +
 tools/include/linux/kernel.h                   |    1 +
 tools/include/linux/spinlock.h                 |    7 +-
 tools/testing/radix-tree/.gitignore            |    4 +-
 tools/testing/radix-tree/Makefile              |   34 +-
 tools/testing/radix-tree/{idr-test.c => idr.c} |   37 +-
 tools/testing/radix-tree/linux.c               |    2 +-
 tools/testing/radix-tree/linux/radix-tree.h    |    5 -
 tools/testing/radix-tree/linux/rcupdate.h      |    2 +
 tools/testing/radix-tree/linux/xarray.h        |    1 +
 tools/testing/radix-tree/test.c                |   59 +-
 tools/testing/radix-tree/test.h                |   48 +-
 tools/testing/radix-tree/xarray.c              |  241 +++++
 19 files changed, 2783 insertions(+), 542 deletions(-)
 create mode 100644 include/linux/xarray.h
 create mode 100644 lib/xarray.c
 rename tools/testing/radix-tree/{idr-test.c => idr.c} (91%)
 create mode 100644 tools/testing/radix-tree/linux/xarray.h
 create mode 100644 tools/testing/radix-tree/xarray.c

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux