Re: [RFC] RCU Judy array with distributed locking for FS extents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quoting Dave Chinner (2013-06-04 07:54:56)
> On Mon, Jun 03, 2013 at 08:46:01AM -0400, Mathieu Desnoyers wrote:
> > * Chris Mason (clmason@xxxxxxxxxxxx) wrote:
> > > Quoting Mathieu Desnoyers (2013-06-03 01:27:58)
> > > > Hi Chris,
> > > > 
> > > > I stumbled the LWN article "A kernel skiplist implementation (Part 1)",
> > > > and really thought I should tell you about the side-project I am
> > > > currently working on: RCU Judy array, with locking distributed within
> > > > the data structure. I developed my own design of this structure
> > > > specifically to be usable with RCU. (ref. my LPC2012 presentation:
> > > > https://www.efficios.com/lpc2012-scaling-rcu-judy-arrays-cache-efficient-compact-fast-and-scalable-trie)
> > > > 
> > > > I think it would fit your extents use-case perfectly, with excellent
> > > > cache usage, constant lookup time, near-linear scalability for lookups,
> > > > and pretty good update scability.
> > > > 
> > > > The code is available in a development branch of the Userspace RCU
> > > > library:
> > > > 
> > > > git://git.lttng.org/userspace-rcu.git branch: urcu/rcuja
> > > > https://git.lttng.org/?p=userspace-rcu.git;a=shortlog;h=refs/heads/urcu/rcuja
> > > > 
> > > > It runs entirely in user-space, although it can be ported easily to the
> > > > Linux kernel. I'd be curious to see how it behaves with your workload.
> > > > 
> > > > Relevant files:
> > > > - rcuja/design.txt: design document of the data structure,
> > > > - urcu/rcuja.h (public API)
> > > > - rcuja/*.[ch]: implementation
> > > > - tests/test_urcu_ja.c: stress-tests, example usage.
> > > > 
> > > > After reading the article on skiplists, I added a new API specifically
> > > > to handle non-overlapping ranges:
> > > > 
> > > > struct cds_hlist_head cds_ja_lookup_lower_equal(struct cds_ja *ja,
> > > >                 uint64_t key);
> > > > 
> > > > AFAIU, "extents", as far as keys are concerned, are nothing more than
> > > > segments, with start and end values.
> > > > 
> > > > Extents can be indexed by the Judy array in the following way: we use
> > > > the start of segment as node key, and keep the end of segment value
> > > > within the leaf node. We add those nodes into the judy array, indexed by
> > > > start-of-segment value. Then, when we want to figure out if a value
> > > > matches a segment, we do the following:
> > > > 
> > > > 1) lookup the possible segment match with cds_ja_lookup_lower_equal(),
> > > >    passing the key to lookup as parameter,
> > > > 
> > > > 2) check that our key fits within the range of the segment using the
> > > >    end-of-segment value within the leaf node.
> > > > 
> > > > Of course, this approach is limited to non-overlapping segments, so I
> > > > hope this is what you need.
> > > 
> > > Hi Mathieu,
> > > 
> > > One problem here is that XFS wants to allow duplicate keys in the tree.
> > > This is possible with some modifications to the skiplist code, but I'm
> > > not sure if it fits into your description above.
> > 
> > Are those segments that completely overlap, or partially overlap ?
> 
> Partial overlap. Can be wholly within an existing exist, or overlap
> start, end or both.

Ouch, ok.  In private email yesterday I talked with Mathieu about how
his current setup can't prevent the concurrent insertion of overlapping
extents.  He does have a plan to address this where the insertion is
synchronized by keeping placeholders in the tree for the free space.  I
think it'll work, but I'm worried about doubling the cost of the insert.

Mathieu, I'm going to spend some time on the kernel side of the skiplist
code filling out the API and reworking the insert code to get rid of my
cursors.  I won't have a chance to move it into userland until at least
next week.

In terms of comparing the two, I'd rather compare in the kernel so we
don't have any surprises.  Are you willing to port the judy code in?

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux