Re: [PATCH 3/3] radix-tree: support locking of individual exception entries.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 04 2016, Jan Kara wrote:

> Hi Neil,
>
> On Sun 28-02-16 16:09:29, NeilBrown wrote:
>> The least significant bit of an exception entry is used as a lock flag.
>> A caller can:
>>  - create a locked entry by simply adding an entry with this flag set
>>  - lock an existing entry with radix_tree_lookup_lock().  This may return
>>     NULL if the entry doesn't exists, or was deleted while waiting for
>>     the lock.  It may return a non-exception entry if that is what is
>>     found.  If it returns a locked entry then it has exclusive rights
>>     to delete the entry.
>>  - unlock an entry that is already locked.  This will wake any waiters.
>>  - delete an entry that is locked.  This will wake waiters so that they
>>    return NULL without looking at the slot in the radix tree.
>> 
>> These must all be called with the radix tree locked (i.e. a spinlock held).
>> That spinlock is passed to radix_tree_lookup_lock() so that it can drop
>> the lock while waiting.
>> 
>> This is a "demonstration of concept".  I haven't actually tested, only compiled.
>> A possible use case is for the exception entries used by DAX.
>> 
>> It is possible that some of the lookups can be optimised away in some
>> cases by storing a slot pointer.  I wanted to keep it reasonable
>> simple until it was determined if it might be useful.
>
> Thanks for having a look! So the patch looks like it would do the work but
> frankly the amount of hackiness in it has exceeded my personal threshold...
> several times ;)

Achievement unlocked ? :-)

>
> In particular I don't quite understand why have you decided to re-lookup
> the exceptional entry in the wake function? That seems to be the source of
> a lot of a hackiness?

Yes.....

My original idea was that there would only be a single lookup.  If the
slot was found to be locked, the address of the slot would be stored in
the key so the wakeup function could trivially detect if it was being
deleted, or could claim the lock, and would signal the result to the
caller.

But then I realized that the address of the slot is not necessarily
stable.  In particular the slot for address 0 can be in the root, or it
can be in a node.  I could special-case address zero but it was easier
just to do the search.

Of course the slot disappears if the entry is deleted.  That is why the
wakeup function (which is called under the tree lock so can still safely
inspect the slot) would signal to the caller that the delete had
happened.

So the patch was a little confused....

>                       I was hoping for something simpler like what I've
> attached (compile tested only). What do you think?

Certainly simpler.

By not layering on top of wait_bit_key, you've precluded the use of the
current page wait_queues for these locks - you need to allocate new wait
queue heads.

If in

> +struct wait_exceptional_entry_queue {
> +	wait_queue_t wait;
> +	struct exceptional_entry_key key;
> +};

you had the exceptional_entry_key first (like wait_bit_queue does) you
would be closer to being able to re-use the queues.

Also I don't think it is safe to use an exclusive wait.  When a slot is
deleted, you need to wake up *all* the waiters.

Thanks,
NeilBrown


>
> To avoid false wakeups and thundering herd issues which my simple version does
> have, we could do something like what I outline in the second patch. Now
> that I look at the result that is closer to your patch, just cleaner IMHO :).
> But I wanted to have it separated to see how much complexity does this
> additional functionality brings...
>
> Now I'm going to have a look how to use this in DAX...
>
> 								Honza

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]