Re: [PATCH 1/4] secret: Return with locked obj from virSecretObjListRemove

John Ferlan <jferlan@xxxxxxxxxx> · Fri, 23 Mar 2018 09:32:21 -0400

On 03/22/2018 05:44 AM, Erik Skultety wrote:
> On Wed, Mar 21, 2018 at 11:53:32AM -0400, John Ferlan wrote:
>> Rather than unlock the object that was expected to be locked on
>> input, let the caller perform the unlock or more succinctly a
>> virSecretObjEndAPI on the object which will perform the Unref
>> and Unlock and clear the @obj.
> 
> I understand the incentive for this change and I agree with it, but I do have a
> question below.
> 
>>
>> Also clean up the virSecretObjListRemove function comments.
>>
>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx>
>> ---
>>  src/conf/virsecretobj.c    | 15 ++++++++-------
>>  src/secret/secret_driver.c |  4 ----
>>  2 files changed, 8 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/conf/virsecretobj.c b/src/conf/virsecretobj.c
>> index 47e0b28968..49c341484b 100644
>> --- a/src/conf/virsecretobj.c
>> +++ b/src/conf/virsecretobj.c
>> @@ -284,11 +284,14 @@ virSecretObjListFindByUsage(virSecretObjListPtr secrets,
>>  /*
>>   * virSecretObjListRemove:
>>   * @secrets: list of secret objects
>> - * @secret: a secret object
>> + * @obj: a secret object
>>   *
>> - * Remove the object from the hash table.  The caller must hold the lock
>> - * on the driver owning @secrets and must have also locked @secret to
>> - * ensure no one else is either waiting for @secret or still using it.
>> + * Remove @obj from the secret obj list hash table. The caller must hold
>> + * the lock on @obj to ensure no one else is either waiting for @obj or
>> + * still using it.
>> + *
>> + * Upon return the @obj remains locked with at least 1 reference and
>> + * the caller is expected to use virSecretObjEndAPI on it.
>>   */
>>  void
>>  virSecretObjListRemove(virSecretObjListPtr secrets,
>> @@ -308,7 +311,6 @@ virSecretObjListRemove(virSecretObjListPtr secrets,
>>      virObjectRWLockWrite(secrets);
>>      virObjectLock(obj);
>>      virHashRemoveEntry(secrets->objs, uuidstr);
>> -    virObjectUnlock(obj);
>>      virObjectUnref(obj);
>>      virObjectRWUnlock(secrets);
> 
> So, why do we require the caller to hold a lock on the object when the very
> first thing we do is that we ref the @obj, UNLOCK it to avoid a deadlock with
> someone else already holding a lock on @secrets and waiting on @obj and only
> when we have the lock on @secrets, we re-lock @obj. Why isn't it sufficient to
> just ref @obj since that's an atomic op and then only lock it once we acquire a
> lock on @secrets too. In this case, you don't have to accept and return a locked
> object, you only lock when absolutely necessary, what am I missing? In my
> opinion this lock-unlock-lock (with occasional ref) fire dance is quite
> cumbersome, but I guess I might be missing some important aspect here which I
> don't see at the first glance.
> 

I have no compelling argument other than history (it's how it's been
done) and a desire to not change too much code, but I can take that path
to see what happens.

Ironically as I was making these changes I thought - why even do that
ref since we enter with the @obj reffed and locked. Adding one more ref
doesn't help or hurt when it comes to unlock as we enter (this code)
with 2 refs and 1 lock. We RemoveEntry losing 1 ref and then have the
caller unref when it's done.  The lock fire dance is only because of
lock ordering.

I think part of the issue is that there's mostly error path callers to
ListRemove and one non error path caller (Undefine). For each of those
callers, we grab the list lock, fetch the object, and return it
locked/reffed - changing that would have the same fire dance problem you
note in the current Remove path, but perhaps for larger swaths of code.

Another part of the issue is that we don't grab the List lock in the
driver, rather we grab it in the vir*obj.c code. That's a design
decision propagated from the domain/network change to use hash tables
that I just followed when altering secrets, interface, nodedev, storage,
nwfilter, etc.

So in order to not have the fire dance, the list locking would need to
move to the driver and would be held for longer periods which is what
IIRC we're trying to avoid originally (think domain code and how many
threads can hammer that). Of course with rwlock/read locks that perhaps
is now a non issue.

NB: consider the possibility that two threads try to Undefine at the
same time... One wins the list lock race and does the hash table
removal, but since there's still a refcnt from the other thread, the
memory isn't freed. When the second thread gets the lock, it'll call the
RemoveEntry (and fail, but not error). Eventually it unrefs the last ref
and poof the memory is gone.

>>  }
>> @@ -927,8 +929,7 @@ virSecretLoad(virSecretObjListPtr secrets,
>>
>>      if (virSecretLoadValue(obj) < 0) {
>>          virSecretObjListRemove(secrets, obj);
>> -        virObjectUnref(obj);
>> -        obj = NULL;
>> +        virSecretObjEndAPI(&obj); /* clears obj */
> 
> This can be further simplified by moving the ObjEndAPI to the cleanup section
> and removing the braces on this 'if' block.
> 

True, that'd be an extra "before" patch I think though.

John

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list