bluestore locking

After our conversation this morning I went through the locking a bit more 
and I think we're in okay shape.  To summarize:

These 'forward lookup' structures are protected by coll->lock:

 Bnode::blob_map -> Blob                                    coll->lock
 Onode::blob_map -> Blob                                    coll->lock

These ones are protected by cache->lock:

 Collection::OnodeSpace::onode_map -> Onode (unordered_map)     cache->lock
 Blob::bc -> Buffer                                             cache->lock

The BnodeSet is a bit different because it is depopulated when the last 
onode ref goes away.  But it has its own lock:

 Collection::BnodeSet::uset -> Bnode (intrusive set)        BnodeSet::lock
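A minimal sketch of that last piece, using hypothetical simplified types (the real code uses an intrusive set and different member names); the point is just that the BnodeSet carries its own mutex because entries can be removed when the last onode ref drops, not necessarily under coll->lock:

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical stand-in for the real Bnode; details omitted.
struct Bnode {};

// Sketch of a BnodeSet with its own lock.  The real BlueStore code uses
// an intrusive set (uset); a std::set models the same ownership here.
struct BnodeSet {
  std::mutex lock;            // BnodeSet::lock in the summary above
  std::set<Bnode*> uset;

  void add(Bnode* b) {
    std::lock_guard<std::mutex> l(lock);
    uset.insert(b);
  }
  // Called when the last onode ref goes away; safe without coll->lock
  // because the set is guarded by its own mutex.
  void remove(Bnode* b) {
    std::lock_guard<std::mutex> l(lock);
    uset.erase(b);
  }
  size_t size() {
    std::lock_guard<std::mutex> l(lock);
    return uset.size();
  }
};
```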

Anyway, the point of this is that the cache trim() can do everything it 
needs with just cache->lock.  That means that during an update, we 
normally have coll->lock to protect the structures we're touching, and if 
we are adding onodes to OnodeSpace or BufferSpace we additionally take 
cache->lock for the appropriate cache fragment.
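The update-path ordering described above can be sketched as follows.  This is a simplified model with hypothetical names (add_onode is not the real BlueStore method): coll->lock is held for the whole update, and cache->lock for the relevant shard is taken only around the insertion into OnodeSpace, which is what lets trim() run with cache->lock alone:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical simplified types for illustration only.
struct Onode {
  std::string key;
};

// One cache shard; its lock guards the onode_map contents.
struct Cache {
  std::mutex lock;
};

struct Collection {
  std::mutex lock;   // coll->lock: guards forward-lookup structures
  Cache* cache;      // shard whose lock guards onode_map
  std::unordered_map<std::string, std::shared_ptr<Onode>> onode_map;

  // Normal update: hold coll->lock throughout, and additionally take
  // cache->lock only while adding the onode to OnodeSpace.
  std::shared_ptr<Onode> add_onode(const std::string& key) {
    std::lock_guard<std::mutex> cl(lock);            // coll->lock first
    auto o = std::make_shared<Onode>();
    o->key = key;
    {
      std::lock_guard<std::mutex> tl(cache->lock);   // then cache->lock
      onode_map[key] = o;
    }
    return o;
  }
};
```

Because cache->lock is always taken while coll->lock is already held (never the reverse), trim() taking cache->lock alone cannot deadlock against an update.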

We were getting tripped up by the blob_map iteration in _txc_state_proc 
because we were holding no locks at all (or, previously, a collection lock 
that may or may not have been the right one).  Igor's PR fixes this by 
making Blob refcounted and keeping a list of the blobs a transaction 
touches.  The finish_write() function takes cache->lock as needed.  It's 
also worth pointing out that the blobs we're touching will all exist under 
an Onode that is in the onodes list, and it turns out that trim() is 
already doing the right thing and not trimming Onodes that still have any 
refs.
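A sketch of that trim() behavior, again with hypothetical simplified types: trim() holds only cache->lock and skips any Onode that still has outstanding references, so blobs reachable from an in-flight transaction stay alive.  Here use_count() == 1 means only the cache itself holds the onode:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the real Onode.
struct Onode {};

struct Cache {
  std::mutex lock;
  std::list<std::shared_ptr<Onode>> onodes;  // cached onodes

  // Evict only cache-only onodes; runs with just cache->lock held.
  size_t trim() {
    std::lock_guard<std::mutex> l(lock);
    size_t evicted = 0;
    for (auto p = onodes.begin(); p != onodes.end();) {
      if (p->use_count() == 1) {   // no refs outside the cache: drop it
        p = onodes.erase(p);
        ++evicted;
      } else {
        ++p;                       // still referenced elsewhere: keep it
      }
    }
    return evicted;
  }
};
```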

Which leaves me a bit confused as to how we were originally crashing, 
because we were taking the first_collection lock.  My guess is that 
first_collection was not the right collection and a racing update was 
modifying the Onode.  The refcounting alone sorts this out.  My other fix 
would also have resolved it by taking the correct collection's lock, I 
think.  Unless there is another locking problem I'm not seeing... but I 
think what is in master now has it covered.

Igor, Somnath, does the current strategy make sense?

sage
