RE: mempool tracking

On Tue, 20 Feb 2018, Allen Samuels wrote:
> This approach will certainly convert the existing atomic 
> increment/decrement into normal increment/decrement.
> 
> If you're willing to tie yourself to pthreads (rather than 
> compiler-based thread_local) you can avoid the leakage of per-thread 
> stat structures through the use of the optional destructor in 
> pthread_key_create. This allows you to convert the existing code without 
> locating/modifying all thread creation/destruction sites.

I think I'm actually leaning toward forcing an update to the thread 
create/destroy code or entrypoints to avoid having any conditionals in the 
mempool alloc/dealloc paths.  We'll get a segv if we miss one, but it'll 
be less code in the hot paths.  I think everything still goes through 
Thread.h with the exception of rocksdb, and even that would be easy enough 
to change if there were callbacks or something we were worried about 
(pretty sure there aren't).
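
To make that concrete, here's a minimal sketch (names like per_thread_stats and mempool_thread_init are illustrative only, not the actual mempool code, and std::mutex stands in for a spinlock): the Thread entrypoint registers a thread-local stats block on a global list, so the alloc path becomes a plain, unconditional increment.

  #include <cstdint>
  #include <mutex>

  struct per_thread_stats {
    int64_t num_objects = 0;
    int64_t num_bytes = 0;
    per_thread_stats *next = nullptr;
  };

  thread_local per_thread_stats *g_per_thread = nullptr;
  per_thread_stats *g_all_threads = nullptr;
  std::mutex g_thread_list_lock;

  // Called from the common Thread entrypoint before the thread body runs.
  void mempool_thread_init() {
    auto *s = new per_thread_stats;
    std::lock_guard<std::mutex> l(g_thread_list_lock);
    s->next = g_all_threads;
    g_all_threads = s;
    g_per_thread = s;
  }

  // Hot path: no conditionals, no atomics; segfaults if init was missed.
  inline void mempool_note_alloc(int64_t bytes) {
    g_per_thread->num_objects++;
    g_per_thread->num_bytes += bytes;
  }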

Oh... but we also have the librados/librbd/libcephfs code to think about.  
There we don't have the same control.  That might be why we ended up where 
we did!

sage


> 
> (i.e., in "pick_a_shard()" of the current implementation, you need to test:
> 
> (1) has pthread_key_create been invoked once in the process? (You have to do this here, rather than in "main" to handle the case of a global initialization call that allocates something in a mem pool).
> (2) If not, create the key. [Probably don't have to lock this, because global initialization isn't multi-threaded, but "main" ought to do at least one mempool allocation to ensure that it happens before we start multi-threading.....]
> (3) If pthread_getspecific(...) is null, then allocate the per-thread structure for this new thread and set it up.... (will need to lock for your linked list).
> 
> That's pretty much it. Oh, don't forget the destructor :)
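
A rough sketch of those three steps (hypothetical names; pthread_once is used here for the create-once check rather than relying on single-threaded global init):

  #include <pthread.h>
  #include <cstdint>

  struct per_thread_stats { int64_t num_objects = 0, num_bytes = 0; };

  pthread_key_t g_stats_key;
  pthread_once_t g_stats_key_once = PTHREAD_ONCE_INIT;

  // The optional destructor: runs when a thread exits, so the per-thread
  // structure is reclaimed without touching thread creation sites.
  void stats_destructor(void *p) {
    // A real implementation would also unlink the struct from the global
    // list (under its lock) or fold its counts into a global total.
    delete static_cast<per_thread_stats*>(p);
  }

  void make_stats_key() {
    pthread_key_create(&g_stats_key, stats_destructor);
  }

  per_thread_stats *get_thread_stats() {
    pthread_once(&g_stats_key_once, make_stats_key);            // (1)+(2)
    auto *s = static_cast<per_thread_stats*>(
        pthread_getspecific(g_stats_key));
    if (!s) {                                                   // (3)
      s = new per_thread_stats;
      pthread_setspecific(g_stats_key, s);
      // ...and add s to the global linked list here, under its lock.
    }
    return s;
  }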
> 
> Provided that the # of shards in the existing scheme is sufficiently high (so as to make collisions rare), you ought to see about the same performance. However, since the existing sharding scheme isn't dynamic (i.e., the same threads always get the same shards, so the same collisions repeat themselves), the per-thread approach will also eliminate the odd persistent collision that would otherwise keep recurring.
> 
> I think it's a good idea and shouldn't be more than a few minutes of coding.
> 
> Allen Samuels  
> R&D Engineering Fellow 
> 
> Western Digital(r) 
> Email:  allen.samuels@xxxxxxx 
> Office:  +1-408-801-7030
> Mobile: +1-408-780-6416 
> 
> 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: Tuesday, February 20, 2018 6:55 AM
> > To: Allen Samuels <Allen.Samuels@xxxxxxx>
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: mempool tracking
> > 
> > Hi Allen,
> > 
> > I was thinking a bit about the mempool tracking implementation and if there
> > was a reason we couldn't do something simpler with thread local variables.
> > Maybe this came up before, but I'm not remembering.  I'm thinking
> > something like
> > 
> > struct per_thread_stats {
> >   int64_t num_objects, num_bytes;
> >   ...
> >   per_thread_stats *next;
> > };
> > 
> > thread_local struct per_thread_stats *g_per_thread = nullptr;
> > struct per_thread_stats *g_all_threads = nullptr;
> > ceph_spinlock_t g_thread_list_spinlock;
> > 
> > - The alloc/dealloc calls can just update the g_per_thread struct directly
> > (don't even need to use atomics, probably?).
> > 
> > - The method to read stats can walk the g_all_threads linked list - on thread
> > create/exit we allocate the thread local struct and add it to the list.
> > 
> > - spinlock would protect linked list updates.
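
For the read side described in those bullets, a sketch (with std::mutex standing in as a placeholder for ceph_spinlock_t, and the struct redeclared for completeness) would just walk the list under the lock and sum the counters:

  #include <cstdint>
  #include <mutex>

  struct per_thread_stats {
    int64_t num_objects = 0, num_bytes = 0;
    per_thread_stats *next = nullptr;
  };
  extern per_thread_stats *g_all_threads;   // head of the per-thread list
  extern std::mutex g_thread_list_lock;     // protects the list

  // The per-thread updates are plain (non-atomic) writes, so the totals
  // read here may be slightly stale, which is fine for stats reporting.
  void read_stats(int64_t *objects, int64_t *bytes) {
    *objects = *bytes = 0;
    std::lock_guard<std::mutex> l(g_thread_list_lock);
    for (per_thread_stats *p = g_all_threads; p; p = p->next) {
      *objects += p->num_objects;
      *bytes += p->num_bytes;
    }
  }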
> > 
> > The main downside I see to this approach is that we kind of need to keep
> > using the common/Thread wrapper to ensure the init/teardown happens.
> > Anyone using std::thread would need to do it explicitly.
> > 
> > (Or, the mempool stuff could have a check for a null value and, if so, allocate
> > it dynamically. That would leave orphan items around for threads that got
> > destroyed, though.)
> > 
> > Anyway, this seems simpler than the hashing we do now.  It would also
> > degenerate to a simple non-atomic inc/dec for the seastar threads, which
> > would be nice.
> > 
> > Am I missing something?
> > 
> > sage


