On Wed, Mar 24, 2010 at 8:51 AM, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> On Wed, Mar 24, 2010 at 07:53:10AM -0700, Dan Williams wrote:
>> The current implementation with the async thread pool ends up spreading
>> the work over too many threads. The btrfs workqueue is targeted at high
>> cpu utilization work and has a threshold mechanism to limit thread
>> spawning. Unfortunately it still ends up increasing cpu utilization
>> without a comparable improvement in throughput. Here are the numbers
>> relative to the multicore disabled case:
>>
>> idle_thresh    throughput    cycles
>>           4           +0%     +102%
>>          64           +4%      +63%
>>         128           +1%      +45%
>
> Interesting, do the btrfs workqueues improve things? Or do you think
> they are just a better base for more tuning?

Both: throughput falls off a cliff with the async thread pool, and there
are more knobs to turn in this implementation.

> I had always hoped to find more users for the work queues and tried to
> keep btrfs specific features out of them. The place I didn't entirely
> succeed was in the spin locks: the ordered queues take regular spin
> locks to avoid turning off irqs, where btrfs always does things outside
> of interrupt time. Doesn't look like raid needs the ordered queues, so
> this should work pretty well.
>
>> This appears to show that something more fundamental needs to happen to
>> take advantage of percpu raid processing. More profiling is needed, but
>> the suspects in my mind are conf->device_lock contention and the fact
>> that all work is serialized through conf->handle_list with no method
>> for encouraging stripe_head-to-thread affinity.
>
> The big place I'd suggest looking for optimization inside the btrfs
> async-thread.c is worker_loop(). For work that tends to be bursty and
> relatively short, we can have worker threads finish their work fairly
> quickly and go to sleep, only to be woken up very quickly again with
> another big batch of work. The worker_loop() code tries to wait around
> for a while, but the tuning here was btrfs specific.
>
> It might also help to tune the find_worker and next_worker code to
> prefer giving work to threads that are running but almost done with
> their queue. Maybe they can be put onto a special hot list as they get
> near the end of their queue.

Thanks, I'll take a look at these suggestions; rough sketches of how I
read them follow below, after my sig. For these optimizations to have a
chance I think we need stripes to maintain affinity with the first core
that picks up the work. Currently all stripes take a trip through the
single-threaded raid5d when their reference count drops to zero, only to
be immediately reissued to the thread pool, potentially on a different
core (though I need to back this assumption up with more profiling).

> There's a rule somewhere that patches renaming things must have replies
> questioning the new name. The first reply isn't actually allowed to
> suggest a better name, which is good because I'm not very good at that
> kind of thing.
>
> Really though, btr_queue is fine by me, but don't feel obligated to
> keep some variation of btrfs in the name ;)

btr_queue seemed to make sense since it's spreading work like "butter" :-)

--
Dan
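On the spin lock point, here is a minimal sketch of the two locking
styles being contrasted; the lock and function names are invented for
illustration, not the actual btrfs symbols:

#include <linux/spinlock.h>
#include <linux/list.h>

static DEFINE_SPINLOCK(queue_lock);	/* hypothetical per-queue lock */
static LIST_HEAD(pending_work);

/*
 * Ordered-queue style: the list is only ever touched from process
 * context, so a plain spin_lock is enough and irqs stay enabled.
 */
static void enqueue_process_context(struct list_head *entry)
{
	spin_lock(&queue_lock);
	list_add_tail(entry, &pending_work);
	spin_unlock(&queue_lock);
}

/*
 * Irq-safe style: if an interrupt handler can also take the lock, we
 * must disable local irqs or risk deadlocking against ourselves.
 */
static void enqueue_irq_safe(struct list_head *entry)
{
	unsigned long flags;

	spin_lock_irqsave(&queue_lock, flags);
	list_add_tail(entry, &pending_work);
	spin_unlock_irqrestore(&queue_lock, flags);
}

Since raid5 work is all submitted from process context, the cheaper
first form should be sufficient, which matches Chris's point above.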
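The linger-before-sleep idea in worker_loop(), as I understand it, looks
roughly like this; run_one() is a hypothetical stand-in for executing a
work item, and the one-jiffy timeout is an arbitrary placeholder for the
btrfs-specific tuning:

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/list.h>

struct worker {
	spinlock_t lock;
	struct list_head pending;
};

extern void run_one(struct list_head *work);	/* hypothetical */

static int worker_loop(void *arg)
{
	struct worker *w = arg;

	while (!kthread_should_stop()) {
		spin_lock(&w->lock);
		while (!list_empty(&w->pending)) {
			struct list_head *work = w->pending.next;

			list_del_init(work);
			spin_unlock(&w->lock);
			run_one(work);
			spin_lock(&w->lock);
		}
		spin_unlock(&w->lock);

		/*
		 * Linger instead of sleeping immediately: bursty, short
		 * work tends to arrive again within a jiffy or two, and
		 * skipping the sleep/wake round trip keeps caches warm.
		 * The unlocked list_empty() check is only opportunistic;
		 * a producer's wake_up_process() after list_add resolves
		 * the race with set_current_state() above.
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (list_empty(&w->pending))
			schedule_timeout(1);
		__set_current_state(TASK_RUNNING);
	}
	return 0;
}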
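And a guess at the hot list scheme for find_worker/next_worker: when a
running worker drops below a few remaining items, migrate it onto a list
that dispatch checks first. The threshold and all the names here are
invented:

#include <linux/list.h>
#include <linux/spinlock.h>

#define HOT_THRESHOLD	2	/* arbitrary notion of "almost done" */

struct pool {
	spinlock_t lock;
	struct list_head hot_list;	/* running, nearly drained */
	struct list_head idle_list;	/* empty queues */
	struct list_head busy_list;	/* everyone else */
};

struct pool_worker {
	struct list_head list;
	int pending;		/* items left on this worker's queue */
};

/*
 * Called as a worker drains its queue: once only a couple of items
 * remain, move it to the hot list so dispatch can top it up before
 * it finishes and goes to sleep.
 */
static void mark_if_hot(struct pool *p, struct pool_worker *w)
{
	spin_lock(&p->lock);
	if (w->pending > 0 && w->pending <= HOT_THRESHOLD)
		list_move(&w->list, &p->hot_list);
	spin_unlock(&p->lock);
}

/*
 * Dispatch preference: hot workers first (already running, cache
 * warm, about to idle), then idle ones, then the busy list. Assumes
 * the caller holds p->lock and at least one worker exists.
 */
static struct pool_worker *next_worker(struct pool *p)
{
	if (!list_empty(&p->hot_list))
		return list_first_entry(&p->hot_list, struct pool_worker, list);
	if (!list_empty(&p->idle_list))
		return list_first_entry(&p->idle_list, struct pool_worker, list);
	return list_first_entry(&p->busy_list, struct pool_worker, list);
}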
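Finally, the stripe affinity idea: rather than funneling every released
stripe through raid5d's single conf->handle_list, a per-cpu handle list
could hand a stripe back to the cpu that last processed it. This is a
sketch of the concept only, not the md code; every name is hypothetical:

#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* hypothetical per-cpu replacement for the single conf->handle_list */
struct stripe_queue {
	spinlock_t lock;
	struct list_head handle_list;
};

static DEFINE_PER_CPU(struct stripe_queue, stripe_queues);

struct stripe_head_sketch {
	struct list_head lru;
	int last_cpu;		/* cpu that last processed this stripe */
};

/* record where the stripe is being handled as the work runs */
static void stripe_work_begin(struct stripe_head_sketch *sh)
{
	sh->last_cpu = raw_smp_processor_id();
}

/*
 * On the final reference drop, re-queue the stripe on the cpu that
 * touched it last instead of bouncing through raid5d, so its data is
 * likely still hot in that cpu's cache when work resumes.
 */
static void release_stripe_affine(struct stripe_head_sketch *sh)
{
	struct stripe_queue *q = &per_cpu(stripe_queues, sh->last_cpu);

	spin_lock(&q->lock);
	list_add_tail(&sh->lru, &q->handle_list);
	spin_unlock(&q->lock);
}

This would also split conf->device_lock traffic across cpus as a side
effect, which is the other contention suspect mentioned above.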