Re: [patch 2/2 v3] raid5: create multiple threads to handle stripes

I'm CC'ing Christoph and Dave hoping they may have some insights or
recommendations WRT this md RAID kernel write thread scheduling, and
whether using cpusets might be handy or worth considering.

On 3/28/2013 1:47 AM, NeilBrown wrote:
> On Tue, 12 Mar 2013 19:44:19 -0500 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
> wrote:
> 
>> On 3/11/2013 8:39 PM, NeilBrown wrote:
>>> On Thu, 7 Mar 2013 15:31:23 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
>> ...
>>>>> #echo 1-3 > /sys/block/md0/md/auxth0/cpulist
>>>>> This will bind auxiliary thread 0 to cpu 1-3, and this thread will only handle
>>>>> stripes produced by cpu 1-3. User tool can further change the thread's
>>>>> affinity, but the thread can only handle stripes produced by cpu 1-3 till the
>>>>> sysfs entry is changed again.
>>
>> Would it not be better to use the existing cpusets infrastructure for
>> this, instead of manually binding threads to specific cores or sets of
>> cores?
>>
>> Also, I understand the hot cache issue driving the desire to have a raid
>> thread only process stripes created by its CPU.  But what of the
>> scenario where an HPC user pins application threads to cores and needs
>> all the L1/L2 cache?  Say this user has a dual socket 24 core NUMA
>> system with 2 NUMA nodes per socket, 4 nodes total.  Each NUMA node has
>> 6 cores and shared L3 cache.  The user pins 5 processes to 5 cores in
>> each node, and wants to pin a raid thread to the remaining core in each
>> node to handle the write IO generated by the 5 user threads on the node.
>>
>> Does your patch series allow this?  Using the above example, if the user
>> creates 4 cpusets, can he assign a raid thread to that set and the
>> thread will execute on any core in the set, and only that set, on any
>> stripes created by any CPU in that set, and only that set?
>>
>> The infrastructure for this already exists, and has since 2004.  It seems
>> more flexible than what you've implemented here.  I suggest we
>> make use of it, as it is the kernel standard for doing such things.
>>
>> See:  http://man7.org/linux/man-pages/man7/cpuset.7.html
>>
>>> Hi Shaohua,
>>>  I still have this sitting in my queue, but I haven't had a chance to look at
>>>  it properly yet - I'm sorry about that.  I'll try to get to it soon.
>>
> 
> Thanks for this feedback.  The interface is the thing I am most concerned
> about getting right at this stage, and it is exactly what you are commenting
> on.

Yeah, that's why I wanted to get these thoughts out there.

> The current code allows you to request N separate raid threads, and to tie
> each one to a subset of processors.  This tying is in two senses.  The
> thread can only run on cpus in the subset, and the requests queued by any
> given processor will preferentially be processed by threads tied to that
> processor.
> 
> It does sound a lot like cpusets could be used instead of lists of CPUs.
> However it does merge the two different cpuset concepts which you seem to
> suggest might not be ideal, and maybe it isn't.

I don't know which method would be best.  The Linux Scalability Effort in
the early 2000s put a tremendous amount of work into creating the
cpusets infrastructure to facilitate this type of thing.  I figured it
should be used if applicable/beneficial.  As it's been a standard
interface for quite some time, it should be familiar to many and fairly
well understood.  When I saw Shaohua's description it seemed a little
like reinventing the wheel.
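
For anyone following along who hasn't used it, the interface is plain
filesystem operations.  A minimal sketch, assuming a kernel with cpusets
enabled (the "raidset" name is made up, and the file names may carry a
"cpuset." prefix depending on how the hierarchy is mounted):

  mkdir /dev/cpuset
  mount -t cpuset cpuset /dev/cpuset
  mkdir /dev/cpuset/raidset
  echo 6-11 > /dev/cpuset/raidset/cpus    # CPUs owned by this set
  echo 1    > /dev/cpuset/raidset/mems    # memory node(s) for this set
  echo $PID > /dev/cpuset/raidset/tasks   # move a task into the set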

> A completely general solution might be to allow each thread to handle
> requests from one cpuset, and run on any processor in another cpuset.
> Would that be too much flexibility?

I think any<->any would work fine as a default as it would tend to avoid
idle cores.  It may put extra traffic on the NUMA interconnect, but this
is where you would allow the user to tune it via cpusets.
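
Concretely, for the 4-node machine from my example above, that tuning
might look something like this (a sketch; the per-node CPU numbering
0-5, 6-11, etc. is an assumption about the topology):

  # one cpuset per NUMA node, each owning that node's 6 cores
  for n in 0 1 2 3; do
      mkdir /dev/cpuset/node$n
      echo $((n*6))-$((n*6+5)) > /dev/cpuset/node$n/cpus
      echo $n > /dev/cpuset/node$n/mems
  done
  # then move each per-node raid thread into its node's tasks file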

> cpusets are a config option, so we would need to only enable multithreads if
> CONFIG_CPUSETS were set.  Is this unnecessarily restrictive?  

Yes, too restrictive.  I don't think I'd make multithreads dependent on
CONFIG_CPUSETS.  Not every kernel in the wild has CONFIG_CPUSETS
enabled, but the machines they run on may well have multiple cores.  I
have a few such machines/kernels myself.
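
Checking whether a given kernel was built with it is at least cheap:

  grep CONFIG_CPUSETS /boot/config-$(uname -r)
  # or, where the kernel exposes its own config:
  zgrep CONFIG_CPUSETS /proc/config.gz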

> Are there any
> other cases of kernel threads binding to cpusets?  If there aren't I'd be a
> bit cautious of being the first, as I have very little familiarity with this
> stuff.

I do not know if others do kernel thread binding/masking via cpusets or
the underlying kernel interface.  You need feedback from other devs
working with NUMA stuff and kernel thread scheduling.
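
What I can say is that md's existing kernel threads can already be
steered from user space today, cpusets or not.  A sketch (the thread
name is from one of my arrays; yours will differ):

  pid=$(pgrep md0_raid5)     # the array's main raid5 thread
  taskset -cp 6-11 $pid      # restrict which CPUs it may run on
  # or move it into a cpuset created as above:
  echo $pid > /dev/cpuset/raidset/tasks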

> I still like the idea of an 'ioctl' which a process can call and will cause
> it to start handling requests.

Agreed.

> The process could bind itself to whatever cpu or cpuset it wanted to, then
> could call the ioctl on the relevant md array, and pass in a bitmap of cpus
> which indicate which requests it wants to be responsible for.  The current
> kernel thread will then only handle requests that no-one else has put their
> hand up for.  This leaves all the details of configuration in user-space
> (where I think it belongs).

Agreed.  The cpusets customization will be used by very few people.  But
as a single one of these threads may eat an entire core with some
workloads, I think the option should exist for those who may need it.
This is also pretty much true of cpusets itself, as I'd guess far fewer
than 1% of systems take advantage of it.  Most servers out there, and
desktops/laptops, are single or dual socket boxen, and most folks
managing such systems have never heard of cpusets.  They simply load up
their apps and let the default kernel config do its thing.
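
To make the split concrete, the user-space half of that flow might look
like the below.  Only the cpuset part works today; the ioctl is your
proposal and doesn't exist yet:

  echo $$ > /dev/cpuset/node0/tasks          # confine this process
  grep Cpus_allowed_list /proc/self/status   # verify the new mask
  # ...then call the proposed ioctl on the md array with a matching
  # cpu bitmap to volunteer for that node's stripes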

> Shaohua - have you given any thought to that approach?
> 
> If anyone else has any familiarity with multithreading and numa and cpu
> affinity and would like to share some thoughts, I am all ears.

As I said, I think you need to get this in front of a wider audience.
Christoph and Dave might have some valuable insight here, or know
others who do, so I'm CC'ing them.

-- 
Stan
