Re: [fuse-devel] FUSE: fixes to improve scalability on NUMA systems

Hi Miklos,

thanks a lot for your quick response :)

Miklos Szeredi wrote:
> On Tue, Apr 30, 2013 at 8:17 AM, Srinivas Eeda <srinivas.eeda@xxxxxxxxxx> wrote:
>
> Why just NUMA? For example see this discussion a while back:
> http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/11832/

The reason I targeted NUMA is that NUMA machines are where I am seeing significant performance issues. Even on a NUMA system, if I bind all user threads to a particular NUMA node, there is no notable performance issue. The test I ran was to start multiple (from 4 to 128) instances of "dd if=/dev/zero of=/dbfsfilesxx bs=1M count=4000" on a system with 8 NUMA nodes, each with 20 cores, so 160 CPUs in total.

That was a good discussion. The problem discussed there is much more fine grained than mine: the fix I emailed proposes to bind requests within a NUMA node, whereas the above discussion proposes to bind requests to a CPU. Based on your agreement with Anand Avati, I take it you prefer binding requests to a CPU:

http://article.gmane.org/gmane.comp.file-systems.fuse.devel/11909

The patch I proposed can easily be modified to do that. Currently, with my system in mind, the patch splits each queue into 8 (on an 8-node NUMA system); with that change, each queue would be split into 160. Likewise, my libfuse fix currently starts 8 threads and binds one to each NUMA node; it would instead have to start 160 threads and bind them to CPUs. If you would like to see some numbers, I can modify the patch and run some tests.

The chance of a process migrating to a different NUMA node is minimal, so I did not modify the FUSE header to carry a queue id. In the worst case, where the worker thread migrates to a different NUMA node, my fix scans all the split queues until it finds the request. But if we split the queues per CPU, there is a high chance that processes migrate between CPUs, so I think it would help to add a cpuid to the FUSE in/out headers.

> We should be improving scalability in small steps, each of which makes
> sense and improves the situation.  Marking half the fuse_conn
> structure per-cpu or per-node is too large a step and is probably not
> even the best one.
>
> For example, we have various counters protected by fc->lock that could
> be done with per-cpu counters.  Similarly, we could have per-cpu lists
> for requests, balancing requests only when necessary.  After that we
> could add some heuristics to discourage balancing between NUMA nodes.
>
> To sum up: improving scalability for fuse would be nice, but don't
> just do it for NUMA and don't do it in one big step.
>
> Thanks,
> Miklos

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



