[RFC 0/7] first draft of ZUFS - the Kernel part

Hello all

I would like to present the ZUFS file system; this patchset contains the kernel
part of the code.

The Kernel code presented here can be found at:
	https://github.com/NetApp/zufs-zuf

And for a very crude User-mode Server:
	https://github.com/NetApp/zufs-zus

ZUFS - stands for Zero-copy User-mode FS
- It is geared towards true end-to-end zero copy of both data and metadata.
- It is geared towards very *low latency*, very high CPU locality, and lock-less parallelism.
- Synchronous operations
- NUMA awareness

Short description:
  ZUFS is a from-scratch implementation of a filesystem in user space which tries to address the
above goals. From the get-go it is aimed at pmem-based FSs, but it can easily support other types
of FSs that can utilize a 10x improvement in latency and parallelism. The novelty of this project
is that the interface is designed with a modern multi-core NUMA machine in mind, down to the ABI,
so as to reach these goals.

Not only FSs need apply: any kind of user-mode Server can set up a pseudo filesystem and communicate
with applications via virtual files. These can then benefit from zero-copy, low-latency communication
directly to/from application buffers, or the application can mmap Server resources directly - as long
as it looks like a file system to the Kernel.
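
To illustrate the point, below is a minimal sketch of such an application. Everything here is
hypothetical (the mount path and file name are made up); what it shows is that only plain POSIX
calls are involved, so the zero-copy path is entirely between the Kernel and the Server:

    /* Hypothetical application talking to a user-mode Server through a
     * virtual file on a ZUFS-served mount. Only standard POSIX calls are
     * used; the path below is made up for illustration.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            char buf[4096];
            int fd = open("/mnt/zufs-example/vfile", O_RDWR);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* The write reaches the Server straight from buf - no bounce copy. */
            memset(buf, 'z', sizeof(buf));
            if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
                    perror("pwrite");

            /* Or mmap the file and touch the Server's resources directly. */
            void *p = mmap(NULL, sizeof(buf), PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                    ((char *)p)[0] = 'Z';
                    munmap(p, sizeof(buf));
            }

            close(fd);
            return 0;
    }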

Current status: we have a couple of trivial filesystem implementations, and together with the Kernel
module, the UM-Server, and the FSs' user-mode plugins we can actually pass a good chunk of an xfstests
quick run. (Still working on stability.)

Just to get some points across: as I said, this project is all about performance and low latency.
Below are POC results I have run:

        In-Kernel pmem FS       ZUFS                    FUSE
Threads Op/s     Lat [us]       Op/s     Lat [us]       Op/s    Lat [us]
 1       388361   2.271589       200799   4.6            71820   13.5
 2       635115   2.604376       314321   5.9           148083   13.1
 4      1260307   2.626361       565574   6.6           212133   18.3
 8      2744963   2.485292      1113138   6.6           209799   37.6
12      2126945   5.020506      1598451   6.8           201689   58.7
18      4350995   3.386433      1648689   7.8           174823  101.8
24      4211180   4.784997      1702285   8.0           149413  159.0
36      3057166   9.291997      1783346  13.4           148276  240.7
48      3148972  10.382461      1741873  17.4           145296  327.3

I used an average server machine in our lab with two NUMA nodes and a total of 40 cores (I can't
remember all the details), running fio with 4k random writes. The IO itself is then just a
memcpy_nt() to DRAM-simulated pmem. fio was run with more and more threads (see the Threads column).
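
For reference, a fio job along these lines should approximate the test. This is only a sketch
under assumptions - the original job file is not part of this posting, so the engine, file size,
runtime, and mount path below are my guesses; only the 4k random writes and the growing thread
count come from the description above:

    ; Hypothetical fio job approximating the runs above. filename, ioengine,
    ; size and runtime are assumptions. numjobs was then repeated with
    ; 2, 4, 8, ... 48 to produce the Threads column.
    [zufs-4k-randwrite]
    filename=/mnt/zufs-example/fio-test
    rw=randwrite
    bs=4k
    size=1g
    ioengine=psync
    runtime=30
    time_based
    numjobs=1
    group_reporting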

We can see that we are still more than 2x slower than the in-Kernel FS. But I believe I can shave
off another ~1 us by optimizing the app-to-server thread switch, perhaps by utilizing the "Binder"
scheduler object, or by devising another way to avoid going through the scheduler (and its locks)
when switching VMs.

BE CAREFUL: This is a big code dump, and very much an RFC. It is not yet very stable and not yet
cleaned up. I have sliced the FS into 4 very big patches. Please talk to me if I should split it
up into many more patches.

[I am afraid to send such huge emails, so I'm posting web links instead. Does anyone know what
 the mailing-list message size limit is?

 For this first version I'm sending web links to the 4 HUGE patches on github.
 Please clone the trees above if you want to play with this.

 Please tell me if you want to be removed from the CC of these emails.
]

list of patches:
[RFC 1/7] mm: Add new vma flag VM_LOCAL_CPU

   This is a small but very important patch to the mmap code that supports
   the scores above. See the before-and-after results inside.
   (And it is unfinished.)

[RFC 2/7] fs: Add the ZUF filesystem to the build + License
   Add the Makefile and Kconfig + the licensing of the code.

[RFC 3/7] zuf: Preliminary Documentation
   A very unfinished start of the documentation for this project: the overall
   concept, the Kernel side, and the user-mode Server side.
   Please help me by asking for what more you need in here; any questions,
   I will try to answer there.

[RFC 4/7] zuf: zuf-rootfs && zuf-core
[RFC 5/7] zus: Devices && mounting
[RFC 6/7] zuf: Filesystem operations
[RFC 7/7] zuf: Write/Read && mmap implementation

  After these 4 HUGE patches there is a working live system. There are still
  bugs and corner cases, but it can git clone and build a Kernel, which for
  me is not so bad.

Thank you
Boaz


