Hi Miklos,

> Just a heads up, that I have achieved similar results with a prototype
> using the unmodified fuse protocol. This prototype was built with
> ideas taken from zufs (percpu/lockless, mmaped dev, single syscall per
> op). I found a big scheduler scalability bottleneck that is caused by
> update of mm->cpu_bitmap at context switch. This can be worked
> around by using shared memory instead of shared page tables, which is
> a bit of a pain, but it does prove the point. Thought about fixing
> the cpu_bitmap cacheline pingpong, but didn't really get anywhere.
>
> Are you interested in comparing zufs with the scalable fuse prototype?
> If so, I'll push the code into a public repo with some instructions,

I would be happy to help here (review, light testing and debugging).
I have wanted to give the ioctl threads method a try for some time
already; I just never got around to it.

Thanks,
Bernd