Doesn't such a change increase the "dirty debt" held in the kernel and
therefore increase the probability of a deadlock (when the userspace
filesystem's memory allocation request ends up waiting on a dirty-page
writeout caused by this write-back feature)? Why not implement the
write-back inside the userspace filesystem and let the kernel operate
in write-through, which makes things less unsafe overall? Do you have
performance numbers comparing write-back in the kernel vs. write-back
in userspace?
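To illustrate the cycle I'm worried about (purely illustrative C, not
code from the patchset; the handler below is hypothetical):

	#include <stdlib.h>

	/* Hypothetical write handler in a userspace FUSE daemon. */
	static void daemon_handle_write(size_t size)
	{
		/*
		 * This allocation can enter direct reclaim, and reclaim
		 * may end up waiting on dirty pages that belong to this
		 * very FUSE mount.  Those pages can only be cleaned by
		 * this daemon, which is blocked right here -- deadlock.
		 */
		void *buf = malloc(size);
		/* ... service the FUSE_WRITE request using buf ... */
		free(buf);
	}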
Avati
On 07/03/2012 08:53 AM, Pavel Emelyanov wrote:
Hi everyone.
One of the problems with the existing FUSE implementation is that it
uses a write-through cache policy, which results in performance
problems on certain workloads. E.g. when copying a big file into a
FUSE file, cp pushes every 128k to the userspace synchronously. This
becomes a problem when the userspace back-end uses networking for
storing the data.
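For example, a plain copy loop like the sketch below (a minimal
stand-in for what cp does; the buffer size is chosen only to match the
128k figure above) generates synchronous FUSE_WRITE traffic on every
write() under the current policy:

	#include <unistd.h>

	/* Copy src_fd to dst_fd in 128k chunks.  With write-through,
	 * each write() blocks until the FUSE daemon has acknowledged
	 * the data. */
	static int copy_fd(int src_fd, int dst_fd)
	{
		char buf[128 * 1024];
		ssize_t n;

		while ((n = read(src_fd, buf, sizeof(buf))) > 0)
			if (write(dst_fd, buf, n) != n)
				return -1;
		return n < 0 ? -1 : 0;
	}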
A good solution to this is switching the FUSE page cache to a
write-back policy. With this, file data is pushed to the userspace in
big chunks (depending on the dirty memory limits, but much more than
128k), which lets the FUSE daemons handle the size updates in a more
efficient manner.
The writeback feature is per-connection and is explicitly configurable
at the init stage (is it worth making it CAP_SOMETHING protected?); a
sketch of the negotiation follows the list below. When the writeback
is turned ON:
* still copy writeback pages to a temporary buffer when sending a
  writeback request and finish the page writeback immediately
* make the kernel maintain the inode's i_size to avoid frequent i_size
  synchronization with the userspace
* take NR_WRITEBACK_TEMP into account when making the
  balance_dirty_pages decision. This protects us from having too many
  dirty pages on FUSE
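From a low-level daemon's point of view, opting in at INIT time could
look roughly like this (a sketch only; FUSE_WRITEBACK_CACHE is the
flag name I'm assuming here, the patchset may name the bit
differently):

	#include <linux/fuse.h>

	/* Negotiate the per-connection write-back policy during INIT.
	 * FUSE_WRITEBACK_CACHE is an assumed flag name. */
	static void fill_init_reply(const struct fuse_init_in *in,
				    struct fuse_init_out *out)
	{
		out->major = FUSE_KERNEL_VERSION;
		out->minor = FUSE_KERNEL_MINOR_VERSION;

		/* Ask for write-back only if the kernel offered it. */
		if (in->flags & FUSE_WRITEBACK_CACHE)
			out->flags |= FUSE_WRITEBACK_CACHE;
	}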
The provided patchset survives the fsx test. Performance measurements
are not yet all finished, but the mentioned copying of a huge file
becomes noticeably faster even on machines with little RAM and doesn't
get the system stuck (the dirty pages balancer does its work OK).
Applies on top of v3.5-rc4.
We are currently exploring this with our own distributed storage
implementation, which is heavily oriented toward storing big blobs of
data with extremely rare meta-data updates (virtual machines' and
containers' disk images). With the existing cache policy a typical
usage scenario -- copying a big VM disk into a cloud -- takes way too
much time, much longer than if the file were simply scp-ed over the
same network. The write-back policy (as I mentioned) noticeably
improves this scenario. Kirill (in Cc) can share more details about
the performance and the storage design if required.
Thanks,
Pavel