On 07/16/2013 02:25 PM, Niels de Vos wrote:
On Mon, Jul 15, 2013 at 01:17:54PM -0400, aakash@xxxxxxxxxxxxxxxxxx wrote:
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the
specified range. This fop is useful when a whole file needs to be
initialized with zeroes (for example, when provisioning zero-filled VM disk
images or when scrubbing VM disk images).
A client/application can issue this fop to zero out a range. The Gluster server
then zeroes out the required range of bytes, i.e. the zeroing is offloaded to
the server. In the absence of this fop, the client/application has to
repeatedly issue write (zero) fops to the server, which is very inefficient
because of the overhead of the RPC calls and acknowledgements.
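As a rough illustration of the repeated-write approach described above, here is
a minimal sketch against a plain POSIX file descriptor. It is not the benchmark
program from this patch; the names zero_by_writes, zero_file_by_writes and
ZERO_CHUNK are hypothetical. On a Gluster mount, each pwrite() in the loop would
turn into a separate write fop and RPC round trip, which is the overhead
zerofill is meant to avoid.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_CHUNK (128 * 1024) /* hypothetical per-request size */

/* Zero `size` bytes of `fd` starting at `offset` using repeated writes.
 * Returns 0 on success, -1 on error with errno set. */
int zero_by_writes(int fd, off_t offset, size_t size)
{
    static const char zeros[ZERO_CHUNK]; /* static storage starts out as zeroes */

    while (size > 0) {
        size_t want = size < ZERO_CHUNK ? size : ZERO_CHUNK;
        ssize_t done = pwrite(fd, zeros, want, offset);

        if (done < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        if (done == 0) {
            errno = EIO; /* no progress; avoid looping forever */
            return -1;
        }
        offset += done;
        size -= (size_t)done;
    }
    return 0;
}

/* Convenience wrapper: open `path`, zero the range, close it again. */
int zero_file_by_writes(const char *path, off_t offset, size_t size)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int ret = zero_by_writes(fd, offset, size);
    int saved = errno; /* keep the errno of the write loop across close() */
    if (close(fd) < 0 && ret == 0)
        return -1;
    errno = saved;
    return ret;
}
```

For example, zero_by_writes(fd, 4, 8) clears bytes 4 through 11 of the file and
leaves everything else untouched, which matches the offset/size semantics of the
fop.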
WRITESAME is a SCSI T10 command that takes a block of data as input and writes
the same data to other blocks. The write is handled completely within the
storage and is hence known as an offload. Linux now supports the SCSI
WRITESAME command, which is exposed to the user in the form of the BLKZEROOUT
ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to implement this fop, so
zeroing operations can be completely offloaded to the storage device, making
them highly efficient.
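To make the BLKZEROOUT idea concrete, here is a minimal sketch of issuing the
ioctl from user space on Linux. It is only an illustration and not the BD xlator
patch; the function names and the choice of ENOTBLK for non-block-device
descriptors are assumptions of this sketch. The ioctl takes a two-element
uint64_t array holding the start offset and the length in bytes; as far as I
know, the kernel expects both to be multiples of 512.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h> /* BLKZEROOUT */

/* Ask the kernel to zero `len` bytes of the block device behind `fd`,
 * starting at byte `offset`. Returns 0 on success, -1 with errno set. */
int zero_blockdev_range(int fd, uint64_t offset, uint64_t len)
{
    struct stat st;
    uint64_t range[2] = { offset, len }; /* {start, length}, both in bytes */

    if (fstat(fd, &st) < 0)
        return -1;
    if (!S_ISBLK(st.st_mode)) {
        errno = ENOTBLK; /* sketch's choice: refuse anything but a block device */
        return -1;
    }
    return ioctl(fd, BLKZEROOUT, range) == 0 ? 0 : -1;
}

/* Convenience wrapper: open the device node `dev`, zero the range, close it. */
int zero_blockdev_path(const char *dev, uint64_t offset, uint64_t len)
{
    assert(dev != NULL);

    int fd = open(dev, O_WRONLY);
    if (fd < 0)
        return -1;

    int ret = zero_blockdev_range(fd, offset, len);
    int saved = errno; /* keep the ioctl's errno across close() */
    close(fd);
    errno = saved;
    return ret;
}
```

A storage xlator could call something like this for block-device backends and
fall back to ordinary zero writes when the call fails. Note that running it
against a real device destroys the data in that range.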
Just wondering (and I think it was mentioned earlier by Vijay already),
why not implement a WRITESAME fop and detect in the storage xlators if
the BLKZEROOUT ioctl() should be used in the case of writing zeroes?
Thank you Niels for your comments.
In Linux, we can exploit SCSI WRITESAME using the BLKZEROOUT ioctl. This ioctl
issues WRITESAME with a zero-filled block as the input block, so Linux supports
writing only zeroes using WRITESAME. Also, writing zeroes is a very common
operation during initialization and scrubbing of VM disk images. We have the BD
xlator in GlusterFS for block devices, which can issue this ioctl. Hence,
instead of a generic WRITESAME fop, we are adding a zerofill fop. I have a patch
which makes use of this ioctl to implement zerofill in the BD xlator. I will be
posting it soon.
I'll try to keep an eye open on the merging of this change. Whenever
that happens, we can send a patch to Wireshark so that the new fop gets
detected correctly.
Thanks,
Niels
The fop takes two arguments, offset and size. It zeroes out 'size' bytes of an
opened file, starting at position 'offset'.
This patch adds zerofill support to the following areas:
- libglusterfs
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
Client applications can exploit this fop by using glfs_zerofill, introduced in
libgfapi. FUSE support for this fop has not been added, as there is no system
call for this fop.
TODO:
* Add zerofill support to trace xlator
* Expose zerofill capability as part of gluster volume info
Here is a performance comparison of server-offloaded zerofill vs. zeroing out
using repeated writes.
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
The argument 'log' is the file used for logging, and the third argument is the
size in GB.
As we can see, there is a performance improvement of around 20% with this fop.
For block devices, the use of the BLKZEROOUT ioctl can improve performance even
more.
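The "around 20%" figure can be checked directly from the wall-clock ("real")
times above. The small helper below is only a sketch of that arithmetic; the
function names are hypothetical and not part of the patch.

```c
#include <assert.h>

/* Convert a time(1) value such as "4m23.043s" to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Fraction of elapsed time saved by the offloaded run, relative to the
 * run that writes zeroes manually. */
double time_saved(double manual_s, double offloaded_s)
{
    assert(manual_s > 0.0);
    return (manual_s - offloaded_s) / manual_s;
}
```

With the numbers from the runs above, time_saved(to_seconds(4, 23.043),
to_seconds(3, 34.155)) is (263.043 - 214.155) / 263.043, or about 18.6% for
20 GB, and time_saved(to_seconds(5, 34.278), to_seconds(4, 28.363)) is
(334.278 - 268.363) / 334.278, or about 19.7% for 25 GB.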
The applications used for performance comparison can be found here:
For manually writing zeros: https://docs.google.com/file/d/0B4jeWncLrfS3LVNybW9lR2dPZkk/edit?usp=sharing
For offloaded zeroing: https://docs.google.com/file/d/0B4jeWncLrfS3LVNybW9lR2dPZkk/edit?usp=sharing
Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
Signed-off-by: Aakash Lal Das <aakash@xxxxxxxxxxxxxxxxxx>
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel