Re: Gluster performance on the small files

Ben Turner <bturner@xxxxxxxxxx> · Mon, 16 Feb 2015 17:16:44 -0500 (EST)

----- Original Message -----
> From: "Joe Julian" <joe@xxxxxxxxxxxxxxxx>
> To: "Punit Dambiwal" <hypunit@xxxxxxxxx>, gluster-users@xxxxxxxxxxx, "Humble Devassy Chirammal"
> <humble.devassy@xxxxxxxxx>
> Sent: Monday, February 16, 2015 3:32:31 PM
> Subject: Re:  Gluster performance on the small files
> 
> 
> On 02/12/2015 10:58 PM, Punit Dambiwal wrote:
> 
> 
> 
> Hi,
> 
> I have seen the gluster performance is dead slow on the small files...even i
> am using the SSD....it's too bad performance....even i am getting better
> performance in my SAN with normal SATA disk...
> 
> I am using distributed replicated glusterfs with replica count=2...i have all
> SSD disks on the brick...
> 
> 
> 
> root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test oflag=dsync
> 
> 4096+0 records in
> 
> 4096+0 records out
> 
> 268435456 bytes (268 MB) copied, 57.3145 s, 4.7 MB/s
> 

This seems pretty slow, even if you are using gigabit.  Here is what I get:

[root@gqac031 smallfile]# dd bs=64k count=4k if=/dev/zero of=/gluster-emptyvol/test oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 10.5965 s, 25.3 MB/s

FYI, this is on my 2-node pure replica volume with spinning disks (RAID 6, which is not set up for smallfile workloads; for smallfile I normally use RAID 10) and 10G networking.

The single-threaded dd process is definitely a bottleneck here; the power of distributed systems is in doing things in parallel across clients / threads.  You may want to try smallfile:

http://www.gluster.org/community/documentation/index.php/Performance_Testing

Smallfile command used - python /small-files/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /gluster-emptyvol/ --pause 1000 --host-set "client1, client2"

total threads = 16
total files = 157100
total data =     9.589 GB
 98.19% of requested files processed, minimum is  70.00
41.271602 sec elapsed time
3806.491454 files/sec
3806.491454 IOPS
237.905716 MB/sec
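
As a sanity check on the numbers above, the MB/sec figure is just files/sec multiplied by the per-file size. A rough sketch, assuming --file-size is given in KB (the totals above are consistent with that); the function name smallfile_mbps is only for illustration and is not part of smallfile:

```shell
#!/bin/bash
# Convert a files/sec rate into MB/sec for a fixed per-file size.
#   $1 = files per second (from the smallfile output)
#   $2 = file size in KB (assumed unit of --file-size)
smallfile_mbps() {
    awk -v f="$1" -v k="$2" 'BEGIN { printf "%.2f\n", f * k / 1024 }'
}

# Values copied from the run above: 3806.491454 files/sec at 64 KB each.
smallfile_mbps 3806.491454 64
```

This should land on roughly the same 237.9 MB/sec that smallfile reported.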

If you wanted to do something similar with DD you could do:

<my script>
# Start four dd writers in parallel, each writing its own file on the volume.
for i in $(seq 1 4)
do
    dd bs=64k count=4k if=/dev/zero of=/gluster-emptyvol/test$i oflag=dsync &
done
# Block until every background dd started above has exited.
wait

# time myscript.sh

Then do the math to figure out the MB / sec of the system.
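
For the math, something along these lines should do. It is only a sketch: it assumes four writers of 268435456 bytes each (bs=64k count=4k, as in the dd runs above), and the 10 second elapsed time is a made-up placeholder, so substitute the "real" value that time prints. The function name aggregate_mbps is hypothetical. It divides by 10^6 so the result uses the same MB unit dd reports.

```shell
#!/bin/bash
# Aggregate throughput of N parallel writers, in MB/s (10^6 bytes, as dd reports).
#   $1 = number of writers
#   $2 = bytes written per writer
#   $3 = wall-clock seconds for the whole run (the "real" line from time)
aggregate_mbps() {
    awk -v n="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", n * b / t / 1000000 }'
}

bytes_per_writer=$((64 * 1024 * 4096))   # bs=64k * count=4k = 268435456
elapsed=10                               # placeholder, not a measured value

aggregate_mbps 4 "$bytes_per_writer" "$elapsed"
```

Compare that aggregate figure with the single-threaded dd result to see how much the extra writers help.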

-b 

> 
> 
> root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test conv=fdatasync
> 
> 4096+0 records in
> 
> 4096+0 records out
> 
> 268435456 bytes (268 MB) copied, 1.80093 s, 149 MB/s
> 
> 
> 
> How small is your VM image? The image is the file that GlusterFS is serving,
> not the small files within it. Perhaps the filesystem you're using within
> your VM is inefficient with regard to how it handles disk writes.
> 
> I believe your concept of "small file" performance is misunderstood, as is
> often the case with this phrase. The "small file" issue has to do with the
> overhead of finding and checking the validity of any file, but with a small
> file the percentage of time doing those checks is proportionally greater.
> With your VM image, that file is already open. There are no self-heal checks
> or lookups that are happening in your tests, so that overhead is not the
> problem.
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users