very bad performance on small files

> Sure, and all that applies equally to both NFS and gluster, yet in Max's
> example NFS was ~50x faster than gluster for an identical small-file
> workload. So what's gluster doing over and above what NFS is doing that's
> taking so long, given that network and disk factors are equal? I'd buy a
> factor of 2 for replication, but not 50.
>
>
When using FUSE, the context switch each syscall undergoes before
glusterfs even gets its hands on it is a _huge_ factor, especially when
(wrongly) comparing with local filesystems.
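
To see how much the per-syscall cost matters on a given mount, something
like the following rough sketch can help. It writes the same number of
bytes with small and with large write() calls and reports how many calls
each run needed. The function name and the BENCH_DIR environment variable
are made up for this illustration and are not part of glusterfs or dd;
point BENCH_DIR at a directory on the mount you want to test.

```python
import os
import tempfile
import time


def write_in_blocks(path, total_bytes, block_size):
    """Write total_bytes of zeros to path using block_size-byte write() calls.

    Returns (number_of_write_calls, elapsed_seconds).
    """
    block = bytes(block_size)
    calls = 0
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        while written < total_bytes:
            # Trim the final block so we never write more than total_bytes.
            chunk = block[: total_bytes - written]
            written += os.write(fd, chunk)
            calls += 1
    finally:
        os.close(fd)
    return calls, time.perf_counter() - start


if __name__ == "__main__":
    # BENCH_DIR is a hypothetical variable for this sketch; when it is
    # unset, the system default temporary directory is used.
    target = os.environ.get("BENCH_DIR")
    total = 16 * 1024 * 1024  # 16 MiB, small enough to run quickly
    with tempfile.TemporaryDirectory(dir=target) as tmp:
        path = os.path.join(tmp, "blocks.bin")
        for block_size in (1024, 1024 * 1024):
            calls, secs = write_in_blocks(path, total, block_size)
            print(f"bs={block_size}: {calls} write() calls in {secs:.3f}s")
```

The data volume is identical in both runs; only the number of syscalls
changes, so any large gap between the two timings is per-call overhead
rather than disk or network throughput.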


> In case you missed what I'm on about, it was these stats that Max posted:
>
> > Here is the results per command:
> > dd if=/dev/zero of=M/tmp bs=1M count=16384 69.2 MB/sec (Native) 69.2
> > MB/sec (FUSE) 52 MB/sec (NFS)
>

This test looks reasonable. Writes seem to be bottlenecked at the sustained
write throughput of the disk itself.


> > dd if=/dev/zero of=M/tmp bs=1K count=163840000  88.1 MB/sec  (Native)
> > 1.1MB/sec (FUSE) 52.4 MB/sec (NFS)
>

The huge drop in FUSE performance compared to NFS is due to the context
switch overhead, which glusterfs cannot do much about, as that latency is
incurred well before glusterfs even comes into the picture. Since both
glusterfs and NFS cache writes, the comparison really ends up being the
latency of a context switch vs. no context switch (network latency drops
out entirely because of client-side caching). That is, just delivering the
syscall to the FS is more expensive in native glusterfs than in NFS,
regardless of what each of them does after the syscall has been delivered.
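
As a back-of-the-envelope check, the quoted bs=1K figures can be turned
into an implied mean time per write() call. This is only arithmetic on the
numbers Max posted, assuming dd's "MB" means 10^6 bytes and that every
1 KiB block is one syscall; the function name is made up for this sketch.

```python
# Figures quoted from the bs=1K dd run in this thread, in MB/sec.
THROUGHPUT_MB_PER_SEC = {"native": 88.1, "fuse": 1.1, "nfs": 52.4}

BLOCK_BYTES = 1024    # dd bs=1K
BYTES_PER_MB = 10**6  # assumption: dd reports decimal megabytes


def per_write_us(mb_per_sec):
    """Mean time per 1 KiB write() implied by a throughput, in microseconds."""
    writes_per_sec = mb_per_sec * BYTES_PER_MB / BLOCK_BYTES
    return 1e6 / writes_per_sec


if __name__ == "__main__":
    for name, rate in THROUGHPUT_MB_PER_SEC.items():
        print(f"{name:>6}: {per_write_us(rate):8.1f} us per write")
```

Under those assumptions the FUSE figure works out to roughly 931
microseconds per write, against roughly 20 for NFS. That is a mean over
the whole write path as dd sees it, not a direct measurement of the
context switch alone.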


> > time tar cf - M | pv > /dev/null 15.8 MB/sec (native) 3.48MB/sec
> > (FUSE) 254 Kb/sec (NFS)
>

This test shows why the glusterfs native protocol is better than NFS when
you need to scale out storage. Even with the context switch overhead on the
client side, glusterfs scores better due to the "clustered nature" of its
protocol. NFS has to make a second hop when it has to fetch data not
available on the server it has mounted from, whereas for glusterfs it is
always a single hop to any server it wants to get data from.

In any case, comparing local disk performance with network disk
performance is never right and is always misleading.

Avati

