Re: Confusing lstat() performance

Hi Niklas,

Out of interest have you tried testing performance with performance.stat-prefetch enabled?
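
If not, something like this should toggle it at runtime (using the volume name "myvol" from your vol info below; re-run the rsync and bup tests afterwards to compare):

 # gluster volume set myvol performance.stat-prefetch on
 # gluster volume info myvol | grep stat-prefetch   # confirm it took effect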

--
Sam McLeod 
@s_mcleod
https://smcleod.net

On 14 Sep 2017, at 10:42 pm, Niklas Hambüchen <mail@xxxxxx> wrote:

Hi,

I have a gluster 3.10 volume with a dir containing ~1 million small files,
mounted at /mnt/dir with FUSE, and I'm observing something weird:

When I list and stat them all using rsync, the lstat() calls that rsync
does are incredibly fast (23 microseconds per call on average, definitely
faster than a network roundtrip between the 3 machines hosting my bricks,
which are connected via Ethernet).
But when I try to back them up with the `bup` tool
(https://github.com/bup/bup), which (at least according to strace) issues
the same syscalls as rsync to stat all files, each lstat() call takes
1700 microseconds, and the total time to go over all files is 50x higher.

These are my strace statistics:

rsync strace:

 $ strace -f -w -c rsync -a --dry-run /mnt/dir /tmp/nonexisting

 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  52.35   69.682773       26617      2618         1 select
  39.60   52.712292        1056     49907           getdents
   8.00   10.655949          11    998102           lstat
   0.02    0.022890          12      1900           read
   0.01    0.017219          21       829           write
   0.01    0.012093          14       868           munmap
   0.01    0.006656          11       606           mmap
   0.00    0.004168        1389         3           readlink
   0.00    0.001796        1796         1           chdir
   0.00    0.001019         510         2           clone
   0.00    0.000841          44        19        13 open

Took ~50 seconds real time to complete.

bup strace (I interrupted it after a while):

 strace: Process 10749 attached
 strace: Process 10750 attached
 Indexing: 25600 (566 paths/s)
 ^C
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  89.55 1140.837325        1700    671016           lstat
   3.92   49.875376         934     53387           getdents
   3.58   45.655985       52238       874           read
   2.14   27.293944         789     34588      5799 open
   0.57    7.266342         141     51384           llistxattr
   0.09    1.090689          42     25692     25692 getxattr
   0.06    0.780977          26     29825      1019 close
   0.05    0.601806          23     25739     25722 ioctl
   0.03    0.373894        1851       202           select
   0.00    0.055953          14      3879           brk

 real    20m52.150s
 user    0m2.608s
 sys     0m11.456s

Note that I passed `-c -w` to strace to measure the wall time spent in
the syscalls, not system CPU time.
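
For comparison, the same command without `-w` would report on-CPU time
per call, which hides the time a process spends blocked in the FUSE
request, e.g.:

 $ strace -f -c ls -lU /mnt/dir > /dev/null      # usecs/call: CPU time only
 $ strace -f -c -w ls -lU /mnt/dir > /dev/null   # usecs/call: wall-clock time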

Using

 time strace -f -c -w ls -lU /mnt/dir > /dev/null

shows that ls is as fast as rsync (it also completes in ~50 seconds).

(Aside: I've filed a bug about the large number of getdents() calls as
https://bugzilla.redhat.com/show_bug.cgi?id=1478411, but that's not the
core question of this email.)

My volume info:

   # gluster vol info

   Volume Name: myvol
   Type: Replicate
   Volume ID: ...
   Status: Started
   Snapshot Count: 0
   Number of Bricks: 1 x 3 = 3
   Transport-type: tcp
   Bricks:
   Brick1: 10.0.0.1:/data/glusterfs/brick
   Brick2: 10.0.0.2:/data/glusterfs/brick
   Brick3: 10.0.0.3:/data/glusterfs/brick
   Options Reconfigured:
   changelog.capture-del-path: on
   changelog.changelog: on
   storage.build-pgfid: on
   nfs.disable: on
   transport.address-family: inet
   client.ssl: on
   server.ssl: on
   storage.linux-aio: on
   performance.io-thread-count: 64
   performance.readdir-ahead: on
   server.event-threads: 32
   client.event-threads: 32
   server.outstanding-rpc-limit: 64
   cluster.lookup-unhashed: auto
   performance.flush-behind: on
   performance.strict-write-ordering: off
   performance.high-prio-threads: 64
   performance.normal-prio-threads: 64
   performance.low-prio-threads: 64
   performance.write-behind-window-size: 10MB
   cluster.ensure-durability: on
   performance.lazy-open: yes
   cluster.use-compound-fops: off
   performance.open-behind: on
   features.cache-invalidation: off
   performance.quick-read: off
   performance.read-ahead: off
   performance.stat-prefetch: off
   changelog.rollover-time: 1
   cluster.self-heal-daemon: enable

Questions:

What could explain why bup's lstat()s are slow and rsync's lstat()s are
fast?

Also, how come rsync's lstat()s can be faster than a network roundtrip
at all, given that I have all caching disabled (e.g. stat-prefetch: off,
cache-invalidation: off)?
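
For scale, this is roughly how I compare a single roundtrip against the
per-call lstat() time (10.0.0.2 is one of my bricks; the trace filter
just isolates that one syscall):

 $ ping -c 5 10.0.0.2                                       # raw Ethernet RTT
 $ strace -w -c -e trace=lstat ls -lU /mnt/dir > /dev/null  # avg wall usecs per lstat()

The rsync/ls numbers above are well below any plausible roundtrip here.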

Is there some caching going on? Might bup issue its syscalls in an order
that destroys this caching, while rsync and ls use a favourable order?
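
One way to test that, sketched with hypothetical temp file names: stat
everything once in readdir order and once in a shuffled order, and
compare the timings:

 $ find /mnt/dir -maxdepth 1 -type f > /tmp/readdir-order    # order as returned by getdents()
 $ shuf /tmp/readdir-order > /tmp/shuffled-order             # destroy any locality
 $ time xargs -a /tmp/readdir-order -d '\n' stat > /dev/null
 $ time xargs -a /tmp/shuffled-order -d '\n' stat > /dev/null

If the shuffled run is drastically slower, syscall order (and some cache
behind it) is the likely culprit.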

Thanks!
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

