Re: Extremely slow file listing in folders with many files

Artem Russakovskii <archon810@xxxxxxxxx> · Thu, 30 Apr 2020 11:05:19 -0700

I did this on the same prod instance just now.
'find' on a fuse gluster dir with 40k+ files:
1st run: 3m56.261s
2nd run: 0m24.970s
3rd run: 0m24.099s

At this point, I killed all gluster services on one of the 4 servers and verified that that brick went offline.

1st run: 0m38.131s
2nd run: 0m19.369s
3rd run: 0m23.576s

Nothing conclusive really IMO.
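
To put the six runs above side by side, here is a minimal Python sketch that converts the `time`-style durations to seconds and compares the warm runs (2nd and 3rd) before and after the brick went offline. The helper names `to_seconds` and `warm_mean` are made up for this illustration; the durations are copied from the runs above.

```python
import re
from statistics import mean

# Durations copied from the runs above, in the "XmY.YYYs" format printed by `time`.
ALL_BRICKS_UP = ["3m56.261s", "0m24.970s", "0m24.099s"]
ONE_BRICK_DOWN = ["0m38.131s", "0m19.369s", "0m23.576s"]


def to_seconds(duration):
    """Convert a `time`-style duration such as '3m56.261s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def warm_mean(runs):
    """Mean of every run after the first (the first run is treated as cold)."""
    return mean(to_seconds(r) for r in runs[1:])


if __name__ == "__main__":
    for label, runs in (("all bricks up", ALL_BRICKS_UP),
                        ("one brick down", ONE_BRICK_DOWN)):
        cold = to_seconds(runs[0])
        print(f"{label}: cold {cold:.1f}s, warm mean {warm_mean(runs):.1f}s")
```

The warm means differ by only a few seconds, and the first run with all bricks up probably also paid for warming the caches, so the two sets are not directly comparable.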

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | @ArtemR

On Thu, Apr 30, 2020 at 9:55 AM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
On April 30, 2020 6:27:10 PM GMT+03:00, Artem Russakovskii <archon810@xxxxxxxxx> wrote:

>Hi Strahil, in the original email I included both the times for the first
>and subsequent reads on the fuse mounted gluster volume as well as the xfs
>filesystem the gluster data resides on (this is the brick, right?).
>
>On Thu, Apr 30, 2020, 7:44 AM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
>
>> On April 30, 2020 4:24:23 AM GMT+03:00, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
>> >Hi all,
>> >
>> >We have 500GB and 10TB 4x1 replicate xfs-based gluster volumes, and the
>> >10TB one especially is extremely slow to do certain things with (and has
>> >been since gluster 3.x when we started). We're currently on 5.13.
>> >
>> >The number of files isn't even what I'd consider that great - under 100k
>> >per dir.
>> >
>> >Here are some numbers to look at:
>> >
>> >On gluster volume in a dir of 45k files:
>> >The first time
>> >
>> >time find | wc -l
>> >45423
>> >real    8m44.819s
>> >user    0m0.459s
>> >sys     0m0.998s
>> >
>> >And again
>> >
>> >time find | wc -l
>> >45423
>> >real    0m34.677s
>> >user    0m0.291s
>> >sys     0m0.754s
>> >
>> >If I run the same operation on the xfs block device itself:
>> >The first time
>> >
>> >time find | wc -l
>> >45423
>> >real    0m13.514s
>> >user    0m0.144s
>> >sys     0m0.501s
>> >
>> >And again
>> >
>> >time find | wc -l
>> >45423
>> >real    0m0.197s
>> >user    0m0.088s
>> >sys     0m0.106s
>> >
>> >I'd expect a performance difference here but just as it was several years
>> >ago when we started with gluster, it's still huge, and simple file listings
>> >are incredibly slow.
>> >
>> >At the time, the team was looking to do some optimizations, but I'm not
>> >sure this has happened.
>> >
>> >What can we do to try to improve performance?
>> >
>> >Thank you.
>> >
>> >Some setup values follow.
>> >
>> >xfs_info /mnt/SNIP_block1
>> >meta-data=""              isize=512    agcount=103, agsize=26214400 blks
>> >         =                       sectsz=512   attr=2, projid32bit=1
>> >         =                       crc=1        finobt=1, sparse=0, rmapbt=0
>> >         =                       reflink=0
>> >data     =                       bsize=4096   blocks=2684354560, imaxpct=25
>> >         =                       sunit=0      swidth=0 blks
>> >naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>> >log      =internal log           bsize=4096   blocks=51200, version=2
>> >         =                       sectsz=512   sunit=0 blks, lazy-count=1
>> >realtime =none                   extsz=4096   blocks=0, rtextents=0
>> >
>> >Volume Name: SNIP_data1
>> >Type: Replicate
>> >Volume ID: SNIP
>> >Status: Started
>> >Snapshot Count: 0
>> >Number of Bricks: 1 x 4 = 4
>> >Transport-type: tcp
>> >Bricks:
>> >Brick1: nexus2:/mnt/SNIP_block1/SNIP_data1
>> >Brick2: forge:/mnt/SNIP_block1/SNIP_data1
>> >Brick3: hive:/mnt/SNIP_block1/SNIP_data1
>> >Brick4: citadel:/mnt/SNIP_block1/SNIP_data1
>> >Options Reconfigured:
>> >cluster.quorum-count: 1
>> >cluster.quorum-type: fixed
>> >network.ping-timeout: 5
>> >network.remote-dio: enable
>> >performance.rda-cache-limit: 256MB
>> >performance.readdir-ahead: on
>> >performance.parallel-readdir: on
>> >network.inode-lru-limit: 500000
>> >performance.md-cache-timeout: 600
>> >performance.cache-invalidation: on
>> >performance.stat-prefetch: on
>> >features.cache-invalidation-timeout: 600
>> >features.cache-invalidation: on
>> >cluster.readdir-optimize: on
>> >performance.io-thread-count: 32
>> >server.event-threads: 4
>> >client.event-threads: 4
>> >performance.read-ahead: off
>> >cluster.lookup-optimize: on
>> >performance.cache-size: 1GB
>> >cluster.self-heal-daemon: enable
>> >transport.address-family: inet
>> >nfs.disable: on
>> >performance.client-io-threads: on
>> >cluster.granular-entry-heal: enable
>> >cluster.data-self-heal-algorithm: full
>> >
>> >Sincerely,
>> >Artem
>> >
>> >--
>> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>>
>> Hi Artem,
>>
>> Have you checked the same on brick level? How big is the difference?
>>
>> Best Regards,
>> Strahil Nikolov
>>

Hi Artem,

My bad, I missed the word 'xfs'... Still, the difference is huge.

May I ask you to do a test again (pure curiosity) as follows:

1. Repeat the test from before.
2. Stop 1 brick and test again.

P.S.: You can try it on the test cluster
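
One way to make the two steps repeatable is a small timing harness. The sketch below is only an illustration and does not come from this thread: it walks a directory several times and prints the wall-clock time of each pass, counting entries the way `find | wc -l` does (the starting directory included). The function names and the default of three runs are assumptions; stopping the brick is left as a manual step between invocations.

```python
import os
import sys
import time


def count_entries(root):
    """Count root plus every file and directory below it, like `find | wc -l`."""
    total = 1  # `find` prints the starting directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def time_listing(root, runs=3):
    """Walk `root` `runs` times; return a list of (entry_count, seconds) pairs."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        entries = count_entries(root)
        results.append((entries, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Directory to walk: first command-line argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for i, (entries, seconds) in enumerate(time_listing(target), start=1):
        print(f"run {i}: {entries} entries in {seconds:.3f}s")
```

Run it once against the directory on the fuse mount with all bricks up, then again after stopping one brick. `os.walk` may not issue exactly the same system calls as `find`, so the absolute numbers can differ from the `time find | wc -l` results; the comparison between the two runs is what matters.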

Best Regards,
Strahil Nikolov

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users