On Nov 9, 2014, at 8:41 PM, Abhijith Das <adas@xxxxxxxxxx> wrote:
>> Hi Dave/all,
>>
>> I finally got around to playing with the multithreaded userspace
>> readahead idea and the results are quite promising. I tried to mimic
>> what my kernel readahead patch did with this userspace program
>> (userspace_ra.c). Source code here:
>> https://www.dropbox.com/s/am9q26ndoiw1cdr/userspace_ra.c?dl=0
>>
>> Each thread has an associated buffer into which a chunk of directory
>> entries is read using getdents(). Each thread then sorts its entries
>> in inode-number order (for GFS2, this is also their disk-block
>> order) and brings the inodes into cache in that order by issuing
>> open(2) syscalls against them. In my tests, I backgrounded this
>> program and issued an 'ls -l' on the directory in question. I did
>> the same following the kernel dirreadahead syscall as well.
>>
>> I did not manage to test very many parameter combinations for both
>> userspace_ra and SYS_dirreadahead because the test matrix got pretty
>> big and time-consuming. However, I did notice that without sorting,
>> userspace_ra did not perform as well in some of my tests. I haven't
>> investigated that, so the numbers shown here are all with sorting
>> enabled.

One concern is filesystems where the inode order does not necessarily
match the on-disk order. I believe that filesystems like ext4 and XFS
have matching inode/disk order, but tree-based COW filesystems like
Btrfs do not necessarily preserve this order, so sorting in userspace
will not help and may in fact hurt readahead compared to readdir order.

What filesystem(s) have you tested this on besides GFS?

Cheers, Andreas

>> For a directory with 100000 files:
>> a) simple 'ls -l' took 14m11s
>> b) SYS_dirreadahead + 'ls -l' took 3m9s, and
>> c) userspace_ra (1M buffer/thread, 32 threads) took 1m42s
>>
>> https://www.dropbox.com/s/85na3hmo3qrtib1/ra_vs_u_ra_vs_ls.jpg?dl=0
>> is a graph that contains a few more data points. In the graph, along
>> with data for 'ls -l' and SYS_dirreadahead, there are six data
>> series for userspace_ra for each directory size (10K, 100K and 200K
>> files), i.e. u_ra:XXX,YYY, where XXX is one of (64K, 1M) buffer
>> sizes and YYY is one of (4, 16, 32) threads.
>
> Hi,
>
> Here are some more numbers for larger directories; it seems like
> userspace readahead scales well and is still a good option.
>
> I've chosen the best-performing runs for kernel readahead and
> userspace readahead. I have data for runs with different parameters
> (buffer size, number of threads, etc.) that I can provide if anybody
> is interested.
>
> The numbers here are total elapsed times, in seconds, for the
> readahead plus 'ls -l' operations to complete.
>
>                                                      #files in testdir
>                                                   50k   100k   200k   500k     1m
> ---------------------------------------------------------------------------------
> Readdir 'ls -l'                                    11    849   1873   5024  10365
> Kernel readahead + 'ls -l' (best case)              7    214    814   2330   4900
> Userspace MT readahead + 'ls -l' (best case)       12     99    239   1351   4761
>
> Cheers!
> --Abhi
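For anyone who wants to experiment without pulling down the Dropbox
source, here is a minimal sketch of the scheme described above: worker
threads share a single directory fd, each thread repeatedly grabs a
chunk of entries into its own buffer via the raw getdents64() syscall,
sorts its chunk by inode number, and then opens each name to pull the
inode into cache. This is an illustration under stated assumptions
(Linux, pthreads), not the posted program: the mutex-serialized
getdents64() loop and the names NTHREADS, BUFSZ and ra_worker() are
invented here and need not match how userspace_ra.c divides the work.

/* Illustrative sketch only -- NOT the posted userspace_ra.c.
 * Build: gcc -O2 -pthread ra_sketch.c -o ra_sketch
 * Run:   ./ra_sketch <directory>   (then time 'ls -l' on it)
 */
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define NTHREADS 32         /* one of the thread counts tested above */
#define BUFSZ (1024 * 1024) /* 1M getdents buffer per thread */

/* Record layout returned by the raw getdents64() syscall; glibc does
 * not expose this struct -- see getdents(2). */
struct linux_dirent64 {
    ino64_t        d_ino;
    off64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

struct ent { ino64_t ino; char name[256]; };

static int dirfd_g;
static pthread_mutex_t dents_lock = PTHREAD_MUTEX_INITIALIZER;

static int cmp_ino(const void *a, const void *b)
{
    const struct ent *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

static void *ra_worker(void *unused)
{
    char *buf = malloc(BUFSZ);
    struct ent *ents = NULL;
    size_t nents = 0, cap = 0;
    long nread;

    if (!buf)
        return NULL;
    for (;;) {
        /* One shared directory fd: serialize getdents64() so each
         * thread ends up owning a disjoint chunk of the entries. */
        pthread_mutex_lock(&dents_lock);
        nread = syscall(SYS_getdents64, dirfd_g, buf, BUFSZ);
        pthread_mutex_unlock(&dents_lock);
        if (nread <= 0)
            break;
        for (long off = 0; off < nread;) {
            struct linux_dirent64 *d = (void *)(buf + off);
            if (nents == cap) {
                cap = cap ? cap * 2 : 4096;
                ents = realloc(ents, cap * sizeof(*ents));
                if (!ents)
                    exit(1);
            }
            ents[nents].ino = d->d_ino;
            snprintf(ents[nents].name, sizeof(ents[nents].name), "%s",
                     d->d_name);
            nents++;
            off += d->d_reclen;
        }
    }

    /* Sort this thread's chunk by inode number (disk order on GFS2),
     * then open+close each name to pull its inode into cache. */
    if (nents)
        qsort(ents, nents, sizeof(*ents), cmp_ino);
    for (size_t i = 0; i < nents; i++) {
        int fd = openat(dirfd_g, ents[i].name, O_RDONLY | O_NOFOLLOW);
        if (fd >= 0)
            close(fd);
    }
    free(ents);
    free(buf);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t tid[NTHREADS];

    if (argc != 2 || (dirfd_g = open(argv[1], O_RDONLY | O_DIRECTORY)) < 0) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, ra_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    close(dirfd_g);
    return 0;
}

One design note on the sketch: sorting is per-thread chunk, matching
the description above; a single global sort across all threads' entries
might give better disk ordering, but would need an extra barrier/merge
phase before the open(2) pass.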