FYI: we have 3 servers, each with 2 software RAID10 arrays used as bricks in a replica 3 setup (so 2 volumes). The default values set by the OS (Debian stretch) are:

/dev/md3
Array Size : 29298911232 (27941.62 GiB 30002.09 GB)
/sys/block/md3/queue/read_ahead_kb : 3027

/dev/md4
Array Size : 19532607488 (18627.75 GiB 20001.39 GB)
/sys/block/md4/queue/read_ahead_kb : 2048

Maybe that helps somehow :)

Hubert

On Wed, 13 Feb 2019 at 06:46, Manoj Pillai <mpillai@xxxxxxxxxx> wrote:
>
> On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
>>
>> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
>>>
>>> All,
>>>
>>> We've found perf xlators io-cache and read-ahead not adding any performance improvement. At best read-ahead is redundant due to kernel read-ahead
>>
>> One thing we are still figuring out is whether kernel read-ahead is tunable. From what we've explored, it _looks_ like (may not be entirely correct) ra is capped at 128KB. If that's the case, I am interested in a few things:
>> * Are there any real-world applications/use cases which would benefit from larger read-ahead (Manoj says block devices can do ra of 4MB)?
>
> kernel read-ahead is adaptive but influenced by the read-ahead setting on the block device (/sys/block/<dev>/queue/read_ahead_kb), which can be tuned. For RHEL specifically, the default is 128KB (last I checked) but the default RHEL tuned-profile, throughput-performance, bumps that up to 4MB. It should be fairly easy to rig up a test where 4MB read-ahead on the block device gives better performance than 128KB read-ahead.
>
> -- Manoj
>
>> * Is the limit on kernel ra tunable a hard one? IOW, what does it take to make it do higher ra? If it's difficult, can glusterfs read-ahead provide the expected performance improvement for these applications that would benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>>
>> I am still inclined to prefer kernel ra, as I think it's more intelligent and can identify more sequential patterns than Glusterfs read-ahead [1][2].
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
>> [2] https://lwn.net/Articles/155510/
>>
>>> and at worst io-cache is degrading the performance for workloads that don't involve re-read. Given that VFS already has both these functionalities, I am proposing to have these two translators turned off by default for native fuse mounts.
>>>
>>> For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have these xlators on by having custom profiles. Comments?
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>>
>>> regards,
>>> Raghavendra
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> https://lists.gluster.org/mailman/listinfo/gluster-users
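
In case anyone wants to collect the same read_ahead_kb numbers from their own bricks for comparison, here is a minimal sketch. It only assumes the /sys/block/<dev>/queue/read_ahead_kb layout mentioned in the thread; the function name and the sysfs_block parameter are made up for illustration and are not part of any Gluster or kernel tooling.

```python
from pathlib import Path


def read_ahead_kb(sysfs_block="/sys/block"):
    """Return {device: read_ahead_kb} for each block device under sysfs_block.

    sysfs_block is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real sysfs tree.
    """
    values = {}
    for setting in sorted(Path(sysfs_block).glob("*/queue/read_ahead_kb")):
        device = setting.parent.parent.name  # e.g. "md3"
        try:
            values[device] = int(setting.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or parsed as an integer.
            continue
    return values


if __name__ == "__main__":
    # Print in the same "path : value" form used at the top of this mail.
    for device, kb in read_ahead_kb().items():
        print(f"/sys/block/{device}/queue/read_ahead_kb : {kb}")
```

This only reads the current settings. As Manoj notes, the value is tunable, so a comparison run of 128KB vs. 4MB would additionally need the file written as root on each brick device before the test.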