On Wed, May 25, 2016 at 03:11:55PM -0400, neha agarwal wrote:
> Hi All,
>
> I have been testing Hugh's and Kirill's huge tmpfs patch sets with
> Cassandra (NoSQL database). I am seeing a significant performance gap
> (~30%) between the two implementations: Hugh's implementation performs
> better than Kirill's. I am surprised to see this gap. My test setup is
> as follows.
>
> Patchsets
> =========
> - For Hugh's:
> I checked out 4.6-rc3, applied Hugh's preliminary patches (01 to 10)
> from https://lkml.org/lkml/2016/4/5/792, and then applied the THP
> patches posted on April 16 (01 to 29).
>
> - For Kirill's:
> I am using his branch
> "git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v8",
> which is based on 4.6-rc3 and was posted on May 12.
>
> Khugepaged settings
> ===================
> cd /sys/kernel/mm/transparent_hugepage
> echo 10 > khugepaged/alloc_sleep_millisecs
> echo 10 > khugepaged/scan_sleep_millisecs
> echo 511 > khugepaged/max_ptes_none
>
> Mount options
> =============
> - For Hugh's:
> sudo sysctl -w vm/shmem_huge=2
> sudo mount -o remount,huge=1 /hugetmpfs
>
> - For Kirill's:
> sudo mount -o remount,huge=always /hugetmpfs
> echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
> echo 511 > khugepaged/max_ptes_swap
>
> Workload setting
> ================
> Please see the attached setup document for Cassandra (NoSQL database):
> cassandra-setup.txt
>
> Machine setup
> =============
> 36-core (72 hardware thread) dual-socket x86 server with 512 GB RAM,
> running Ubuntu. I use control groups for resource isolation. Server and
> client threads run on different sockets. The frequency governor is set
> to "performance" to remove performance fluctuations caused by frequency
> variation.
>
> Throughput numbers
> ==================
> Hugh's implementation: 74522.08 ops/sec
> Kirill's implementation: 54919.10 ops/sec

In my setup I don't see the difference:

v4.7-rc1 + my implementation:

[OVERALL], RunTime(ms), 822862.0
[OVERALL], Throughput(ops/sec), 60763.53021527304

ShmemPmdMapped: 4999168 kB

v4.6-rc2 + Hugh's implementation:

[OVERALL], RunTime(ms), 833157.0
[OVERALL], Throughput(ops/sec), 60012.698687042175

ShmemPmdMapped: 5021696 kB

That's basically within measurement error. 'ShmemPmdMapped' indicates how
much memory is mapped with huge pages by the end of the test.

This is on a dual-socket 24-core machine with 64G of RAM.

I guess we have some configuration difference or something, but so far I
don't see the drastic performance difference you've pointed to.

Maybe my implementation behaves slower on bigger machines, I don't know.
There's no architectural reason for it.

I'll post my updated patchset today.

-- 
 Kirill A. Shutemov
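
[Editor's note: a rough sketch of how the socket isolation and governor
settings described in the original report could be reproduced. This is
only an illustration: the CPU/node ranges assume node 0 = CPUs 0-17,36-53
and node 1 = CPUs 18-35,54-71 on a 2-socket 36-core/72-thread box (verify
with 'lscpu' or 'numactl --hardware'), and the $CASSANDRA_PID /
$CLIENT_PID variables are placeholders, not taken from the report.]

# Fix the frequency governor to "performance" on every CPU.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance | sudo tee "$g" > /dev/null
done

# cgroup v1 cpusets keeping server and client threads on different sockets
# (assumes the cpuset controller is mounted at /sys/fs/cgroup/cpuset).
sudo mkdir -p /sys/fs/cgroup/cpuset/server /sys/fs/cgroup/cpuset/client
echo 0-17,36-53  | sudo tee /sys/fs/cgroup/cpuset/server/cpuset.cpus
echo 0           | sudo tee /sys/fs/cgroup/cpuset/server/cpuset.mems
echo 18-35,54-71 | sudo tee /sys/fs/cgroup/cpuset/client/cpuset.cpus
echo 1           | sudo tee /sys/fs/cgroup/cpuset/client/cpuset.mems

# Move the already-running server and client into their cpusets
# (PIDs are example placeholders).
echo $CASSANDRA_PID | sudo tee /sys/fs/cgroup/cpuset/server/cgroup.procs
echo $CLIENT_PID    | sudo tee /sys/fs/cgroup/cpuset/client/cgroup.procs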
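
[Editor's note: to help rule out configuration differences between the two
setups, a quick hedged check that the khugepaged knobs took effect and a
way to sample the ShmemPmdMapped counter quoted above during the run. This
assumes the counter is exported via /proc/meminfo as in Kirill's series;
the exact reporting may differ under Hugh's patches.]

# Confirm the khugepaged settings actually took effect.
grep -H . /sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs \
          /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs \
          /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

# Sample how much shmem is PMD-mapped (huge) every 5 seconds while the
# benchmark runs.
watch -n 5 'grep ShmemPmdMapped /proc/meminfo'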