On Tue, Aug 01, 2017 at 09:07:35PM +0900, leesioh wrote:
> In ksm, the checksum values are used to check changes in page content and keep the unstable tree more stable.
> KSM implements checksum calculation with jhash2 hash function.
> However, because jhash2 is implemented in software,
> it consumes high CPU cycles (about 26%, according to KSM thread profiling results)
>
> To reduce CPU consumption, this commit applies the crc32 hash function
> which is included in the SSE4.2 CPU instruction set.
> This can significantly reduce the page checksum overhead as follows.
>
> I measured checksum computation 300 times to see how fast crc32 is compared to jhash2.
> With jhash2, the average checksum calculation time is about 3460ns,
> and with crc32, the average checksum calculation time is 888ns. This is about 74% less than jhash2.

crc32 may create more false positives than jhash2, i.e. report an
unchanged checksum for a page whose content did change. crc32 only
guarantees a different return value if fewer than N bits change.

False positives in the crc32 comparison would result in more unstable
pages being added to the unstable tree, and if those pages are in fact
still changing, that may make the unstable tree more unstable, leading
to missed merges (in addition to the overhead of adding them to the
unstable tree in the first place, and in addition to risking an
immediate CoW post merge, which would slow down apps even more).

I think if somebody wants a crc instead of a more proper hash (one that
is less likely to generate false positives if a couple of bits change)
it should be an option in sysfs, not enabled by default, but overall I
think this change is not worth a downgrade to crc. There's the risk that
an admin thinks it's going to make things run faster because KSM CPU
utilization decreases, while missing the risk of increased CoWs in app
context or missed merges because of higher instability in the unstable
tree.

Still, deploying hardware acceleration in the KSM hash is an interesting
idea that I don't recall having been tried.

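To make the false positive concern concrete, here is a minimal userland sketch of the checksum gate, not the actual ksm.c code. The names scan_item, page_looks_stable, crc32c_sw and SKETCH_PAGE_SIZE are hypothetical, and the bitwise CRC32C routine only stands in for the SSE4.2 instruction, on the assumption that both use the Castagnoli polynomial. A page is treated as a candidate for the unstable tree only if its checksum matches the one recorded on the previous scan, so any collision lets a still-changing page through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Bitwise software CRC32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t crc32c_sw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical per-page scan state: the checksum seen on the last scan. */
struct scan_item {
    uint32_t old_checksum;
};

/*
 * Returns 1 if the page checksum matches the previous scan (the page
 * "looks stable" and would be considered for the unstable tree),
 * 0 otherwise. On a mismatch the stored checksum is refreshed, so the
 * page is rechecked on the next scan. A checksum collision after a real
 * content change would wrongly return 1: that is the false positive.
 */
static int page_looks_stable(struct scan_item *item, const uint8_t *page)
{
    uint32_t sum;

    assert(item != NULL && page != NULL);

    sum = crc32c_sw(page, SKETCH_PAGE_SIZE);
    if (item->old_checksum != sum) {
        item->old_checksum = sum;
        return 0;
    }
    return 1;
}
```

A single flipped bit is always caught by a CRC, so the gate only misfires on multi-bit changes that happen to collide. How often that happens on real page contents, compared to jhash2, is what would need to be measured.
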
Could you try to benchmark in userland (or kernel if you wish) software
jhash2 vs CONFIG_CRYPTO_SHA1_SSSE3 or CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL
instead of the accelerated crc? (I don't know if the GHASH API can fit
our use case though, but accelerated SHA1 sure would fit.)

I suppose they'll be slower than crc32, and probably slower than jhash2
too, however I can't be sure by just thinking about it. We also have to
take the floating point save and restore into account in the real world,
where ksm schedules often and may run interleaved on the same CPU where
an app uses the fpu a lot in userland (if the interleaved app doesn't
use the fpu in userland it won't create overhead).

Thanks!
Andrea