Thank you very much for reading and responding to my commit.
I understand the problem with crc32 that you describe.
I will investigate. As a first step, I will compare the number of CoWs
with jhash2 and with crc32, and I will send you the experiment results.

Thanks again!

-leesioh-

On 2017-08-02 at 5:05 AM, Andrea Arcangeli wrote:
> On Tue, Aug 01, 2017 at 09:07:35PM +0900, leesioh wrote:
>> In KSM, the checksum values are used to check changes in page content and keep the unstable tree more stable.
>> KSM implements checksum calculation with the jhash2 hash function.
>> However, because jhash2 is implemented in software,
>> it consumes many CPU cycles (about 26%, according to KSM thread profiling results).
>>
>> To reduce CPU consumption, this commit applies the crc32 hash function,
>> which is included in the SSE4.2 CPU instruction set.
>> This can significantly reduce the page checksum overhead, as follows.
>>
>> I measured checksum computation 300 times to see how fast crc32 is compared to jhash2.
>> With jhash2, the average checksum calculation time is about 3460ns,
>> and with crc32, the average checksum calculation time is 888ns. This is about 74% less than jhash2.
> crc32 may create more false positives than jhash2. crc32 only
> guarantees a different value in return if fewer than N bits
> change. False positives in the crc32 comparison would result in more
> unstable pages being added to the unstable tree, and if they're
> changing as a result of false positives it may make the unstable tree
> more unstable, leading to missed merges (in addition to the overhead of
> adding those to the unstable tree in the first place, and in addition
> to risking an immediate CoW post merge, which would slow down apps even
> more).
>
> I think if somebody wants a crc instead of a more proper hash (that is
> less likely to generate false positives if a couple of bits change)
> it should be an option in sysfs, not enabled by default, but overall I
> think it's not worth this change for a downgrade to crc. There's the
> risk an admin thinks it's going to make things run faster because KSM
> CPU utilization decreases, while missing the risk of increased CoWs in
> app context or missed merges because of higher instability in the
> unstable tree.
>
> Still, deploying hardware acceleration in the KSM hash is an
> interesting idea that I don't recall has been tried. Could you try to
> benchmark in userland (or kernel if you wish) software jhash2 vs
> CONFIG_CRYPTO_SHA1_SSSE3 or CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL instead
> of the accelerated crc? (I don't know if the GHASH API can fit our use
> case though, but accelerated SHA1 sure would fit.) I suppose they'll
> be slower than crc32, and probably slower than jhash2 too; however, I
> can't be sure by just thinking about it.
>
> We also have to take the floating point save and restore into account in
> the real world, where ksm schedules often and may run interleaved on
> the same CPU where an app uses the fpu a lot in userland (if the
> interleaved app doesn't use the fpu in userland it won't create
> overhead).
>
> Thanks!
> Andrea
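
For readers following the thread, the role of the page checksum discussed
above can be sketched in userland C. This is only a minimal illustration,
not the KSM code: `struct page_tracker`, `page_is_stable()` and `PAGE_BYTES`
are hypothetical names, the 4096-byte page size is an assumption, and
32-bit FNV-1a stands in for the hash because neither jhash2 nor the SSE4.2
crc32 instruction is available in portable userland C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumed page size, for illustration only */

/* Hypothetical per-page state: only the last seen checksum is modelled. */
struct page_tracker {
    uint32_t oldchecksum;
};

/*
 * Stand-in checksum: 32-bit FNV-1a over the whole page.
 * This is NOT jhash2 or crc32; it only plays the same role here.
 */
static uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u; /* FNV offset basis */

    assert(page != NULL);
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        h ^= page[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/*
 * Returns true when the checksum matches the one recorded on the previous
 * scan, i.e. the page looks unchanged and may be considered for the
 * unstable tree. Otherwise records the new checksum and returns false.
 */
static bool page_is_stable(struct page_tracker *t, const unsigned char *page)
{
    uint32_t checksum = page_checksum(page);

    if (t->oldchecksum != checksum) {
        t->oldchecksum = checksum;
        return false;
    }
    return true;
}
```

With this model, a page is reported as stable only on the second of two
consecutive scans that produce the same checksum. A checksum collision
between the old and new content would make a page that did change look
stable, which is the false-positive case raised in the quoted reply.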