Re: [PATCH] mm/ksm : Checksum calculation function change (jhash2 -> crc32)

On Tue, Aug 01, 2017 at 09:07:35PM +0900, leesioh wrote:
> In ksm, the checksum values are used to check changes in page content and keep the unstable tree more stable.
> KSM implements checksum calculation with jhash2 hash function.
> However, because jhash2 is implemented in software,
> it consumes high CPU cycles (about 26%, according to KSM thread profiling results)
> 
> To reduce CPU consumption, this commit applies the crc32 hash function
> which is included in the SSE4.2 CPU instruction set.
> This can significantly reduce the page checksum overhead as follows.
> 
> I measured checksum computation 300 times to see how fast crc32 is compared to jhash2.
> With jhash2, the average checksum calculation time is about 3460ns,
> and with crc32, the average checksum calculation time is 888ns. This is about 74% less than jhash2.

crc32 may create more false positives than jhash2. crc32 only
guarantees a different value in return if fewer than N bits
change. False positives in the crc32 comparison would result in more
unstable pages being added to the unstable tree, and if they're
changing as a result of false positives it may make the unstable tree
more unstable, leading to missed merges (in addition to the overhead of
adding those to the unstable tree in the first place, and in addition
to risking an immediate CoW post merge, which would slow down apps even
more).
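
As an illustration of the "fewer than N bits" guarantee (not code from
the patch or from mm/ksm.c), here is a minimal userland sketch. It
assumes a plain software CRC32C using the Castagnoli polynomial, which
is the polynomial the SSE4.2 crc32 instruction implements. The names
crc32c_sw() and single_bit_collisions() are hypothetical helpers
introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78),
 * computed in software so the sketch runs on any CPU.
 */
uint32_t crc32c_sw(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Sanity check against the standard CRC-32C check value. */
void crc32c_selfcheck(void)
{
	assert(crc32c_sw((const uint8_t *)"123456789", 9) == 0xE3069283u);
}

/*
 * Hypothetical helper: flip every bit of buf in turn and count how many
 * of those single-bit changes leave the checksum unchanged.
 */
size_t single_bit_collisions(uint8_t *buf, size_t len)
{
	uint32_t base = crc32c_sw(buf, len);
	size_t collisions = 0;

	for (size_t bit = 0; bit < len * 8; bit++) {
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		buf[bit / 8] ^= mask;		/* change one bit */
		if (crc32c_sw(buf, len) == base)
			collisions++;
		buf[bit / 8] ^= mask;		/* restore it */
	}
	return collisions;
}
```

A single-bit change is always detected by a CRC, so the helper returns 0
for any buffer. The sketch says nothing about changes touching many
bits at once, which is the case the concern above is about.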

I think if somebody wants a crc instead of a more proper hash (that is
less likely to generate false positives if a couple of bits change)
it should be an option in sysfs, not enabled by default, but overall I
think it's not worth this change for a downgrade to crc. There's the
risk an admin thinks it's going to make things run faster because KSM
CPU utilization decreases, while missing the risk of increased CoWs in
app context or missed merges because of higher instability in the
unstable tree.

Still, deploying hardware acceleration in the KSM hash is an
interesting idea that I don't recall having been tried. Could you try to
benchmark in userland (or kernel if you wish) software jhash2 vs
CONFIG_CRYPTO_SHA1_SSSE3 or CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL instead
of the accelerated crc?  (I don't know if the GHASH API can fit our use
case though, but accelerated SHA1 sure would fit).  I suppose they'll
be slower than crc32, and probably slower than jhash2 too, however I
can't be sure by just thinking about it.
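
A userland comparison along those lines could start from a small timing
harness such as the sketch below. Everything in it is an assumption made
for illustration: the 4096-byte page size, the names page_csum_fn,
csum_placeholder() and bench_csum_ns(), and above all the checksum
itself, which is an FNV-1a stand-in and neither jhash2 nor SHA1. The
real candidates would be plugged in through the function pointer.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define PAGE_BYTES 4096u	/* assumed page size */

/* Any per-page checksum candidate has to fit this signature. */
typedef uint32_t (*page_csum_fn)(const void *page, size_t len);

/* Placeholder checksum (32-bit FNV-1a); stands in for the real hashes. */
uint32_t csum_placeholder(const void *page, size_t len)
{
	const uint8_t *p = page;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Average wall-clock nanoseconds per call of fn over one page. */
double bench_csum_ns(page_csum_fn fn, const void *page, unsigned int iters)
{
	struct timespec t0, t1;
	volatile uint32_t sink = 0;	/* keeps the calls from being optimized out */

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned int i = 0; i < iters; i++)
		sink ^= fn(page, PAGE_BYTES);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;

	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling bench_csum_ns(csum_placeholder, page, 300) mirrors the 300
measurements quoted above. A loop like this runs with warm caches and
without the FPU save and restore cost, so its numbers are only a rough
guide to what ksmd would see.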

We also have to take the floating point save and restore into account in
the real world, where ksm schedules often and may run interleaved on
the same CPU where an app uses the fpu a lot in userland (if the
interleaved app doesn't use the fpu in userland it won't create
overhead).

Thanks!
Andrea
