Re: [PATCH] mm/ksm : Checksum calculation function change (jhash2 -> crc32)

sioh Lee <solee@xxxxxxxxxxxxxx> · Tue, 29 Aug 2017 15:35:34 +0900

Hello,
Thank you for the reply and for being supportive.
First of all, I made a mistake in that I typed crc32 incorrectly. All the experiments were done using crc32c-intel, not crc32 (PCLMULQDQ).

Second, the reason for (priority < 200) is because the priority of crc32c-intel is 200 so that if the priority is less than 200, jhash2 is used.
Also, I have a question about implementation. Do you want to exclude jhash2 from ksm and go only with crc32 ? Could you please give me guidance about it?

Then, I will implement it and send you a new patch.
Once again, thank you so much for your reply.
Best regards,

-Sioh Lee-

2017-08-25 오전 4:14에 Andrea Arcangeli 이(가) 쓴 글:
> On Wed, Aug 09, 2017 at 10:17:31PM +0900, sioh Lee wrote:
>> Hello.
>> I am sending you the results of the experiments.
>> The experiment was done for two workloads.
>> The first is Kernel build (CPU Intensive) and the second is the iozone benchmark (I/O Intensive).
>> In the experiment, four VMs compile kernel at the same time.
>> I also experimented with iozone in the same way.
>>
>>
>> The values measured in the experiment are:
>> 1. CoW count, 2. Checksum computation time, 3. pages_unshared, 4. pages_sharing, 5. (pages_unshared / pages_sharing).
>> The experiment was conducted twice for each workload and the average value was calculated.
>> Checksum computation time, pages_unshared, and pages_sharing are recorded every 1 second,
>> and the average of the recorded values is obtained after the end of the experiment.
>> The CoW was also recorded whenever CoW occurs on a shared page.
>>
>> Experiment environment
>>
>> test platform : openstack cloud platform (NEWTON version)
>> Experiment node : openstack based cloud compute node (CPU: Xeon E5-2650 v3 2.3Ghz 10core, memory : 64Gb)
>> VM : (2 VCPU, RAM 4GB, DISK 20GB) * 4
>> workload : Kernel Compile (kernel 4.47), iozone (read, write, random read and write for 2GB)
>> KSM setup - sleep_millisecs : 200ms, pages_to_scan : 1600
>>
>> The experimental results are as follows. (All values are truncated to the second decimal place)
> Not sure if kernel build but especially iozone are the best tests for
> this, a number crunching may have generate a larger set of pages,
> these issues would be noticeable only with a very large unstable tree
> (i.e. very large pages_unshared). You've got a very tiny
> pages_unshared of only <=1GB in your tests. That should grow much
> larger to make the measurement meaningful for this hash collision
> concern.
>
> However so far I checked some more information on the collisions of
> crc32. With equal sized random data (i.e. our case, PAGE_SIZE in
> length) it takes about ~78000 tries of new random data for crc32 to
> trigger a collision with 51% probability, or ~200000 tries to get a
> collision with 99% probability.
>
> jhash2 instead starts seeing collisions with 2**52 random data tries?
>
> Our objective here is to simulate a dirty bit, the problem with the
> real pagetable dirty bit is it potentially requires an IPI to threads
> using the "mm" to clear it reliably (i.e. to be sure it gets set on
> next write), and not all archs have the dirty bit set in
> hardware. Write protecting to simulate the dirty bit also wouldn't be
> a solution, because our whole objective is to avoid wrprotecting the
> first place while at the same time we try not to make the unstable
> tree too unstable by adding pages that are statistically changing
> frequently to it.
>
> There were attempts in the past to use the dirty bit on x86 (plus
> skipping the IPI probably wouldn't be a big issue for KSM because the
> scan is linear and takes quite some time to complete if there's lots
> of virtual memory to merge).
>
> So after evaluating 200000 different candidate pages that are changing
> frequently, to see if they're worth adding them to the unstable tree,
> we'll sure get a false positive if compared to jhash2 (we'll add one
> more page than we should have to). Considering the pages can start
> changing at any time regardless after we add them to the unstable tree
> and this check is only statistically significant, the difference with
> jhash2 is probably not going to be noticeable here. We don't use the
> hash to find equal pages of course, the unstable tree does that. I
> earlier thought crc32 would require fewer tries before creating
> collisions, much closer to a checksum. So now I'm optimistic this
> change will be a good tradeoff.
>
> If we switch to crc32 however I don't see why not to use it also if
> the intel insn isn't present (why the < 200?). It's surely running
> much faster also in the generic implementation (jhash2 isn't
> accelerated either).
>
> If we force CRC32C to be enabled if KSM is selected we could drop
> jhash2 entirely. Then we'd need a way to ensure the accelerated
> version gets loaded.
>
> Now about implementation issues, crc32 insn is actually used by crc32c
> not by crc32 (crc32c seems preferable polynomial anyway), the crc32
> that you selected only uses PCLMULQDQ, not the crc32 insn of sse4.2.
>
> I guess on Intel we could try to load crc32c first, if that showup at
> priority < 200 (sse4.2 crc32 that I think you intended to use but you
> got the PCLMULQDQ crc32 instead), then try again with crc32, if that
> also showup at priority < 200, then take crc32c generic (without
> fallback to jhash2 provided we've a build-time way to ensure the
> generic implementation is linked into the kernel as suggested above).
>
> Other archs using different algorithm sounds fine too, for example
> assuming crc32be would be the fastest (not sure, just a guess) it
> could be arch dependent the selection logic (the only non trivial
> issue would be to avoid #ifdefs to make it arch dependent). As long as
> the default fallbacks to generic that's fine.
>
> Thanks,
> Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>