Re: strange reproducibility problem with QImage

"Ben Beasley" <code@xxxxxxxxxxxxxxxxxx> · Sun, 03 Nov 2024 11:02:16 -0500

Kevin’s observation about floating-point rounding and runtime dispatch is an excellent one in general.

Those two CPUs should, as far as I can tell, be dispatched to the same SIMD implementations in this case.

Skimming https://github.com/qt/qtbase/blob/v6.8.0/src/gui/painting/qimagescale_sse4.cpp, it looks like a fixed-point implementation that entirely avoids floating-point operations. If there are no bugs, and if I’m not missing something, it should be possible to get identical results regardless of ISA extensions since no floating-point rounding is involved.
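
To make "fixed-point" concrete, here is a minimal sketch of integer-only resampling of a single row of 8-bit values. This is not Qt's code: the names lerp_fixed and scale_row_fixed, the 8-bit weight precision, and the use of simple linear interpolation are all assumptions chosen for illustration. The point is only that every step is an integer multiply, add, or shift, so the same inputs give the same bits on any CPU.

```python
FRAC_BITS = 8            # assumed weight precision, not taken from Qt
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def lerp_fixed(a, b, weight):
    """Blend two 8-bit values; weight is an integer in [0, ONE]."""
    # Integer multiply-add followed by a truncating shift: no floats involved.
    return (a * (ONE - weight) + b * weight) >> FRAC_BITS


def scale_row_fixed(row, out_len):
    """Resample `row` to `out_len` samples using integer arithmetic only."""
    n = len(row)
    # Source step per output sample, in 16.16 fixed point.
    step = ((n - 1) << 16) // max(out_len - 1, 1)
    out = []
    pos = 0
    for _ in range(out_len):
        i = pos >> 16                    # integer part: left source index
        weight = (pos & 0xFFFF) >> 8     # top 8 bits of the fractional part
        nxt = min(i + 1, n - 1)          # clamp at the right edge
        out.append(lerp_fixed(row[i], row[nxt], weight))
        pos += step
    return out


print(scale_row_fixed([0, 255], 3))
```

The truncating shift does discard low bits, but it does so the same way everywhere, which is different from floating-point rounding that can vary with operation order or instruction selection.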

The fact that the scaling algorithm appears to be integer-based also makes the following sources of irreproducibility less likely, but maybe not impossible:

- Some algorithms compute “left-over” leading and/or trailing data with a scalar algorithm, and in some cases this could make the results depend on alignment of buffers in memory. Besides the fact that this is an integer implementation, at a glance, Qt doesn’t appear to be doing this. It looks like QImage must be aligned and (over-)allocated to allow everything to be done in SIMD, processing some extra pixels outside the image as necessary to make complete vectors.

- SIMD algorithms might operate on input values and combine pixels in a different order than scalar ones, which could result in different rounding for floating-point operations. That shouldn’t matter for an integer algorithm like this, except maybe in cases of wrapping/overflow – which might perhaps be in play here.
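
The second point can be sketched in a few lines. The example below sums the same values in every possible order using three kinds of addition: floating-point, wrapping 32-bit unsigned, and saturating 8-bit signed. The helper names are made up for this sketch, and I have not checked whether the Qt scaler uses saturating instructions anywhere; that part is purely an assumption to show one way an integer algorithm could still be order-dependent.

```python
import itertools


def fold(values, add):
    """Combine values left to right with the given binary operation."""
    total = values[0]
    for v in values[1:]:
        total = add(total, v)
    return total


def float_add(a, b):
    return a + b


def wrapping_add_u32(a, b):
    # Modular arithmetic: overflow wraps around, as with ordinary SIMD adds.
    return (a + b) & 0xFFFFFFFF


def saturating_add_i8(a, b):
    # Clamps to the int8 range instead of wrapping.
    return max(-128, min(127, a + b))


def results_over_all_orders(values, add):
    """Set of distinct results obtained across every ordering of values."""
    return {fold(list(p), add) for p in itertools.permutations(values)}


float_results = results_over_all_orders([1e16, 1.0, -1e16, 1.0], float_add)
wrap_results = results_over_all_orders(
    [0xFFFFFFF0, 0x20, 0x7FFFFFFF, 0x1], wrapping_add_u32)
sat_results = results_over_all_orders([100, 100, -100], saturating_add_i8)

print(len(float_results), len(wrap_results), sorted(sat_results))
```

Floating-point addition gives several different answers depending on order, plain wrapping addition gives exactly one, and saturating addition gives more than one again. So wrapping by itself would not explain the differences, but clamping or other non-modular steps could.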

Another relevant fact is that the implementation is multi-threaded using a thread pool. If there is anything that depends on the order in which pixels/blocks are computed and combined, this could also result in different outputs, even in different runs on the same machine, and especially on machines with different numbers of cores.
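
As a rough sketch of how the work split alone could matter, the example below sums a list in a thread pool, splitting it into as many chunks as there are workers and then combining the partial sums. The function names and the chunking scheme are assumptions for illustration and have nothing to do with how Qt actually partitions an image. If each output pixel is computed independently from the input, as I would expect for a scaler, none of this applies.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(chunk):
    """Plain left-to-right sum (avoids any compensated summation in sum())."""
    total = 0
    for v in chunk:
        total += v
    return total


def pooled_sum(values, workers):
    """Sum values by splitting them into one chunk per worker."""
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so the combine step is
        # deterministic for a given worker count.
        partials = list(pool.map(sequential_sum, chunks))
    return sequential_sum(partials)


floats = [1e16, 1.0, -1e16, 1.0]
ints = [3, 5, 7, 11]

print(pooled_sum(floats, 1), pooled_sum(floats, 2))
print(pooled_sum(ints, 1), pooled_sum(ints, 2))
```

With one worker the float sum is 1.0 and with two it is 0.0, while the integer sum is the same either way. A machine with a different core count could therefore change a floating-point result without any difference in the code path taken.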

All of this is written on a phone, without digging very deeply into the source or doing any practical experiments.

On Sun, Nov 3, 2024, at 7:38 AM, Zbigniew Jędrzejewski-Szmek wrote:
> On Sun, Nov 03, 2024 at 04:08:38AM +0100, Kevin Kofler via devel wrote:
>> Zbigniew Jędrzejewski-Szmek wrote:
>> > With python3-pyqt6-6.8.0-0.1.fc42.x86_64, we get a difference in how the
>> > icons are rendered:
>> > 
>> >     calibre-7.20.0-1.fc42.x86_64
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/16x16/apps/calibre-gui.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-ebook-edit.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-gui.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-viewer.png ...
>> > 
>> > There are some tiny differences in shading of some pixels. The difference
>> > is not discernible visually for me. [1] has example icons attached.
>> > 
>> > Is this a bug in Qt and implementation of QImage.scaled [3] ?
>> 
>> As I understand the Qt source code, QImage.scaled with the 
>> Qt.TransformationMode.SmoothTransformation flag ends up calling 
>> QImage.smoothScaled (QImage.scaled calls the general QImage.transformed, 
>> which then detects the special case and calls QImage.smoothScaled), which in 
>> turn calls the private qSmoothScaleImage. And that one uses a different 
>> algorithm based on whether the CPU is runtime-detected to support SSE 4.1 or 
>> not. (For non-x86, there are also optimized implementations for ARM NEON and 
>> Loongson LSX, also with runtime detection, otherwise the generic C 
>> implementation is used, as on pre-SSE-4.1 x86.) See 
>> https://code.qt.io/cgit/qt/qtbase.git/tree/src/gui/painting/qimagescale.cpp 
>> and 
>> https://code.qt.io/cgit/qt/qtbase.git/tree/src/gui/painting/qimagescale_sse4.cpp 
>> . It is likely that the vectorized implementation rounds slightly 
>> differently. So you then end up with different results when building on non-
>> identical builder hardware.
>
> Wow, thank you, that is a great find.
>
> The koji build used GenuineIntel Intel Xeon Processor (Cascadelake), while
> my rebuilder used AuthenticAMD AMD EPYC 9R14. They both have SSE 4.1 (1,2),
> so theoretically qt_qimageScaleAARGBA_down_x_up_y_sse4() would be used in
> both cases. But those are significantly different CPUs, so it seems possible
> that the difference is caused by the optimized vector implementations.
> I'm not sure though: could the exact same code deliver non-bit-identical
> results on different CPUs when processing 128-bit ints?
>
> (1) fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
> pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
> constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq 
> vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt 
> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 
> 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow 
> flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 
> erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd 
> avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat vnmi umip pku 
> ospke avx512_vnni md_clear flush_l1d arch_capabilities
>
> (2) fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid 
> aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid 
> sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor 
> lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch 
> topoext perfctr_core ssbd perfmon_v2 ibrs ibpb stibp ibrs_enhanced 
> vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed 
> adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl 
> xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr rdpru 
> wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq 
> avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d
>
> Zbyszek
> -- 
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
> Do not reply to spam, report it: 
> https://pagure.io/fedora-infrastructure/new_issue