[Bug 1933988] Review Request: nativejit - Library for high-performance just-in-time compilation of expressions involving C data structures

bugzilla@xxxxxxxxxx · Wed, 03 Mar 2021 17:21:50 +0000

https://bugzilla.redhat.com/show_bug.cgi?id=1933988

code@xxxxxxxxxxxxxxxxxx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(trpost@rocketmail.com)

--- Comment #7 from code@xxxxxxxxxxxxxxxxxx ---
The library is compiled with -msse4.2 so that the implementations of the two
overloads of NativeJIT::BitOp::GetNonZeroBitCount can use POPCNT/LZCNT. The
package is already ExclusiveArch: x86_64, but the baseline x86_64 architecture
does not include these instructions. I recently raised this issue (admittedly,
based on a misunderstanding of the library in question at the time) and got
FESCo to record a policy: https://pagure.io/packaging-committee/issue/1044.

So you seem to have a few options:

  1. Proceed with packaging this library as-is, with SSE4.2 instructions
     possibly sprinkled throughout due to the -msse4.2 option. This is
     allowable since it is a library. In the COPASI application, you will have
     to provide runtime feature detection and a fallback that completely avoids
     calling any functions from this library, since any of them could include
     SSE4 instructions.

  2. Patch this library to add runtime CPU detection and fallback
     implementations, compiling the optimized versions of these two routines
     with __attribute__((__target__("sse4.2"))) or in a separate translation
     unit with -msse4.2. Remove -msse4.2 from the overall compiler flags. It
     may be difficult to avoid paying a possibly-significant performance
     penalty for added indirection at the granularity of a tiny inlinable
     function.

  3. Give up the speed advantage of POPCNT/LZCNT: remove -msse4.2 from the
     overall compiler flags, and replace the two overloads of
     NativeJIT::BitOp::GetNonZeroBitCount unconditionally with a generic
     algorithm like
     https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel.
     This has the advantage of simplicity, and of not adding indirection. The
     performance penalty will probably vary from trivial to substantial
     depending on how heavily the JIT’ed routines use these operations.
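
For illustration, options 2 and 3 might look roughly like the sketch below. It
assumes GCC or Clang and their __builtin_cpu_supports and __builtin_popcountll
builtins. The names PopCountGeneric, PopCountHw, SelectPopCount, and PopCount
are hypothetical stand-ins, not NativeJIT's actual functions, and only the
bit-count (POPCNT) case is shown.

```cpp
#include <cstdint>

// Option 3: portable bit count using the "parallel" algorithm from the
// bithacks page linked above. Needs no special compiler flags.
inline unsigned PopCountGeneric(std::uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    return (((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
}

inline unsigned PopCountGeneric(std::uint64_t v) {
    v = v - ((v >> 1) & 0x5555555555555555ull);
    v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return static_cast<unsigned>((v * 0x0101010101010101ull) >> 56);
}

// Option 2: only this one function is built for SSE4.2, so the rest of the
// translation unit stays at the x86_64 baseline. With this target the
// compiler is expected (not guaranteed by this sketch) to emit POPCNT.
#if defined(__x86_64__) && defined(__GNUC__)
__attribute__((__target__("sse4.2")))
static unsigned PopCountHw(std::uint64_t v) {
    return static_cast<unsigned>(__builtin_popcountll(v));
}
#endif

using PopCountFn = unsigned (*)(std::uint64_t);

// Pick an implementation based on what the running CPU reports.
static PopCountFn SelectPopCount() {
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt")) {
        return &PopCountHw;
    }
#endif
    return static_cast<PopCountFn>(&PopCountGeneric);
}

// Hypothetical dispatching wrapper: the selection runs once, on first use.
inline unsigned PopCount(std::uint64_t v) {
    static const PopCountFn fn = SelectPopCount();
    return fn(v);
}
```

Every call through PopCount pays for an indirect call plus the check on the
function-local static, which is the added indirection mentioned in option 2.
Using PopCountGeneric alone is what option 3 amounts to.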

What do you think?
