Re: GCC -msse2 portability question

On 26/03/2014 18:40, Milosz Tanski wrote:
> Loic,
> 
> I don't mean to be redundant, since I already posted this in the commit
> comments on GitHub, but I'm not sure you saw it.

Thanks for posting it: your comment got lost in a rebase (GitHub is not good at that...).

> Instead of doing cpuid manually you can use builtins provided in gcc
> (and in clang). There's a cpuid.h header you can include. This
> stackoverflow answer has a good summary of it:
> http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux?answertab=votes#tab-top

It is indeed a nice improvement to have. I created http://tracker.ceph.com/issues/7869
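
For reference, a minimal sketch of what that could look like, assuming GCC or clang on x86. `__get_cpuid` and the `bit_*` masks come from `<cpuid.h>`; the `sse_features` struct, `probe_sse`, `select_tier` and the `plugin_tier` names are hypothetical, made up for illustration, and are not gf-complete or Ceph code. The tier mapping follows the three cut-offs quoted below, on the assumption that the middle tier needs SSSE3 (for the shuffle instruction Kevin mentions) and that the top tier also needs PCLMUL:

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/clang helper header, x86 only */
#endif

/* Hypothetical container for the feature bits we care about. */
struct sse_features {
    bool sse2, sse3, ssse3, sse4_1, sse4_2, pclmul;
};

/* Hypothetical plugin tiers, matching the three cut-offs in this thread. */
enum plugin_tier { PLUGIN_GENERIC, PLUGIN_SSE3, PLUGIN_SSE4 };

/* Read CPUID leaf 1 and decode the SSE-related bits.
 * On non-x86 builds every flag stays false. */
struct sse_features probe_sse(void)
{
    struct sse_features f = { false, false, false, false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx & bit_SSE2)   != 0;
        f.sse3   = (ecx & bit_SSE3)   != 0;
        f.ssse3  = (ecx & bit_SSSE3)  != 0;
        f.sse4_1 = (ecx & bit_SSE4_1) != 0;
        f.sse4_2 = (ecx & bit_SSE4_2) != 0;
        f.pclmul = (ecx & bit_PCLMUL) != 0;
    }
#endif
    return f;
}

/* Map detected features to a plugin tier.
 * Assumption: each tier requires everything below it. */
enum plugin_tier select_tier(const struct sse_features *f)
{
    bool mid = f->sse2 && f->sse3 && f->ssse3;
    bool top = mid && f->sse4_1 && f->sse4_2 && f->pclmul;

    if (top)
        return PLUGIN_SSE4;
    if (mid)
        return PLUGIN_SSE3;
    return PLUGIN_GENERIC;
}
```

The caller would run probe_sse() once at startup and load whichever plugin select_tier() names. Keeping the mapping separate from the probe makes it possible to test the selection with hand-built flags.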

Cheers

> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 & sse3) => use a plugin compiled with sse + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
>>
>> like so:
>>
>> https://github.com/dachary/ceph/commit/b6e4307bd2ee1de6e8bbda0ced370d484d512114#diff-5249f49580782dfe95a1cbcc986ee5deR113
>>
>> If I understand Laurent correctly, the right approach would be to semi-transparently generate and select the code path depending on the features at runtime. But that would require more work, so I created a ticket to track it: http://tracker.ceph.com/issues/7865
>>
>> Does that sound right ?
>>
>> On 25/03/2014 22:31, Kevin Greenan wrote:
>>> Hey Loic,
>>>
>>> I think we want something closer to what Milosz is proposing (3 cut-offs instead of 2). The shuffle instruction is part of SSSE3 and is the basis for the SSE split table techniques, which are super fast. With an all-or-nothing approach, many users whose CPUs support SSSE3 but not the later extensions would not be able to take advantage of it.
>>>
>>> Make sense?
>>>
>>> -kevin
>>>
>>>
>>> On Tue, Mar 25, 2014 at 12:46 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
>>>
>>>     It gets a bit more tricky with x86_64 since the arch dictates that the
>>>     baseline has SSE2 (but not necessarily anything later).
>>>
>>>     What I would do is support both SSE2 (maybe in core, without dlopen) and
>>>     all the others in an SSE4 version (including SSE4_PCMUL).
>>>     I'm glossing over x86-32 here, but you could do something similar.
>>>
>>>     Best
>>>     - Milosz
>>>
>>>     On Tue, Mar 25, 2014 at 3:21 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>     >
>>>     >
>>>     > On 25/03/2014 20:13, Kevin Greenan wrote:
>>>     >> +1
>>>     >>
>>>     >> Yeah, that sounds better...  Let's keep this as simple as possible.
>>>     >
>>>     > I'll rework the https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse accordingly.
>>>     >
>>>     > Would it be sensible to compile with SSE optimizations only if all are available (SSE2, SSSE3, SSE4, SSE4_PCMUL) and not attempt to distinguish between cases such as SSSE3 being available but not SSE4_PCMUL? From what I understand at this point, that kind of distinction is going to be difficult to manage anyway.
>>>     >
>>>     > Is it too simplistic ?
>>>     >
>>>     >>
>>>     >> -kevin
>>>     >>
>>>     >>
>>>     >> On Tue, Mar 25, 2014 at 12:08 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>     >>
>>>     >>     Andreas Peters suggested another approach, which makes sense to me: have one plugin with SSE optimizations enabled, another without them, and choose between the two at runtime.
>>>     >>
>>>     >>     What do you think ?
>>>     >>
>>>     >>     On 23/03/2014 20:50, Loic Dachary wrote:
>>>     >>     > Hi Laurent,
>>>     >>     >
>>>     >>     > In the context of optimizing erasure code functions implemented by Kevin Greenan (cc'ed) and James Plank at https://bitbucket.org/jimplank/gf-complete/ we ran across a question you may have the answer to: can gcc -msse2 (or -msse* for that matter) have a negative impact on the portability of the compiled binary code?
>>>     >>     >
>>>     >>     > In other words, if code is compiled without -msse* and runs fine on all the Intel processors it targets, could adding -msse* to the compilation of the same source code generate a binary that fails on some of those processors? This assumes no SSE-specific functions are used in the source code.
>>>     >>     >
>>>     >>     > In gf-complete, all SSE-specific instructions are carefully guarded so that they never run on a CPU that does not support them. The runtime detection is done by checking CPUID bits (see https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28 )
>>>     >>     >
>>>     >>     > The corresponding thread is at:
>>>     >>     >
>>>     >>     > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>>     >>     >
>>>     >>     > Cheers
>>>     >>     >
>>>     >>
>>>     >>     --
>>>     >>     Loïc Dachary, Artisan Logiciel Libre
>>>     >>
>>>     >>
>>>     >
>>>     > --
>>>     > Loïc Dachary, Artisan Logiciel Libre
>>>     >
>>>
>>>
>>>
>>>     --
>>>     Milosz Tanski
>>>     CTO
>>>     10 East 53rd Street, 37th floor
>>>     New York, NY 10022
>>>
>>>     p: 646-253-9055
>>>     e: milosz@xxxxxxxxx
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
