On 26/03/2014 18:40, Milosz Tanski wrote:
> Loic,
>
> I don't mean to be redundant since I already posted this in the
> commit comments on github, but I'm not sure if you saw it.

Thanks for posting it: your comment got lost in a rebase (github's not good at that...).

> Instead of doing cpuid manually you can use builtins provided in gcc
> (and in clang). There's a cpuid.h header you can include. This
> stackoverflow answer has a good summary of it:
> http://stackoverflow.com/questions/14266772/how-do-i-call-cpuid-in-linux?answertab=votes#tab-top

It is a nice improvement to have indeed. Created http://tracker.ceph.com/issues/7869

Cheers

> On Wed, Mar 26, 2014 at 3:14 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi Kevin & Milosz,
>>
>> So it would be
>>
>> if(sse4 & sse3) => use a plugin compiled with sse + sse3 + sse4 activated
>> else if(sse3) => use a plugin with sse2 + sse3 activated but not sse4
>> else => fallback to not using sse at all
>>
>> like so:
>>
>> https://github.com/dachary/ceph/commit/b6e4307bd2ee1de6e8bbda0ced370d484d512114#diff-5249f49580782dfe95a1cbcc986ee5deR113
>>
>> If I understand Laurent correctly, the right approach would be to semi-transparently generate and select the code path depending on the features at runtime. But that would require more work, so I created a ticket to track it: http://tracker.ceph.com/issues/7865
>>
>> Does that sound right?
>>
>> On 25/03/2014 22:31, Kevin Greenan wrote:
>>> Hey Loic,
>>>
>>> I think we want something closer to what Milosz is proposing (3 cut-offs instead of 2). The shuffle instruction is part of SSSE3 and is the basis for the SSE split table techniques, which are super fast. With an all-or-nothing approach, many users whose CPUs support SSSE3 but not the later extensions would not be able to take advantage of it.
>>>
>>> Make sense?
>>>
>>> -kevin
>>>
>>>
>>> On Tue, Mar 25, 2014 at 12:46 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
>>>
>>> It gets a bit more tricky with x86_64 since the arch dictates that the
>>> baseline has SSE2 (but not necessarily anything later).
>>>
>>> What I would do is both support SSE2 (maybe in core without dlopen) and
>>> then support all the others in an SSE4 version (including SSE4_PCMUL).
>>> I'm glossing over x86-32 here, but you could do something similar.
>>>
>>> Best
>>> - Milosz
>>>
>>> On Tue, Mar 25, 2014 at 3:21 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>> >
>>> >
>>> > On 25/03/2014 20:13, Kevin Greenan wrote:
>>> >> +1
>>> >>
>>> >> Yeah, that sounds better... Let's keep this as simple as possible.
>>> >
>>> > I'll rework https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse accordingly.
>>> >
>>> > Would it be sensible to compile with SSE optimizations only if all are available (SSE2, SSSE3, SSE4, SSE4_PCMUL) and not attempt to distinguish between cases such as SSSE3 being available but not SSE4_PCMUL? From what I understand at this point, that kind of distinction is going to be difficult to manage anyway.
>>> >
>>> > Is it too simplistic?
>>> >
>>> >>
>>> >> -kevin
>>> >>
>>> >>
>>> >> On Tue, Mar 25, 2014 at 12:08 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>> >>
>>> >> Andreas Peters suggested another approach, which makes sense to me: have one plugin with SSE optimizations enabled, another without them, and choose at runtime between the two.
>>> >>
>>> >> What do you think?
>>> >>
>>> >> On 23/03/2014 20:50, Loic Dachary wrote:
>>> >> > Hi Laurent,
>>> >> >
>>> >> > In the context of optimizing erasure code functions implemented by Kevin Greenan (cc'ed) and James Plank at https://bitbucket.org/jimplank/gf-complete/ we ran across a question you may have the answer to: can gcc -msse2 (or -msse* for that matter) have a negative impact on the portability of the compiled binary code?
>>> >> >
>>> >> > In other words, if code is compiled without -msse* and runs fine on all Intel processors it targets, could adding -msse* to the compilation of the same source code generate a binary that would fail on some processors? This is assuming no sse specific functions are used in the source code.
>>> >> >
>>> >> > In gf-complete, all sse specific instructions are carefully protected so that they are not run on a CPU that does not support them. The runtime detection is done by checking CPU id bits (see https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28)
>>> >> >
>>> >> > The corresponding thread is at:
>>> >> >
>>> >> > https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>> >> >
>>> >> > Cheers
>>> >> >
>>> >>
>>> >> --
>>> >> Loïc Dachary, Artisan Logiciel Libre
>>> >>
>>> >>
>>> >
>>> > --
>>> > Loïc Dachary, Artisan Logiciel Libre
>>> >
>>>
>>>
>>>
>>> --
>>> Milosz Tanski
>>> CTO
>>> 10 East 53rd Street, 37th floor
>>> New York, NY 10022
>>>
>>> p: 646-253-9055
>>> e: milosz@xxxxxxxxx
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>
>

--
Loïc Dachary, Artisan Logiciel Libre
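
For illustration only, here is a rough sketch of how the two ideas in this thread could fit together: probing the CPU through the cpuid.h header that Milosz mentions, then picking one of three plugin flavors along the lines of the cut-offs discussed above. It is not the code from the gf-complete pull request or from the Ceph commit. The FEAT_* mask, the flavor enum and the function names are all hypothetical, and the exact requirements for each tier (SSSE3 for the middle one, SSE4.1 + SSE4.2 + PCLMUL for the top one) are my assumption about what the thread intends. The bit positions are those of CPUID leaf 1.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* shipped with gcc and clang, provides __get_cpuid() */
#endif

/* Hypothetical feature mask, not taken from gf-complete or Ceph. */
enum {
    FEAT_SSE2   = 1 << 0,
    FEAT_SSE3   = 1 << 1,
    FEAT_SSSE3  = 1 << 2,
    FEAT_SSE4_1 = 1 << 3,
    FEAT_SSE4_2 = 1 << 4,
    FEAT_PCLMUL = 1 << 5
};

/* Hypothetical plugin flavors matching the three cut-offs. */
enum flavor { FLAVOR_GENERIC, FLAVOR_SSE3, FLAVOR_SSE4 };

/* Read CPUID leaf 1 and translate the raw bits into the mask above.
 * On non-x86 targets this simply reports no features. */
unsigned probe_features(void)
{
    unsigned features = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        if (edx & (1u << 26)) features |= FEAT_SSE2;
        if (ecx & (1u << 0))  features |= FEAT_SSE3;
        if (ecx & (1u << 1))  features |= FEAT_PCLMUL;
        if (ecx & (1u << 9))  features |= FEAT_SSSE3;
        if (ecx & (1u << 19)) features |= FEAT_SSE4_1;
        if (ecx & (1u << 20)) features |= FEAT_SSE4_2;
    }
#endif
    return features;
}

/* Pick the most capable flavor whose requirements are all met.
 * The tier definitions are an assumption, see the note above. */
enum flavor select_flavor(unsigned features)
{
    const unsigned sse3_tier = FEAT_SSE2 | FEAT_SSE3 | FEAT_SSSE3;
    const unsigned sse4_tier = sse3_tier | FEAT_SSE4_1 | FEAT_SSE4_2
                             | FEAT_PCLMUL;

    if ((features & sse4_tier) == sse4_tier)
        return FLAVOR_SSE4;
    if ((features & sse3_tier) == sse3_tier)
        return FLAVOR_SSE3;
    return FLAVOR_GENERIC;
}

/* Suffix a loader could append to a plugin name (hypothetical naming). */
const char *flavor_name(enum flavor f)
{
    assert(f == FLAVOR_GENERIC || f == FLAVOR_SSE3 || f == FLAVOR_SSE4);
    switch (f) {
    case FLAVOR_SSE4: return "sse4";
    case FLAVOR_SSE3: return "sse3";
    default:          return "generic";
    }
}
```

A loader would call flavor_name(select_flavor(probe_features())) once at startup and dlopen the matching plugin. Keeping the selection separate from the probe means the cut-offs can be exercised with hand-built masks on any machine, whatever the build host's CPU supports.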