Re: Is it possible atlas is linked wrongly by new binutils?

Dave Love wrote:

> Kevin Kofler writes:
>> If you are talking about the missing RPM AutoProvides:
>> Provides: libblas.so.3()(64bit)
>> does wonders.
> 
> I mean you need to get the soname right and ensure that you have
> everything implemented in the replacement library.

Only the soname of the Provides matters. The actual library file can be a 
symlink to the monolithic libopenblas.so.0, the dynamic linker (ld.so) will 
load it just fine. The soname is only read at link time, and there, it is 
fine (and in fact desired) that newly linked applications get 
libopenblas.so.0 recorded as the soname, not libblas.so.3.
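
To make the symlink arrangement concrete, here is a minimal sketch in Python. It only models the file layout in a scratch directory: the empty stand-in file and the helper names (make_demo_tree, resolve_library) are my own illustration, not part of any real package, and no actual library is loaded.

```python
import os
import tempfile


def make_demo_tree(root):
    """Create an empty stand-in for libopenblas.so.0 plus a libblas.so.3 symlink to it.

    Both files are placeholders in a scratch directory, not real libraries.
    """
    real = os.path.join(root, "libopenblas.so.0")
    link = os.path.join(root, "libblas.so.3")
    with open(real, "wb"):
        pass  # an empty file is enough to demonstrate path resolution
    os.symlink("libopenblas.so.0", link)  # relative target, as a package would ship it
    return real, link


def resolve_library(path):
    """Return the real file that a (possibly symlinked) library path points to."""
    return os.path.realpath(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        real, link = make_demo_tree(root)
        print(os.path.basename(link), "->", os.path.basename(resolve_library(link)))
```

An old binary asking for libblas.so.3 by file name ends up at the monolithic file, regardless of the soname stored inside that file.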

>>> Various things have been changed to use openblas on x86 after some of us
>>> agitated.
>>
>> The problem is, "various things" is not enough, we need a plan to ensure
>> ALL things use it.
> 
> It's not available for them all as far as I know -- there's an rpm macro
> which says which ones.  I'm happy if that's wrong now.

"things" = "packages" here. Surely OpenBLAS should work for all the BLAS-
using packages on x86, especially if we symlink libblas.so to it. If not, it 
is a bug either in OpenBLAS or in the package.

OpenBLAS is not available for some exotic architectures, but the solution 
there is to build ATLAS (or some other implementation) for those 
architectures (and those architectures only) and set up the symlinks there 
too.

>> But the new approach I am proposing installs only one version of both
>> BLAS and LAPACK (the OpenBLAS one), so there cannot possibly be
>> mismatched versions (except if you have third-party binaries bundling
>> BLAS and linking to the system LAPACK or the other way round, but those
>> are then very broken and will also fail on other distributions for the
>> same reason).
> 
> "Third party" (user and system) binaries linking non-OB linear algebra
> is normal on the sort of systems I work on, though I wish I was allowed
> to package system installations.  (Red Hat would have caused chaos on
> our systems by introducing backwards-incompatible openmpi if it hadn't
> been caught by the local package dependencies.)

It is fine if third-party binaries either:
* link to the system version of both BLAS and LAPACK, or
* bundle their own version of both BLAS and LAPACK.

The only case where you can end up with an incompatible mix is if they link 
to the system version of one and bundle the other, which is very broken and 
hopefully not too common. (It will also break on Debian.)
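
The rule above is simple enough to write down as a check. This is only a sketch of the reasoning; the labels "system" and "bundled" and the function name is_consistent are hypothetical, chosen for illustration.

```python
VALID_SOURCES = ("system", "bundled")


def is_consistent(blas_source, lapack_source):
    """Return True if BLAS and LAPACK come from the same place.

    Each argument says where a binary gets that library from:
    "system" (the distribution's copy) or "bundled" (its own copy).
    """
    for source in (blas_source, lapack_source):
        if source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {source!r}")
    # Both from the system or both bundled is fine; a mix is the broken case.
    return blas_source == lapack_source


if __name__ == "__main__":
    for blas in VALID_SOURCES:
        for lapack in VALID_SOURCES:
            print(blas, lapack, is_consistent(blas, lapack))
```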

> Also, OB still has at least some correctness problems as far as I know,
> e.g. <https://github.com/xianyi/OpenBLAS/issues/458>.

According to the comments, the latest version has precision issues only in 
the single-precision version. If you are using single precision where 
precision matters, you have a problem already.
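
To illustrate how little precision single precision offers to begin with, here is a small sketch in plain Python. It has nothing to do with OpenBLAS itself; the helpers to_float32 and relative_error are names I made up for the illustration. It rounds a double to the nearest IEEE 754 single and measures the relative error that introduces.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relative_error(approx, exact):
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


# Unit roundoff: the worst-case relative error of one rounding to nearest.
SINGLE_UNIT_ROUNDOFF = 2.0 ** -24
DOUBLE_UNIT_ROUNDOFF = 2.0 ** -53

if __name__ == "__main__":
    err = relative_error(to_float32(0.1), 0.1)
    print("relative error of 0.1 stored as single:", err)
    print("single unit roundoff:", SINGLE_UNIT_ROUNDOFF)
    print("double unit roundoff:", DOUBLE_UNIT_ROUNDOFF)
```

Merely storing 0.1 in single precision already costs an error many orders of magnitude above the double-precision unit roundoff, before any BLAS routine has run.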

Also, having an error of 17.something times a small error instead of 16.00 
times is hardly "incorrect". I would be more worried if it were returning 
completely wrong results (e.g., 1 instead of 0 or something like that).

And in the end, all software has bugs. glibc also has bugs, so should we 
attempt to make all of Fedora switchable to musl at runtime (or worse, 
arbitrarily link some of it to glibc, some to musl, some to uClibc, and some 
to dietlibc, just because we can – this is the situation with BLAS/LAPACK 
right now)?

And keep in mind that floating-point computation ALWAYS returns 
approximations. If you need rigorous error bounds, you probably need 
something entirely different (e.g., interval arithmetic), which 
will of course have its own limitations.
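
For readers who have not met interval arithmetic, a toy sketch follows. It assumes Python 3.9 or later (for math.nextafter); the Interval class is my own minimal illustration supporting addition only, not any real library. Each endpoint is pushed one floating-point step outward after the operation, so the exact sum of the stored endpoints is guaranteed to lie inside the result.

```python
import math


class Interval:
    """A closed interval [lo, hi] of doubles that encloses some exact real value."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        # Round-to-nearest is off by at most half an ulp, so stepping each
        # endpoint one representable double outward yields a safe enclosure.
        lo = math.nextafter(self.lo + other.lo, -math.inf)
        hi = math.nextafter(self.hi + other.hi, math.inf)
        return Interval(lo, hi)

    def width(self):
        return self.hi - self.lo

    def __repr__(self):
        return f"Interval({self.lo!r}, {self.hi!r})"


if __name__ == "__main__":
    total = Interval(0.1) + Interval(0.2)
    print(total, "width:", total.width())
```

The limitation is visible even here: every operation widens the interval, so long computations can end up with bounds that are rigorous but too loose to be useful.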

> Is there actually anything wrong with Debian's tried and tested approach
> as long as openblas is preferred where appropriate?

Sorry, I really don't like the alternatives system. It requires you to make 
a global systemwide switch to change your implementation. If you want to 
make the implementation switchable by the user, it needs to be a runtime 
choice (using, e.g., environment modules). But I think we do not have to 
leave this decision to the user to begin with.

>> ld.so.conf.d is the only way to build those [versions optimized for
>> subarchitectures where no runtime detection is available], if you want to
>> support them. I wonder whether non-x86 architectures are even worth
>> investing the effort.
> 
> How would relevant hwcaps not help, if they were available, as they
> seemed to be on some architectures when I looked some time ago?  (That
> used to be important on SPARC for efficiency in crypto libraries, at
> least.)

My point is, those architectures are so rarely used, with Fedora at least, 
that I wonder whether it is worth Fedora maintainers' time to optimize for 
their subarchitectures. Of course it will give a performance benefit, that 
goes without doubt. But is it worth our time considering the actual usage?

> Yes it is, but if you can have atlas-sse3, I don't see why you can't
> have blis-avx512.  (Dynamism was on the radar for BLIS when I last
> looked, and might be worth contributing, but I haven't evaluated it
> against OB on anything other than KNL.)

It could be done, but the idea behind my proposal is that atlas-sse3 would 
go away. :-) Compile-time switching done right would mean shipping at least 6 
or 7 atlas-* packages on x86_64 (even more on i686 because you have SSE 1, 
MMX, and x87-only to support there too), probably over a dozen (because 
different CPUs supporting the same level of SSE/AVX don't necessarily have 
the same optimal settings, see the different kernels the OpenBLAS runtime 
switching supports). But of course, if AVX-512 is the only special case, it 
is probably doable (using the same ld.so.conf.d approach that atlas-sse3 
uses), though the BLIS build would need to be a drop-in for the default 
implementation, which would be OpenBLAS.
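
For contrast, the runtime switching idea can be sketched in a few lines. This is not how OpenBLAS actually does it (it detects the CPU in its own native code and distinguishes many more CPU models); the kernel labels and the functions parse_flags and pick_kernel are hypothetical. The flag names are the ones Linux prints in /proc/cpuinfo on x86, where SSE3 shows up as "pni".

```python
# Ordered from most to least capable: (cpuinfo flag, hypothetical kernel label).
PREFERENCE = [
    ("avx512f", "avx512"),
    ("avx2", "avx2"),
    ("avx", "avx"),
    ("pni", "sse3"),  # Linux reports SSE3 as "pni"
    ("sse2", "sse2"),
]


def parse_flags(cpuinfo_text):
    """Extract the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags):
    """Return the label of the best kernel that the given flags allow."""
    for flag, kernel in PREFERENCE:
        if flag in flags:
            return kernel
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc unavailable
    print("selected kernel:", pick_kernel(flags))
```

With one binary choosing at load time like this, there is no need for a separate package per instruction-set level.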

> and the KNL qua Haswell is contributed is worse than you might expect
> <https://github.com/xianyi/OpenBLAS/issues/991>.

To be honest, I did not expect it to be great. Not using AVX-512 is of 
course a bummer. Now, I would naïvely have guessed a factor of 2 rather than 
3 (because AVX2 is 256-bit, AVX-512 is 512-bit), but I guess AVX2 is just 
not as optimized as AVX-512 in the KNL hardware's circuitry.

Realistically, I think native AVX-512 support will come in OpenBLAS when 
people start getting those Skylake-X CPUs that have been out for a few weeks 
now. The market for high-performance coprocessors is just too small (which 
is kinda sad because the whole point of those coprocessors was to give you 
performance not available in any CPUs at the time).

        Kevin Kofler
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx