Re: Help with flexiblas crash on aarch64 in koji only

On Sun, Jan 05, 2025 at 10:32:56AM -0700, Orion Poplawski wrote:
> On 1/4/25 11:33, Orion Poplawski wrote:
> >Since the latest update to OpenBLAS 0.3.28 in rawhide, FlexiBLAS
> >fails to build on aarch64 because OpenBLAS crashes in the
> >LAPACK-xeigtstc_cec_in test. Note that OpenBLAS itself does not fail
> >only because it does not include the LAPACK test suite.
> >
> >See:
> >- The first failure in Koschei after the 0.3.28 update:
> >  https://koschei.fedoraproject.org/package/flexiblas
> >- The build log:
> >  https://koji.fedoraproject.org/koji/taskinfo?taskID=125998498
> >
> >FTBFS report here: https://bugzilla.redhat.com/show_bug.cgi?id=2329491
> >
> >I have attempted to collect some more debug info via the following:
> >https://src.fedoraproject.org/fork/orion/rpms/flexiblas/tree/debug
> >
> >But the valgrind run just seems to hang with no output from valgrind:
> >https://kojipkgs.fedoraproject.org//work/tasks/3875/127513875/build.log
> >
> >  Tests of the Nonsymmetric eigenproblem condition estimation routines
> >  CTRSYL, CTREXC, CTRSNA, CTRSEN
> >  Relative machine precision (EPS) =     0.119209E-06
> >  Safe minimum (SFMIN)             =     0.117549E-37
> >  Routines pass computational tests if test ratio is less than   20.00
> >  CEC routines passed the tests of the error exits ( 41 tests done)
> >
> >And the crash seems to occur after memory corruption has already
> >occurred, so that debug info seems to be of limited utility.  So I'm
> >at a loss myself.
> 
> So, looks like I'm just not waiting long enough for the test to
> progress - valgrind must be adding a huge overhead.

valgrind works by translating and instrumenting every instruction the
program executes, and it runs threads one at a time, so yes the
overhead is rather large.  On the other hand I really appreciate its
ability to find bugs.

From the errors below it seems to be a problem with the Neoverse code
(https://en.wikipedia.org/wiki/ARM_Neoverse), which from my
understanding is not related to the low-end hardware I have at home
(Raspberry Pis and similar).

I think you'd need access to an Ampere or AWS Graviton system :-(

qemu is able to emulate Neoverse N1, N2 & V1, so running a
software-emulated qemu-system-aarch64 virtual machine may be the way
to go:
https://fedoraproject.org/wiki/Architectures/AArch64/Install_with_QEMU
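
As a rough illustration of the emulation route, here is a minimal
sketch that only assembles a qemu-system-aarch64 command line for a
fully emulated (TCG) guest with a Neoverse N1 CPU model.  The disk
image name, the firmware path, and the memory and CPU counts are
placeholders I made up, not values from this thread; whether a given
CPU model is available depends on the QEMU version, so check
"qemu-system-aarch64 -cpu help" first.

```python
import shlex


def qemu_neoverse_command(disk_image, firmware, cpu="neoverse-n1",
                          smp=4, mem_mib=4096):
    """Build an argv list for a software-emulated aarch64 guest.

    All argument values are supplied by the caller; nothing here is
    taken from the Fedora build environment.
    """
    return [
        "qemu-system-aarch64",
        "-machine", "virt",
        "-accel", "tcg",        # pure software emulation, no KVM needed
        "-cpu", cpu,            # CPU model to emulate
        "-smp", str(smp),       # several vCPUs, since the crash is in a worker thread
        "-m", str(mem_mib),
        "-bios", firmware,      # UEFI firmware image for the guest
        "-drive", f"file={disk_image},if=virtio,format=qcow2",
        "-nographic",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = qemu_neoverse_command(
        "fedora-rawhide-aarch64.qcow2",
        "/usr/share/edk2/aarch64/QEMU_EFI.fd",
    )
    print(shlex.join(argv))
```

Expect the test suite to run slowly under TCG, and even more so with
valgrind inside the guest.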

> I'm now seeing:
> 
> ==44481== Thread 10:
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> 
> But I'm back to not having access to openblas debuginfo in koji.
> 
> Maybe I can reproduce the test failure somehow as part of the
> openblas build.
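
Since the valgrind output above is long and repetitive, a small script
can condense it into the distinct out-of-bounds accesses.  What follows
is only a sketch: it handles just the "Invalid read/write ... bytes
after a block" pattern seen in this log, it tolerates the mail quoting
and line wrapping, and the 8-byte element size is my assumption
(single-precision complex, guessed from the "c" prefix of cgemm), not
something the log states.

```python
import re
from collections import Counter

# One record: the access kind and size, the faulting function, then how far
# past the end of the heap block the address lies.
RECORD_RE = re.compile(
    r"Invalid (read|write) of size (\d+) at 0x[0-9A-Fa-f]+: (\S+)"
    r".*?Address 0x[0-9A-Fa-f]+ is (\d+) bytes after a block of size ([\d,]+) alloc'd"
)

COMPLEX8_BYTES = 8  # assumption: single-precision complex elements


def normalise(log):
    """Strip mail quoting and the ==PID== prefix, then undo line wrapping."""
    words = []
    for line in log.splitlines():
        line = re.sub(r"^(>\s?)+", "", line)
        line = re.sub(r"^==\d+==", "", line)
        words.extend(line.split())
    return " ".join(words)


def summarise(log):
    """Count (kind, size, function, offset past end, block size) tuples."""
    counts = Counter()
    for kind, size, func, offset, block in RECORD_RE.findall(normalise(log)):
        counts[(kind, int(size), func, int(offset), int(block.replace(",", "")))] += 1
    return counts


SAMPLE = """\
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in
> /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size
> 111,504 alloc'd
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in
> /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size
> 111,504 alloc'd
"""

if __name__ == "__main__":
    for (kind, size, func, offset, block), n in sorted(summarise(SAMPLE).items()):
        elems = block // COMPLEX8_BYTES
        print(f"{n}x {kind} of {size} bytes in {func}: "
              f"{offset} bytes past a {block}-byte block (~{elems} elements)")
```

Under that element-size assumption the 111,504-byte block would hold
13,938 elements, and every access in the log lands within the first two
elements past its end, always from cgemm_beta_NEOVERSEN1.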

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
