On Sun, Jan 05, 2025 at 10:32:56AM -0700, Orion Poplawski wrote:
> On 1/4/25 11:33, Orion Poplawski wrote:
> >Since the latest update to OpenBLAS 0.3.28 in rawhide, FlexiBLAS
> >fails to build in aarch64 because OpenBLAS crashes in the
> >LAPACK-xeigtstc_cec_in test. Note that OpenBLAS itself does not fail only
> >because they don't include LAPACK test suite.
> >
> >See:
> >- The first failure in Koschei after the 0.3.28 update:
> >  https://koschei.fedoraproject.org/package/flexiblas
> >- The build log:
> >  https://koji.fedoraproject.org/koji/taskinfo?taskID=125998498
> >
> >FTBFS report here: https://bugzilla.redhat.com/show_bug.cgi?id=2329491
> >
> >I have attempted to collect some more debug info via the following
> >- https://src.fedoraproject.org/fork/orion/rpms/flexiblas/tree/debug
> >
> >But the valgrind run just seems to hang with no output from
> >valgrind - https://kojipkgs.fedoraproject.org//work/tasks/3875/127513875/build.log
> >
> > Tests of the Nonsymmetric eigenproblem condition estimation routines
> > CTRSYL, CTREXC, CTRSNA, CTRSEN
> > Relative machine precision (EPS) = 0.119209E-06
> > Safe minimum (SFMIN) = 0.117549E-37
> > Routines pass computational tests if test ratio is less than 20.00
> > CEC routines passed the tests of the error exits ( 41 tests done)
> >
> >And the crash seems to occur after memory corruption has already
> >occurred so seems to be of limited utility. So I'm at a loss
> >myself.
>
> So, looks like I'm just not waiting long enough for the test to
> progress - valgrind must be adding a huge overhead.

valgrind runs the program on a synthetic CPU, translating and
instrumenting every instruction, so yes, the overhead is rather large.
On the other hand, I really appreciate its ability to find bugs.

From the errors below it seems to be a problem with the Neoverse code
(https://en.wikipedia.org/wiki/ARM_Neoverse), which from my
understanding is not related to the low-end hardware I have at home
(Raspberry Pis and similar). I think you'd need access to an Ampere or
AWS Graviton system :-(

qemu is able to emulate Neoverse N1, N2 & V1, so running a
software-emulated qemu-system-aarch64 virtual machine may be the way to go:
https://fedoraproject.org/wiki/Architectures/AArch64/Install_with_QEMU

> I'm now seeing:
>
> ==44481== Thread 10:
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid read of size 4
> ==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
> ==44481== Invalid write of size 4
> ==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
> ==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
> ==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
> ==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
> ==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
> ==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
> ==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
> ==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
> ==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
> ==44481==    by 0x10C327: main (cchkee.F:2553)
> ==44481==
>
> But I'm back to not having access to openblas debuginfo in koji.
>
> Maybe I can reproduce the test failure somehow as part of the
> openblas build.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
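
P.S. The eight reports in the quoted valgrind trace are repetitive: they are
all 4-byte reads and writes in cgemm_beta_NEOVERSEN1, landing 0 or 8 bytes
past the end of the same 111,504-byte block allocated in csyl01_. Below is a
minimal sketch of a script that groups Memcheck "Invalid read/write" reports
so that pattern is easier to see in a long log. It assumes the raw,
unwrapped log as valgrind writes it (not the mail-quoted form), and only
handles the "N bytes after a block" wording. The function name `summarize`
and the regular expressions are my own, not part of valgrind.

```python
import re
from collections import Counter

# Strip the "==PID== " prefix that Memcheck puts on every line.
PREFIX_RE = re.compile(r"^==\d+==[ \t]?", re.MULTILINE)
# "Invalid read of size 4" / "Invalid write of size 4"
ACCESS_RE = re.compile(r"Invalid (read|write) of size (\d+)")
# First "at 0x...: function" line of a report is the faulting frame.
WHERE_RE = re.compile(r"^\s*at 0x[0-9A-Fa-f]+: (\S+)", re.MULTILINE)
# "Address 0x... is 8 bytes after a block of size 111,504 alloc'd"
ADDR_RE = re.compile(
    r"Address 0x[0-9A-Fa-f]+ is (\d+) bytes after a block of size ([\d,]+)"
)


def summarize(log: str) -> Counter:
    """Count reports by (kind, size, function, offset past block, block size)."""
    text = PREFIX_RE.sub("", log)
    counts: Counter = Counter()
    # Memcheck separates reports with an otherwise empty "==PID==" line.
    for report in re.split(r"\n\s*\n", text):
        access = ACCESS_RE.search(report)
        where = WHERE_RE.search(report)
        addr = ADDR_RE.search(report)
        if not (access and where and addr):
            continue  # not an out-of-bounds report in the form we understand
        key = (
            access.group(1),
            int(access.group(2)),
            where.group(1),
            int(addr.group(1)),
            int(addr.group(2).replace(",", "")),
        )
        counts[key] += 1
    return counts
```

To use it, pass the contents of a log file to `summarize()` (for example one
written with valgrind's --log-file option) and print the resulting counter.
Reports in any other form are silently skipped, so treat the counts as a
rough guide rather than a complete picture.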