On 2021-06-29 18:03, Veronika Kabatova wrote:
On Tue, Jun 29, 2021 at 5:15 PM Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 2021-06-29 15:44, Lorenzo Pieralisi wrote:
On Tue, Jun 29, 2021 at 12:48:14PM +0100, Robin Murphy wrote:
[ +ACPI audience ]
On 2021-06-25 12:15, Robin Murphy wrote:
On 2021-06-25 12:09, Catalin Marinas wrote:
On Fri, Jun 25, 2021 at 12:02:52PM +0100, Robin Murphy wrote:
On 2021-06-25 10:52, Veronika Kabatova wrote:
[...]
❌ stress: stress-ng
Oh no, this looks like another alignment fault in memcpy:
[13330.651903] Unable to handle kernel paging request at virtual address ffff8000534705ff
[13330.651914] Mem abort info:
[13330.651918] ESR = 0x96000021
[13330.651922] EC = 0x25: DABT (current EL), IL = 32 bits
[13330.651928] SET = 0, FnV = 0
[13330.651931] EA = 0, S1PTW = 0
[13330.651933] FSC = 0x21: alignment fault
[13330.651938] Data abort info:
[13330.651940] ISV = 0, ISS = 0x00000021
[13330.651941] CM = 0, WnR = 0
[13330.651943] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000f3e6b000
[13330.651945] [ffff8000534705ff] pgd=1000008ffcfff003, p4d=1000008ffcfff003, pud=100000088e57d003, pmd=10000008d0aeb003, pte=006800008021370f
[13330.651956] Internal error: Oops: 96000021 [#1] SMP
[13330.651961] Modules linked in: unix_diag binfmt_misc
fcrypt sm4_generic crc32_generic md4 michael_mic
nhpoly1305_neon nhpoly1305 poly1305_generic libpoly1305
poly1305_neon rmd160 sha3_generic sm3_generic
streebog_generic wp512 blowfish_generic blowfish_common
cast5_generic des_generic libdes chacha_generic
chacha_neon libchacha camellia_generic cast6_generic
cast_common serpent_generic twofish_generic
twofish_common dm_thin_pool dm_persistent_data
dm_bio_prison nvme nvme_core ipmi_watchdog ipmi_poweroff
loop tun af_key crypto_user scsi_transport_iscsi
xt_multiport ip_gre ip_tunnel gre overlay xt_CONNSECMARK
xt_SECMARK nft_counter xt_state xt_conntrack nft_compat
ah6 ah4 nft_objref nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 nf_tables nfnetlink jfs sctp
ip6_udp_tunnel udp_tunnel dm_log_writes dm_flakey rfkill
mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x
i2c_smbus coresight_replicator coresight_tpiu
coresight_tmc joydev mlx5_core acpi_ipmi psample
ipmi_ssif mlxfw !
ipmi_devintf
[13330.652076] ipmi_msghandler coresight_funnel
thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs
ast crct10dif_ce i2c_algo_bit ghash_ce drm_vram_helper
drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp
i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded:
nvmet]
[13330.652123] CPU: 115 PID: 188446 Comm: stress-ng Tainted: G OEL 5.13.0-rc7 #1
[13330.652129] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.15 05/08/2020
[13330.652133] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[13330.652139] pc : __memcpy+0x168/0x250
[13330.652150] lr : memory_read_from_buffer+0x58/0x80
[13330.652161] sp : ffff800063ef3c20
[13330.652163] x29: ffff800063ef3c20 x28: ffff0008b1380000 x27: 0000000000000000
[13330.652170] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00080a960fe0
[13330.652176] x23: ffff800063ef3d28 x22: 000000000000063f x21: ffff800063ef3c88
[13330.652181] x20: 000000000000063f x19: 000000000000063f x18: 0000000000000000
[13330.652186] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[13330.652191] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[13330.652196] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[13330.652200] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[13330.652206] x5 : ffff000d0fb0063f x4 : ffff80005347063f x3 : ffff000d0fb005c0
[13330.652212] x2 : ffffffffffffffef x1 : ffff800053470600 x0 : ffff000d0fb00000
[13330.652218] Call trace:
[13330.652221] __memcpy+0x168/0x250
[13330.652225] acpi_data_show+0x5c/0x8c
[13330.652232] sysfs_kf_bin_read+0x78/0xa0
[13330.652238] kernfs_file_read_iter+0x9c/0x1a4
[13330.652241] kernfs_fop_read_iter+0x34/0x50
[13330.652244] new_sync_read+0xdc/0x154
[13330.652253] vfs_read+0x158/0x1e4
[13330.652260] ksys_read+0x64/0xec
[13330.652266] __arm64_sys_read+0x28/0x34
[13330.652273] invoke_syscall+0x50/0x120
[13330.652280] el0_svc_common.constprop.0+0x4c/0xd4
[13330.652284] do_el0_svc+0x30/0x9c
[13330.652286] el0_svc+0x2c/0x54
[13330.652294] el0t_64_sync_handler+0x1a4/0x1b0
[13330.652296] el0t_64_sync+0x19c/0x1a0
[13330.652303] Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
[13330.652307] ---[ end trace 227d4380f57145d4 ]---
So maybe this issue isn't limited to weird modules, after all...
It ran on a machine from the same set that we were able to reproduce it on previously. If you or anyone else has an idea on how to stabilize the reproducibility, or a debug patch, we'll be happy to try it.
Possibly it depends on the individual machines' firmware, i.e. exactly how the relevant bits of their ACPI tables happen to be aligned in memory?
I've started digging into that callstack - it may not be a
"weird module"
but it's definitely crusty ACPI code... a238317ce818 ("ACPI: Clean up
acpi_os_map/unmap_memory() to eliminate __iomem.") looks frankly a bit
questionable in its decision to blithely cast away __iomem, but then the
rationale in aafc65c731fe ("ACPI: add arm64 to the platforms that use
ioremap") seems particularly dubious on top of that (especially
given this
end result).
At a wild guess, I'm wondering if this may be sufficient:
----->8-----
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 327e1b4eb6b0..f5d26b102fbe 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -277,7 +277,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
 	return NULL;
 }
 
-#if defined(CONFIG_IA64) || defined(CONFIG_ARM64)
+#if defined(CONFIG_IA64)
 /* ioremap will take care of cache attributes */
 #define should_use_kmap(pfn) 0
 #else
-----8<-----
I thought the same, but shouldn't acpi_os_ioremap() map it with the right attributes? It uses the EFI memory map to check what kind of memory this is.
Oh crikey, I missed that branch of the rabbit hole... I guess that must
mean that the tables being poked here are *not* covered by the EFI
memory map, so page_is_ram() is unlikely to help either :(
After picking through the UEFI spec I think I've now got a clearer picture
of what's happening, but I'm not sure where it goes from here...
The spec implies that it *is* legitimate for runtime-loaded ACPI tables to lie outside the EFI memory map, and that in that case they must be assumed to be uncached, so the behaviour of acpi_os_ioremap() is correct.
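For reference, the attribute selection on arm64 boils down to the EFI memory attributes for the region - roughly the following (a simplified sketch modelled on __acpi_get_mem_attribute() in arch/arm64/kernel/acpi.c, not the verbatim kernel code):

pgprot_t acpi_mem_attr_sketch(phys_addr_t addr)
{
	/* efi_mem_attributes() returns 0 for a region the EFI map doesn't cover */
	u64 attr = efi_mem_attributes(addr);

	if (attr & EFI_MEMORY_WB)
		return PAGE_KERNEL;			/* Normal, cacheable */
	if (attr & EFI_MEMORY_WT)
		return __pgprot(PROT_NORMAL_WT);
	if (attr & EFI_MEMORY_WC)
		return __pgprot(PROT_NORMAL_NC);
	/*
	 * Anything the EFI memory map doesn't describe falls through to
	 * Device memory, which faults on unaligned accesses.
	 */
	return __pgprot(PROT_DEVICE_nGnRnE);
}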
I'd agree with the reasoning; it would be good to pinpoint whether that's what actually triggers the issue.
I'd like to replicate it if possible (it is TX2 HW, but the firmware config is likely to differ from the HW I have at hand); the test command line that triggers the fault would be useful as a starting point.
The failure is always triggered during stress testing, see source at
https://gitlab.com/cki-project/kernel-tests/-/tree/main/stress/stress-ng
The runtest file is specific to running in our lab, but all it does is run subsets of the upstream test (see the "*.stressors" files). The upstream test version is V0.12.05, and it hasn't changed since long before we started hitting the problem. The failures were observed on both the Fedora 33 and 34 releases; I don't think the distro choice matters, but I'm mentioning it just in case.
It doesn't reproduce 100% of the time; whenever we tried to reproduce it on purpose we couldn't trigger the issue. As if it knew we were watching...
Ah, from that I can only assume that this must be stress-ng's --sysfs
test reading things at random, so not only would it have to be on a
machine whose firmware presents the right thing in the right way but the
random test conditions would also have to line up to poke it the "right"
(wrong) way too.
As a temporary workaround for the CI flakes, might it be possible to
configure stress-ng to stay away from just these ACPI "data" files?
Robin.
Furthermore, is this a v5.13-rc* regression? If so, it would be good to bisect it - I can't recollect arm64 changes that could have introduced this regression in the last cycle, but I may have missed something.
The actual change which has brought this to light is the update to
arm64's memcpy() routine for 5.13 - the new version is more aggressive
at making unaligned loads from the source buffer, so now triggers
alignment faults more readily when (wrongly) used on iomem mappings in
places that were getting away with it by chance under the previous
implementation (see also [1], for example).
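To illustrate the pattern at issue - a purely made-up example (the names here are hypothetical, not taken from this report): on arm64 an ioremap()ed region is Device memory, so a plain memcpy() from it may now fault, while memcpy_fromio() sticks to accesses Device memory can tolerate.

#include <linux/io.h>

static void copy_example(phys_addr_t phys, size_t len, void *dst)
{
	void __iomem *src = ioremap(phys, len);	/* Device-nGnRnE on arm64 */

	if (!src)
		return;

	/* memcpy(dst, (void *)src, len);	<- can now trigger an alignment fault */
	memcpy_fromio(dst, src, len);		/* safe on a Device mapping */

	iounmap(src);
}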
We dug into the history of runs; the first record we have is from mainline commit eb6bbacc46720b8b from April 28. The previously tested commit fafe1e39ed213221c0bce didn't hit any problems when running stress-ng. Unfortunately we don't have the logs from that first failed run anymore, as they are only kept for 6 weeks; the first logs we still have are from May 11, from mainline commit c90b1834703f13b3a:
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2021/05/11/301024644/build_aarch64_redhat:1253670447/tests/9969720_aarch64_2_console.log
Veronika
Thanks,
Robin.
[1]
https://lore.kernel.org/linux-arm-kernel/20210608153344.3813661-1-narmstrong@xxxxxxxxxxxx/
Given the definition of uncached for arm64 memory types, though, that means that callers of acpi_os_map_memory() still have to be prepared to get an effectively-__iomem pointer back even if they know they're mapping a table rather than some random bit of MMIO for an AML method.
Therefore in this case it seems the blame lies partway between
acpi_os_map_memory() for casting away __iomem and acpi_data_show() for
letting an arbitrary offset lead to an arbitrarily-aligned memcpy(), but I
don't know what the best way to fix it is. Either way I've satisfied myself
that it's not an issue with the arm64 code itself - I do wonder whether this
might also be a problem on IA-64 given ACPI_MISALIGNMENT_NOT_SUPPORTED, and
I guess RISC-V may have alignment concerns as well.
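FWIW, if the sysfs side were taught to keep the __iomem typing, something along these lines might do it - just an untested sketch of the idea rather than a proposed patch (acpi_os_map_iomem()/acpi_os_unmap_iomem() and memcpy_fromio() are existing interfaces; the bounds handling mirrors what memory_read_from_buffer() does today):

static ssize_t acpi_data_show(struct file *filp, struct kobject *kobj,
			      struct bin_attribute *bin_attr, char *buf,
			      loff_t offset, size_t count)
{
	struct acpi_data_attr *data_attr;
	void __iomem *base;
	ssize_t size;

	data_attr = container_of(bin_attr, struct acpi_data_attr, attr);
	size = data_attr->attr.size;

	if (offset < 0)
		return -EINVAL;
	if (offset >= size)
		return 0;
	if (count > size - offset)
		count = size - offset;

	base = acpi_os_map_iomem(data_attr->addr, size);
	if (!base)
		return -ENODEV;

	/* Aligned/byte-wise accesses only, so safe on a Device mapping */
	memcpy_fromio(buf, base + offset, count);

	acpi_os_unmap_iomem(base, size);

	return count;
}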
Yes, agreed, but see above: this code has been there for aeons, so if it is a v5.13-rc* regression it must be something else that actually triggered it (test/FW config).
Thanks for looking into this.
Lorenzo