Hi James, > -----Original Message----- > From: James Clark <james.clark@xxxxxxx> > Sent: Tuesday, September 19, 2023 6:41 PM > To: Linu Cherian <lcherian@xxxxxxxxxxx>; suzuki.poulose@xxxxxxx; > mike.leach@xxxxxxxxxx; leo.yan@xxxxxxxxxx > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; coresight@xxxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx; robh+dt@xxxxxxxxxx; > krzysztof.kozlowski+dt@xxxxxxxxxx; conor+dt@xxxxxxxxxx; > devicetree@xxxxxxxxxxxxxxx; Sunil Kovvuri Goutham > <sgoutham@xxxxxxxxxxx>; George Cherian <gcherian@xxxxxxxxxxx> > Subject: Re: [EXT] Re: [RFC PATCH v3 0/8] Coresight for Kernel panic and > watchdog reset > > > > On 19/09/2023 12:39, Linu Cherian wrote: > > Hi James, > > > >> -----Original Message----- > >> From: James Clark <james.clark@xxxxxxx> > >> Sent: Friday, September 15, 2023 7:20 PM > >> To: Linu Cherian <lcherian@xxxxxxxxxxx>; suzuki.poulose@xxxxxxx; > >> mike.leach@xxxxxxxxxx; leo.yan@xxxxxxxxxx > >> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; coresight@xxxxxxxxxxxxxxxx; > >> linux- kernel@xxxxxxxxxxxxxxx; robh+dt@xxxxxxxxxx; > >> krzysztof.kozlowski+dt@xxxxxxxxxx; conor+dt@xxxxxxxxxx; > >> devicetree@xxxxxxxxxxxxxxx; Sunil Kovvuri Goutham > >> <sgoutham@xxxxxxxxxxx>; George Cherian <gcherian@xxxxxxxxxxx> > >> Subject: [EXT] Re: [RFC PATCH v3 0/8] Coresight for Kernel panic and > >> watchdog reset > >> > >> External Email > >> > >> --------------------------------------------------------------------- > >> - > >> > >> > >> On 04/09/2023 06:05, Linu Cherian wrote: > >>> This RFC v3 patch series is rebased on v6.5-rc7 and is dependent on > >>> the below two patches. > >> [...] > >>> > >>> Steps for reading trace data captured in previous boot > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> 1. cd /sys/bus/coresight/devices/tmc_etrXX/ > >>> > >>> 2. Change to special mode called, read_prevboot. > >>> > >>> #echo 1 > read_prevboot > >>> > >>> 3. Dump trace buffer data to a file, > >>> > >>> #dd if=/dev/tmc_etrXX of=~/cstrace.bin > >> > >> Hi Linu, > >> > >> I left this comment on V2, but I tested it again and get the same result. > >> Instead of linking it I'll just re-paste it here: > >> > >> I made a reserved region, but when I run this command I get "Unable > >> to handle kernel paging request at virtual address 001f1921ed10ffae". > >> > >> Is there an extra step involved if there was no trace captured from a > >> previous panic? I thought I'd just be able to read out uninitialised > >> data. Or is it the uninitialised metadata that's causing this issue? > >> > >> Also that's without KASAN or lockdep turned on. If I have a kernel > >> with either of those things I get a different warning for each one. I > >> expect the lockdep one would happen even in the working scenario > though? > > > > Somehow I missed this comment on V2. > > > > I retried the above steps on my board and I do not see issues either with > KASAN OR lockdep enabled configs. > > Please see logs below. > > > > a. Lockdep enabled config > > ~# cd /sys/bus/coresight/devices/tmc_etr0 > > tmc_etr0# echo 1 > read_prevboot > > tmc_etr0# dd if=/dev/tmc_etr0 of=~/cstrace.bin > > 12324+1 records in > > 12324+1 records out > > 6310032 bytes (6.3 MB, 6.0 MiB) copied, 0.122883 s, 51.3 MB/s > > > > # zcat /proc/config.gz | grep LOCKDEP > > CONFIG_LOCKDEP_SUPPORT=y > > CONFIG_LOCKDEP=y > > CONFIG_LOCKDEP_BITS=15 > > CONFIG_LOCKDEP_CHAINS_BITS=16 > > CONFIG_LOCKDEP_STACK_TRACE_BITS=19 > > CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14 > > CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12 > > # CONFIG_DEBUG_LOCKDEP is not set > > > > b. KASAN enabled config > > # cd /sys/bus/coresight/devices/tmc_etr0/ > > tmc_etr0# ls > > buf_mode_preferred connections power trigger_cntr > > buf_modes_available enable_sink read_prevboot uevent > > buffer_size mgmt subsystem waiting_for_supplier > > tmc_etr0# echo 1 > read_prevboot > > tmc_etr0# dd if=/dev/tmc_etr0 of=~/cstrace.bin > > 12324+1 records in > > 12324+1 records out > > 6310032 bytes (6.3 MB, 6.0 MiB) copied, 0.0940671 s, 67.1 MB/s > > > > ~# zcat /proc/config.gz | grep -i kasan > > CONFIG_KASAN_SHADOW_OFFSET=0xdfff800000000000 > > CONFIG_HAVE_ARCH_KASAN=y > > CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y > > CONFIG_HAVE_ARCH_KASAN_HW_TAGS=y > > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y > > CONFIG_CC_HAS_KASAN_GENERIC=y > > CONFIG_CC_HAS_KASAN_SW_TAGS=y > > CONFIG_KASAN=y > > CONFIG_KASAN_GENERIC=y > > # CONFIG_KASAN_SW_TAGS is not set > > # CONFIG_KASAN_HW_TAGS is not set > > CONFIG_KASAN_OUTLINE=y > > # CONFIG_KASAN_INLINE is not set > > CONFIG_KASAN_STACK=y > > CONFIG_KASAN_VMALLOC=y > > # CONFIG_KASAN_MODULE_TEST is not set > > > > > > But then I am able to trigger kernel crash with bad metadata(corrupted rwp > and rrp) with below stack trace. > > > > [ 107.442991] __arch_copy_to_user+0x180/0x240 [ 107.447254] > > vfs_read+0xc8/0x2a8 [ 107.450476] ksys_read+0x74/0x110 [ > > 107.453783] __arm64_sys_read+0x24/0x38 [ 107.457611] > > invoke_syscall.constprop.0+0x58/0xf8 > > [ 107.462309] do_el0_svc+0x6c/0x158 > > [ 107.465704] el0_svc+0x54/0x1c0 > > [ 107.468839] el0t_64_sync_handler+0x100/0x130 [ 107.473188] > > el0t_64_sync+0x190/0x198 [ 107.476843] Code: d503201f d503201f > > d503201f d503201f (a8c12027) > > > > Does your stack trace looks similar ? Then its very likely due to bad > metadata. > > If not, kindly please share yours. > > > > For example, if we have bad values for rwp and rrp, offset can get messed > up resulting in above crash. > > Will add more validation checks while setting up the prevboot buffer, > > so as to avoid processing with bogus metadata values in the next patch > version. > > > > Thanks James for trying this out. > > > > > > I think it must be bad metadata because I didn't try it with a previous crash > saved yet. I suppose we do need some kind of validation then if it's possible > for bad metadata to cause a crash. > > I will try after filling in the metadata and see if that was the issue. Found a regression in this series. Below is the fix. @@ -1840,7 +1853,7 @@ static int tmc_panic_sync_etr(struct coresight_device *csdev) /* Sync registers from hardware to metadata region */ tmc->size = csdev_access_relaxed_read32(csa, TMC_RSZ); - tmc->sts = csdev_access_relaxed_read32(csa, TMC_RSZ); + tmc->sts = csdev_access_relaxed_read32(csa, TMC_STS); > > > >> > >>> > >>> 4. Reset back to normal mode > >>> > >>> #echo 0 > read_prevboot > >>>