Re: 💥 PANICKED: Test report for kernel 5.9.0-rc3-020ad03.cki (block)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/3/20 1:58 PM, Veronika Kabatova wrote:
> 
> 
> ----- Original Message -----
>> From: "Rachel Sibley" <rasibley@xxxxxxxxxx>
>> To: "Jens Axboe" <axboe@xxxxxxxxx>, "CKI Project" <cki-project@xxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx
>> Cc: "Changhui Zhong" <czhong@xxxxxxxxxx>
>> Sent: Thursday, September 3, 2020 8:59:48 PM
>> Subject: Re: 💥 PANICKED: Test report for kernel 5.9.0-rc3-020ad03.cki (block)
>>
>>
>>
>> On 9/3/20 1:46 PM, Jens Axboe wrote:
>>> On 9/3/20 11:10 AM, Rachel Sibley wrote:
>>>>
>>>> On 9/3/20 1:07 PM, CKI Project wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We ran automated tests on a recent commit from this kernel tree:
>>>>>
>>>>>          Kernel repo:
>>>>>          https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>>>               Commit: 020ad0333b03 - Merge branch 'for-5.10/block' into
>>>>>               for-next
>>>>>
>>>>> The results of these automated tests are provided below.
>>>>>
>>>>>       Overall result: FAILED (see details below)
>>>>>                Merge: OK
>>>>>              Compile: OK
>>>>>                Tests: PANICKED
>>>>>
>>>>> All kernel binaries, config files, and logs are available for download
>>>>> here:
>>>>>
>>>>>     https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=datawarehouse/2020/09/02/613166
>>>>>
>>>>> One or more kernel tests failed:
>>>>>
>>>>>       ppc64le:
>>>>>        💥 storage: software RAID testing
>>>>>
>>>>>       aarch64:
>>>>>        💥 storage: software RAID testing
>>>>>
>>>>>       x86_64:
>>>>>        💥 storage: software RAID testing
>>>>
>>>> Hello,
>>>>
>>>> We're seeing a panic for all non s390x arches triggered by swraid test.
>>>> Seems to be reproducible
>>>> for all succeeding pipelines after this one, and we haven't yet seen it in
>>>> mainline or yesterday's
>>>> block tree results.
>>>>
>>>> Thank you,
>>>> Rachel
>>>>
>>>> https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_aarch64_redhat%3A968098/tests/8757835_aarch64_3_console.log
>>>>
>>>> [ 8394.609219] Internal error: Oops: 96000004 [#1] SMP
>>>> [ 8394.614070] Modules linked in: raid0 loop raid456 async_raid6_recov
>>>> async_memcpy async_pq async_xor async_tx dm_log_writes dm_flakey
>>>> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
>>>> rfkill sunrpc vfat fat xgene_hwmon xgene_enet at803x mdio_xgene xgene_rng
>>>> xgene_edac mailbox_xgene_slimpro drm ip_tables xfs sdhci_of_arasan
>>>> sdhci_pltfm i2c_xgene_slimpro crct10dif_ce sdhci gpio_dwapb cqhci
>>>> xhci_plat_hcd
>>>> gpio_xgene_sb gpio_keys aes_neon_bs
>>>> [ 8394.654298] CPU: 3 PID: 471427 Comm: kworker/3:2 Kdump: loaded Not
>>>> tainted 5.9.0-rc3-020ad03.cki #1
>>>> [ 8394.663299] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene
>>>> Mustang Board, BIOS 3.06.25 Oct 17 2016
>>>> [ 8394.672999] Workqueue: md_misc mddev_delayed_delete
>>>> [ 8394.677853] pstate: 40400085 (nZcv daIf +PAN -UAO BTYPE=--)
>>>> [ 8394.683399] pc : percpu_ref_exit+0x5c/0xc8
>>>> [ 8394.687473] lr : percpu_ref_exit+0x20/0xc8
>>>> [ 8394.691547] sp : ffff800019f33d00
>>>> [ 8394.694843] x29: ffff800019f33d00 x28: 0000000000000000
>>>> [ 8394.700129] x27: ffff0003c63ae000 x26: ffff8000120b6228
>>>> [ 8394.705414] x25: 0000000000000001 x24: ffff0003d8322a80
>>>> [ 8394.710698] x23: 0000000000000000 x22: 0000000000000000
>>>> [ 8394.715983] x21: 0000000000000000 x20: ffff8000121d2000
>>>> [ 8394.721266] x19: ffff0003d8322af0 x18: 0000000000000000
>>>> [ 8394.726550] x17: 0000000000000000 x16: 0000000000000000
>>>> [ 8394.731834] x15: 0000000000000007 x14: 0000000000000003
>>>> [ 8394.737119] x13: 0000000000000000 x12: ffff0003888a1978
>>>> [ 8394.742403] x11: ffff0003888a1918 x10: 0000000000000001
>>>> [ 8394.747688] x9 : 0000000000000000 x8 : 0000000000000000
>>>> [ 8394.752972] x7 : 0000000000000400 x6 : 0000000000000001
>>>> [ 8394.758257] x5 : ffff800010423030 x4 : ffff8000121d2e40
>>>> [ 8394.763540] x3 : 0000000000000000 x2 : 0000000000000000
>>>> [ 8394.768825] x1 : 0000000000000000 x0 : 0000000000000000
>>>> [ 8394.774110] Call trace:
>>>> [ 8394.776544]  percpu_ref_exit+0x5c/0xc8
>>>> [ 8394.780273]  md_free+0x64/0xa0
>>>> [ 8394.783311]  kobject_put+0x7c/0x218
>>>> [ 8394.786781]  mddev_delayed_delete+0x3c/0x50
>>>> [ 8394.790944]  process_one_work+0x1c4/0x450
>>>> [ 8394.794932]  worker_thread+0x164/0x4a8
>>>> [ 8394.798662]  kthread+0xf4/0x120
>>>> [ 8394.801787]  ret_from_fork+0x10/0x18
>>>> [ 8394.805344] Code: 2a0403e0 350002c0 a9400262 52800001 (f9400000)
>>>> [ 8394.811407] ---[ end trace 481cab6e1ad73da1 ]---
>>>
>>> Ming, I wonder if this is:
>>>
>>> commit d0c567d60f3730b97050347ea806e1ee06445c78
>>> Author: Ming Lei <ming.lei@xxxxxxxxxx>
>>> Date:   Wed Sep 2 20:26:42 2020 +0800
>>>
>>>      percpu_ref: reduce memory footprint of percpu_ref in fast path
>>>
>>> Rachel, any chance you can do a run with that commit reverted?
>>
>> Hi Jens, yes we're working on it and will share our findings as soon as the
>> job finishes.
>>
> 
> Hi Jens, we can confirm that there are no panics and the test passes
> with the patch reverted.
> 
> 
> We also realized that this patch is a likely cause of serious problems
> on ppc64le during LTP testing as well, specifically msgstress04. Both
> issues started occurring at the same time, we just didn't notice as the
> test was crashing.
> 
> 
> [ 5682.999169] msgstress04 invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0 
> [ 5682.999981] CPU: 1 PID: 170909 Comm: msgstress04 Kdump: loaded Not tainted 5.9.0-rc3-020ad03.cki #1 
> [ 5683.000048] Call Trace: 
> [ 5683.000098] [c00000023de972e0] [c000000000927e00] dump_stack+0xc4/0x114 (unreliable) 
> [ 5683.000161] [c00000023de97330] [c000000000386958] dump_header+0x64/0x274 
> [ 5683.000205] [c00000023de973c0] [c000000000385534] oom_kill_process+0x284/0x290 
> [ 5683.000259] [c00000023de97400] [c0000000003862b0] out_of_memory+0x220/0x790 
> [ 5683.000307] [c00000023de974a0] [c000000000408890] __alloc_pages_slowpath.constprop.0+0xd60/0xeb0 
> [ 5683.000370] [c00000023de97670] [c000000000408d20] __alloc_pages_nodemask+0x340/0x400 
> [ 5683.000426] [c00000023de97700] [c000000000434dec] alloc_pages_current+0xac/0x130 
> [ 5683.000479] [c00000023de97750] [c000000000442fc4] allocate_slab+0x584/0x810 
> [ 5683.000525] [c00000023de977c0] [c000000000447e7c] ___slab_alloc+0x44c/0xa30 
> [ 5683.000571] [c00000023de978b0] [c000000000448494] __slab_alloc+0x34/0x60 
> [ 5683.000615] [c00000023de978e0] [c000000000448b48] kmem_cache_alloc+0x688/0x700 
> [ 5683.000671] [c00000023de97940] [c0000000003d9c80] __pud_alloc+0x70/0x1e0 
> [ 5683.000717] [c00000023de97990] [c0000000003ddbb4] copy_page_range+0x1204/0x1490 
> [ 5683.000779] [c00000023de97b20] [c00000000013b7c0] dup_mm+0x370/0x6e0 
> [ 5683.000826] [c00000023de97bd0] [c00000000013ce10] copy_process+0xd20/0x1950 
> [ 5683.000870] [c00000023de97c90] [c00000000013dc64] _do_fork+0xa4/0x560 
> [ 5683.000915] [c00000023de97d00] [c00000000013e24c] __do_sys_clone+0x7c/0xa0 
> [ 5683.000965] [c00000023de97dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0 
> [ 5683.001019] [c00000023de97e20] [c00000000000d140] system_call_common+0xf0/0x27c 
> 
> The test then manages the fill the console log with good 4G of dump...
> this is actually visible in the ppc64le console log from the linked
> artifacts (warnings, it's a huge file!):
> 
> https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_ppc64le_redhat%3A968099/tests/8757368_ppc64le_3_console.log
> 
> 
> There are also more ppc64le traces in the other log (of reasonable size):
> https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_ppc64le_redhat%3A968099/tests/8757337_ppc64le_2_console.log

I'll revert this change for now.

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux