Hi Kiwoong,
On 2021-06-14 17:52, Kiwoong Kim wrote:
Dear all,
I saw a symptom that made me wonder how a command context is
synchronized between UFS and SCSI.
In the situation where the following log appeared, the lrb structure
for tag 10 did not have a command context.
That is, lrbp->cmd was NULL, which led to this kernel panic.
lrbp->cmd is set when a command is issued, and cleared when the
command is completed.
But what if the command times out and, at the same moment, is completed
because its response arrives?
If SCSI has already added it to its error command list and woken up
scsi_eh even though the command has actually completed, scsi_eh will
invoke eh_abort_handler and the symptom will be reproduced, I think.
Otherwise, does anyone know how to guarantee the coherency?
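The lifecycle described above can be modelled in a few lines of userspace C. This is only a single-threaded sketch under stated assumptions: model_cmd, model_lrb, slots, model_issue(), model_complete() and model_abort() are hypothetical stand-ins, not the ufshcd or SCSI API, and the sketch does not provide any locking. It only shows why an abort that runs after the completion path sees an empty slot, and what a NULL check at that point would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_SLOTS 32

/* Hypothetical stand-ins for struct scsi_cmnd and struct ufshcd_lrb. */
struct model_cmd { int opcode; };
struct model_lrb { struct model_cmd *cmd; };

static struct model_lrb slots[NUM_SLOTS];

/* Issue path: bind the command context to its tag slot. */
static void model_issue(int tag, struct model_cmd *cmd)
{
    assert(tag >= 0 && tag < NUM_SLOTS);
    slots[tag].cmd = cmd;
}

/* Completion path: clear the command context of the slot. */
static void model_complete(int tag)
{
    assert(tag >= 0 && tag < NUM_SLOTS);
    slots[tag].cmd = NULL;
}

/*
 * Abort path. Returns 0 if the slot still held a command and it was
 * dropped here, 1 if the slot was already empty because the completion
 * path ran first.
 */
static int model_abort(int tag)
{
    struct model_cmd *cmd;

    assert(tag >= 0 && tag < NUM_SLOTS);
    cmd = slots[tag].cmd;
    if (cmd == NULL) {
        /* Dereferencing cmd here is what would crash without the check. */
        printf("tag %d: already completed, nothing to abort\n", tag);
        return 1;
    }
    printf("tag %d: aborting opcode 0x%x\n", tag, cmd->opcode);
    slots[tag].cmd = NULL;
    return 0;
}
```

Calling model_issue(10, &c), then model_complete(10), then model_abort(10) follows the ordering suspected above: the abort finds the slot for tag 10 empty and returns 1 instead of touching a NULL pointer. In a real driver the check and the clear would also have to be serialized against the completion path, which this model leaves out.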
[78843.058729] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: cmd was completed, but without a notifying intr, tag = 10
[78843.058775] [3: kworker/u16:1:27018] exynos-ufs 13100000.ufs: ufshcd_abort: Device abort task at tag 10
[78843.058793] [3: kworker/u16:1:27018] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000160
..
[78843.075421] [3: kworker/u16:1:27018] pc : scsi_print_command+0x24/0x340
[78843.075436] [3: kworker/u16:1:27018] lr : ufshcd_abort+0x180/0x674
[78843.075444] [3: kworker/u16:1:27018] sp : ffffffc038ea3c00
[78843.075453] [3: kworker/u16:1:27018] x29: ffffffc038ea3c10 x28: 0000000000000400
[78843.075464] [3: kworker/u16:1:27018] x27: ffffff8934c0a680 x26: ffffff8931560000
[78843.075474] [3: kworker/u16:1:27018] x25: 000000000002000a x24: ffffff88a0dd4910
[78843.075485] [3: kworker/u16:1:27018] x23: 0000000000000000 x22: ffffff8930f258f0
[78843.075495] [3: kworker/u16:1:27018] x21: ffffff8934c0a080 x20: 000000000000000a
[78843.075505] [3: kworker/u16:1:27018] x19: ffffff8931560cf8 x18: ffffffc037557030
[78843.075516] [3: kworker/u16:1:27018] x17: 0000000000000000 x16: ffffffc010eeba70
[78843.075526] [3: kworker/u16:1:27018] x15: ffffffc01187d88f x14: 2067617420746120
[78843.075536] [3: kworker/u16:1:27018] x13: 6b7361742074726f x12: 6261206563697665
[78843.075546] [3: kworker/u16:1:27018] x11: 44203a74726f6261 x10: 00000000ffffffff
[78843.075556] [3: kworker/u16:1:27018] x9 : 0000000000000090 x8 : ffffff8934c0a620
[78843.075566] [3: kworker/u16:1:27018] x7 : 0000000000000000 x6 : ffffffc0102a7d6c
[78843.075576] [3: kworker/u16:1:27018] x5 : 0000000000000000 x4 : 0000000000000080
[78843.075585] [3: kworker/u16:1:27018] x3 : 0000000000000000 x2 : ffffffc0102a7d80
[78843.075595] [3: kworker/u16:1:27018] x1 : ffffffc0102a7d80 x0 : 0000000000000000
[78843.075606] [3: kworker/u16:1:27018] Call trace:
[78843.075617] [3: kworker/u16:1:27018] scsi_print_command+0x24/0x340
[78843.075627] [3: kworker/u16:1:27018] ufshcd_abort+0x180/0x674
[78843.075643] [3: kworker/u16:1:27018] scmd_eh_abort_handler+0x80/0x15c
[78843.075660] [3: kworker/u16:1:27018] process_one_work+0x290/0x4e4
[78843.075669] [3: kworker/u16:1:27018] worker_thread+0x258/0x534
[78843.075681] [3: kworker/u16:1:27018] kthread+0x178/0x188
[78843.075696] [3: kworker/u16:1:27018] ret_from_fork+0x10/0x18
In the 5.13 kernel, ufshcd_abort() calls scsi_print_command(cmd),
while in 5.12 and earlier kernels it calls
scsi_print_command(hba->lrb[tag].cmd).
Which kernel are you using here?
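To make the difference between the two call forms concrete, here is a small userspace sketch. It is not kernel code: sketch_cmd, sketch_lrb, sketch_hba, sketch_print_command(), print_via_slot() and print_via_arg() are hypothetical names, and the stand-in print function returns -1 on NULL where the real scsi_print_command() in the trace above faulted. The only thing it illustrates is which pointer each form passes once the slot has been cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_SLOTS 32

/* Hypothetical stand-ins, not the kernel structures. */
struct sketch_cmd { unsigned char cdb0; };
struct sketch_lrb { struct sketch_cmd *cmd; };
struct sketch_hba { struct sketch_lrb lrb[SKETCH_SLOTS]; };

/* Stand-in for the print helper: reports NULL instead of crashing. */
static int sketch_print_command(const struct sketch_cmd *cmd)
{
    if (cmd == NULL)
        return -1;
    printf("CDB[0] = 0x%02x\n", cmd->cdb0);
    return 0;
}

/* Form described for 5.12 and earlier: re-read the pointer from the slot. */
static int print_via_slot(const struct sketch_hba *hba, int tag)
{
    assert(tag >= 0 && tag < SKETCH_SLOTS);
    return sketch_print_command(hba->lrb[tag].cmd);
}

/* Form described for 5.13: use the pointer the abort handler was given. */
static int print_via_arg(const struct sketch_cmd *cmd)
{
    return sketch_print_command(cmd);
}
```

With the slot for tag 10 cleared by a completion, print_via_slot() passes NULL while print_via_arg() still passes the command handed to the abort handler. That is why the answer to the kernel version question matters for reading the trace.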
Thanks,
Can Guo.
Thanks.
Kiwoong Kim