On 2020-06-18 17:00, Mike Christie wrote:
On 6/18/20 8:16 AM, Bodo Stroesser wrote:
This patch fixes the following crash
(see
https://urldefense.com/v3/__https://bugzilla.kernel.org/show_bug.cgi?id=208045__;!!GqivPVa7Brio!O7JgZIL3VPAzIqwJfZjL48y8M90K3HXv3pUVeoCzZ-vXovCpx5g7xMg-z5aAiVXVtkfE$
)
Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
#202004230533
Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
pstate: 80400005 (Nzcv daif +PAN -UAO)
pc : flush_dcache_page+0x18/0x40
lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
sp : ffff000015123a80
x29: ffff000015123a80 x28: 0000000000000000
x27: 0000000000001000 x26: ffff000023ea5000
x25: ffffcfa25bbe08b8 x24: 0000000000000078
x23: ffff7e0000000000 x22: ffff000023ea5001
x21: ffffcfa24b79c000 x20: 0000000000000fff
x19: ffff7e00008fa940 x18: 0000000000000000
x17: 0000000000000000 x16: ffff2d047e709138
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: ffff2d047fbd0a40
x11: 0000000000000000 x10: 0000000000000030
x9 : 0000000000000000 x8 : ffffc9a254820a00
x7 : 00000000000013b0 x6 : 000000000000003f
x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
x3 : 0000000000001000 x2 : 0000000000000078
x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
Call trace:
flush_dcache_page+0x18/0x40
is_ring_space_avail+0x68/0x2f8 [target_core_user]
queue_cmd_ring+0x1f8/0x680 [target_core_user]
tcmu_queue_cmd+0xe4/0x158 [target_core_user]
__target_execute_cmd+0x30/0xf0 [target_core_mod]
target_execute_cmd+0x294/0x390 [target_core_mod]
transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
kthread+0x130/0x138
ret_from_fork+0x10/0x18
Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
---[ end trace 1e451c73f4266776 ]---
The solution is based on patch:
"scsi: target: tcmu: Optimize use of flush_dcache_page"
which restricts the use of tcmu_flush_dcache_range() to
addresses from vmalloc'ed areas only.
This patch now replaces the virt_to_page() call in
tcmu_flush_dcache_range() - which is wrong for vmalloced addrs -
by vmalloc_to_page().
The patch was tested on ARM with kernel 4.19.118 and 5.7.2
Signed-off-by: Bodo Stroesser <bstroesser@xxxxxxxxxxxxxx>
Tested-by: JiangYu <lnsyyj@xxxxxxxxxxx>
Tested-by: Daniel Meyerholt <dxm523@xxxxxxxxx>
---
drivers/target/target_core_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/target/target_core_user.c
b/drivers/target/target_core_user.c
index a65e9671ae7a..835d3001cb0e 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -601,7 +601,7 @@ static inline void tcmu_flush_dcache_range(void
*vaddr, size_t size)
size = round_up(size+offset, PAGE_SIZE);
while (size) {
- flush_dcache_page(virt_to_page(start));
+ flush_dcache_page(vmalloc_to_page(start));
start += PAGE_SIZE;
size -= PAGE_SIZE;
}
For this bug we only need to change the flush in is_ring_space_avail
right? It's what is accessing the mb which is vmalloc'd.
No, is_ring_space_avail was just the first caller of
tcmu_flush_dcache_range for vmalloc'ed pages, so it crashed there.
The entiere address range exposed to userspace via uio consists of
two parts:
1) mb + command ring are vmalloc'ed in a single vzalloc call
during initialization of a tcmu device.
2) the data area which is allocated page by page calling
alloc_page()
The second part is handled by (scatter|gather)_data_area. For the
calls from these routines I think usage of virt_to_page in
tcmu_flush_dcache_range was fine. But patch number 1 of this
series replaced these called with direct calls to flush_dcache_page.
So all remaining calls to tcmu_flush_dcache_range after patch 1
are calls for vmalloc'ed addresses. Therefore after patch 1 we can
safely replace virt_to_page with vmalloc_to_page.
So it turned out that patch 1, which in the beginning was thought
to be an optimization, now also simplifies the crash fix.
Is it reccommended to call vmalloc_to_page on a page we get with
alloc_page and is the mm then always going to do the right thing for us
(I do not know the mm code well and just quickly scanned the
Documentation and comments, but could not find anything), or could we
hit similar issues where we are using the wrong call on different types
of allocated memory down the line?