Re: Re: Re: loop nesting in alignment exception and machine check

"Wangshaobo (bobo)" <bobo.shaobowang@xxxxxxxxxx> · Tue, 26 Nov 2019 12:25:25 +0000

Thanks for your reply, Christophe,

I will use the 'sparse' tool to check for unsafe I/O memory accesses; it looks powerful.

Thanks again!
-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
Sent: 26 November 2019 16:16
To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
Cc: linux-arch@xxxxxxxxxxxxxxx; chengjian (D) <cj.chengjian@xxxxxxxxxx>; Libin (Huawei) <huawei.libin@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; zhangyi (F) <yi.zhang@xxxxxxxxxx>; Liuwenliang (Abbott Liu) <liuwenliang@xxxxxxxxxx>
Subject: Re: Re: Re: loop nesting in alignment exception and machine check

On 14/11/2019 at 04:46, Wangshaobo (bobo) wrote:
> Hi Christophe,
> 	We have verified that the problem is fixed when we use memcpy_toio() instead of memcpy().
> In our experience, everything worked before the patch 0b05e2d671c40cfb57e66e4e402320d6e056b2f8, in which cache_memcpy becomes memcpy, was adopted. It speeds up memcpy but introduces hidden trouble. Our products have used memcpy widely over a long period of continuous maintenance, and it is now a big problem for us to check which of our uses are correct and which are wrong with respect to cachable_memcpy and memcpy_toio.
> 	So I also want to ask:
> 	how can we reliably and uniformly close the gap that those memcpy changes create when maintaining across versions? If you have any tips, please tell me.
> 	Thanks, Shaobo Wang

All accesses to I/O memory should use io accessors. Direct access to io memory is unsafe by definition.

Incorrect accesses to I/O memory can be detected with the 'sparse' tool. For that, you just have to build the kernel with 'make vmlinux C=2' and you'll be notified of unsafe accesses to I/O memory.
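
As a rough userspace illustration of why the two kinds of copy differ (a sketch only, not the kernel implementation): an optimised memcpy() may use cache-management instructions such as the dcbz reported earlier in this thread, whereas a copy to I/O memory has to stick to plain loads and stores. The helper name sketch_copy_toio below is hypothetical; in kernel code the destination would be a pointer annotated with __iomem and the copy would go through memcpy_toio().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: copy n bytes into a
 * device-style window using only plain byte stores.
 * The volatile qualifier keeps the compiler from merging or
 * dropping the individual stores to the destination.
 */
static void sketch_copy_toio(volatile uint8_t *dst, const void *src, size_t n)
{
	const uint8_t *s = src;

	while (n--)
		*dst++ = *s++;
}
```

sparse keys off the __iomem annotation, so passing such a pointer to plain memcpy() is the kind of mismatch the C=2 build reports.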

Christophe

> 
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
> Sent: 31 October 2019 19:13
> To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
> Cc: chengjian (D) <cj.chengjian@xxxxxxxxxx>; Libin (Huawei) 
> <huawei.libin@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; zhangyi (F) 
> <yi.zhang@xxxxxxxxxx>
> Subject: Re: Re: loop nesting in alignment exception and machine check
> 
> Hi,
> 
> Did you try it? Does it work?
> 
> Christophe
> 
> On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
>> Hi Christophe,
>>
>> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
>>
>> -----Original Message-----
>> From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
>> Sent: 26 October 2019 19:20
>> To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
>> Cc: linux-arch@xxxxxxxxxxxxxxx; alistair@xxxxxxxxxxxx; chengjian (D) 
>> <cj.chengjian@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; 
>> linux-kernel@xxxxxxxxxxxxxxx; oss@xxxxxxxxxxxx; paulus@xxxxxxxxx; 
>> Libin (Huawei) <huawei.libin@xxxxxxxxxx>; agust@xxxxxxx; 
>> linuxppc-dev@xxxxxxxxxxxxxxxx
>> Subject: Re: loop nesting in alignment exception and machine check
>>
>> Hi,
>>
>> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>>> Hi,
>>>
>>> I have encountered a problem in which a nested loop of exceptions 
>>> occurs when an alignment exception is raised during a machine check. The background of the trigger is:
>>>
>>> Problem:
>>>
>>> machine check or critical interrupt -> … -> kbox_write [for recording 
>>> last words] -> memcpy(irremap_addr, src, size): _GLOBAL(memcpy) …
>>>
>>> When we enter memcpy, the instruction ‘dcbz r11,r6’ causes an alignment 
>>> exception. In this situation r11 holds the ioremap address, which 
>>> leads to the alignment exception,
>>
>> You can't use memcpy() on anything other than memory.
>>
>> For an ioremapped area, you have to use memcpy_toio().
>>
>> Christophe
>>
>>>
>>> then the instruction cannot be processed successfully, as we are still in 
>>> the machine check. In the end, it triggers a new machine check inside the 
>>> handler function, and a nested loop begins.
>>>
>>> Analysis:
>>>
>>> We have analysed this at length but still cannot arrive at a reasonable 
>>> explanation. Normally, an alignment exception triggered in machine check 
>>> context can still be recorded in the kbox after it has been handled by 
>>> its handler function. But how can a machine check be triggered inside 
>>> the handler function, for whatever reason? We printed the relevant 
>>> registers, as follows, on first entry to the machine check and to the 
>>> alignment exception handler function:
>>>
>>>             MSR:0x2      MSR:0x0
>>>
>>>             SRR1:0x2      SRR1:0x21002
>>>
>>>             But the manual says SRR1 should be set to the MSR value (0x2). 
>>> Why did that happen?
>>>
>>>             Then a branch in the handler function copies SRR1 to 
>>> MSR, which enables MSR[ME] and MSR[CE], and the system collapses.
>>>
>>> Conclusion:
>>>
>>>             1)  Why can the alignment exception not be handled during a 
>>> machine check?
>>>
>>>             2)  Besides memcpy, can any other function cause the 
>>> alignment exception?
>>>
>>> We can still reproduce it; the sequence is as follows:
>>>
>>>             CPU deadlock -> watch log -> trigger
>>> fiq -> kbox_write -> memcpy -> alignment exception -> print last words.
>>>
>>>             But for the problem shown below, what the kbox printed is empty.
>>>
>>> ------------------kbox restart:[   10.147594]----------------
>>>
>>> kbox verify fs magic fail
>>>
>>> kbox mem mabye destroyed, format it
>>>
>>> kbox: load OK
>>>
>>> lock-task: major[249] minor[0]
>>>
>>> -----start show_destroyed_kbox_mem_head----
>>>
>>> 00000000: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000010: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000020: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000030: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000040: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000050: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000060: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000070: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000080: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000090: 00000000 00000000 00000000 00000000  ................
>>>