Thanks for your reply, Christophe. I will use the 'sparse' tool to check for unsafe I/O memory accesses; it looks powerful.

Thanks again!

-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
Sent: 2019-11-26 16:16
To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
Cc: linux-arch@xxxxxxxxxxxxxxx; chengjian (D) <cj.chengjian@xxxxxxxxxx>; Libin (Huawei) <huawei.libin@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; zhangyi (F) <yi.zhang@xxxxxxxxxx>; Liuwenliang (Abbott Liu) <liuwenliang@xxxxxxxxxx>
Subject: Re: Re: Re: loop nesting in alignment exception and machine check

On 14/11/2019 at 04:46, Wangshaobo (bobo) wrote:
> Hi Christophe,
> Testing confirms that the problem is fixed when we use memcpy_toio() instead of memcpy().
> In our experience, everything was fine until cacheable_memcpy became memcpy with the adoption of patch 0b05e2d671c40cfb57e66e4e402320d6e056b2f8. That patch speeds up memcpy but introduces hidden trouble. Our products have used memcpy widely through a long period of continuous maintenance, and it is now a big problem for us to check where its use is correct and where it is wrong, with respect to cacheable_memcpy and memcpy_toio.
> So I also want to ask:
> how can we reliably and uniformly close the gap caused by those changes to memcpy during version maintenance? If you have any tips, please tell me.
> Thanks,
> Shaobo Wang

All accesses to I/O memory should use I/O accessors. Direct access to I/O memory is unsafe by definition.

Incorrect accesses to I/O memory can be detected with the 'sparse' tool. For that, you just have to build the kernel with 'make vmlinux C=2' and you'll be notified of unsafe accesses to I/O memory.

Christophe

>
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
> Sent: 2019-10-31 19:13
> To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
> Cc: chengjian (D) <cj.chengjian@xxxxxxxxxx>; Libin (Huawei) <huawei.libin@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; zhangyi (F) <yi.zhang@xxxxxxxxxx>
> Subject: Re: Re: loop nesting in alignment exception and machine check
>
> Hi,
>
> Did you try? Does it work?
>
> Christophe
>
> On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
>> Hi Christophe,
>>
>> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
>>
>> -----Original Message-----
>> From: Christophe Leroy [mailto:christophe.leroy@xxxxxx]
>> Sent: 2019-10-26 19:20
>> To: Wangshaobo (bobo) <bobo.shaobowang@xxxxxxxxxx>
>> Cc: linux-arch@xxxxxxxxxxxxxxx; alistair@xxxxxxxxxxxx; chengjian (D) <cj.chengjian@xxxxxxxxxx>; Xiexiuqi <xiexiuqi@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; oss@xxxxxxxxxxxx; paulus@xxxxxxxxx; Libin (Huawei) <huawei.libin@xxxxxxxxxx>; agust@xxxxxxx; linuxppc-dev@xxxxxxxxxxxxxxxx
>> Subject: Re: loop nesting in alignment exception and machine check
>>
>> Hi,
>>
>> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>>> Hi,
>>>
>>> I encountered a problem in which an alignment exception raised during a machine check leads to a nested loop. The trigger background is:
>>>
>>> problem:
>>>
>>> machine check or critical interrupt ->…-> kbox_write [for recording last words] -> memcpy(ioremap_addr, src, size): _GLOBAL(memcpy)…
>>>
>>> When we enter memcpy, the instruction 'dcbz r11,r6' causes an alignment exception. In this situation r11 holds the ioremap address, which leads to the alignment exception,
>>
>> You can't use memcpy() on anything other than memory.
>>
>> For an ioremapped area, you have to use memcpy_toio()
>>
>> Christophe
>>
>>>
>>> then the instruction cannot be processed successfully, as we are still in the machine check. In the end, it triggers a new machine check irq in the irq handler function, and a nested loop begins.
>>>
>>> analysis:
>>>
>>> We have analysed this a lot but still cannot reach a reasonable explanation. Normally, an alignment exception triggered in machine check context can still be recorded in the kbox after it has been handled by the handler function. But how can a machine check be triggered inside the handler function, whatever the cause? We printed the relevant registers on first entry to the machine check handler and to the alignment exception handler:
>>>
>>> machine check    alignment exception
>>> MSR:0x2          MSR:0x0
>>> SRR1:0x2         SRR1:0x21002
>>>
>>> But the manual says SRR1 should be set to MSR (0x2). Why did that happen?
>>>
>>> Then a branch in the handler function copies SRR1 to MSR, which enables MSR[ME] and MSR[CE], and the system collapses.
>>>
>>> Conclusion:
>>>
>>> 1) Why can the alignment exception not be handled in the machine check?
>>>
>>> 2) Besides memcpy, can any other function cause the alignment exception?
>>>
>>> We can still reproduce it; the call path is as follows:
>>>
>>> Cpu dead lock->watch log->trigger fiq->kbox_write->memcpy->alignment exception->print last words.
>>>
>>> But for the problem shown below, what the kbox printed is empty.
>>>
>>> ------------------kbox restart:[ 10.147594]----------------
>>>
>>> kbox verify fs magic fail
>>>
>>> kbox mem mabye destroyed, format it
>>>
>>> kbox: load OK
>>>
>>> lock-task: major[249] minor[0]
>>>
>>> -----start show_destroyed_kbox_mem_head----
>>>
>>> 00000000: 00000000 00000000 00000000 00000000 ................
>>> 00000010: 00000000 00000000 00000000 00000000 ................
>>> 00000020: 00000000 00000000 00000000 00000000 ................
>>> 00000030: 00000000 00000000 00000000 00000000 ................
>>> 00000040: 00000000 00000000 00000000 00000000 ................
>>> 00000050: 00000000 00000000 00000000 00000000 ................
>>> 00000060: 00000000 00000000 00000000 00000000 ................
>>> 00000070: 00000000 00000000 00000000 00000000 ................
>>> 00000080: 00000000 00000000 00000000 00000000 ................
>>> 00000090: 00000000 00000000 00000000 00000000 ................
>>>
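
As a rough illustration of the advice given in this thread (use memcpy_toio() rather than memcpy() on an ioremapped area), the userspace C sketch below models the idea of an I/O-safe copy: every byte is written with an individual store through a volatile pointer, so nothing in the copy path can be turned into a cache-block operation such as dcbz. This is only a model under stated assumptions. The names model_memcpy_toio() and kbox_write_sketch() are hypothetical, __iomem is defined empty so the code builds outside the kernel, and the real memcpy_toio() is architecture-specific and may be implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in (assumption): in the kernel, __iomem is an annotation
 * that lets sparse flag direct accesses to I/O memory. It is defined empty
 * here so that the sketch compiles with an ordinary C compiler.
 */
#define __iomem

/*
 * Hypothetical model of an I/O-safe copy. Each byte is stored through a
 * volatile pointer, so the compiler must emit one plain store per byte and
 * cannot substitute a call to the optimised memcpy().
 */
static void model_memcpy_toio(volatile void __iomem *dst, const void *src,
                              size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    assert(dst != NULL && src != NULL);

    while (n--)
        *d++ = *s++;
}

/*
 * Hypothetical stand-in for the kbox_write() path from the report: the
 * message is clamped to the size of the region, then copied with the
 * I/O-safe helper. Returns the number of bytes written.
 */
static size_t kbox_write_sketch(volatile void __iomem *kbox, size_t kbox_size,
                                const void *msg, size_t len)
{
    if (len > kbox_size)
        len = kbox_size;

    model_memcpy_toio(kbox, msg, len);
    return len;
}
```

In kernel code the equivalent change would be to keep the ioremap() result in a pointer annotated with __iomem and to call memcpy_toio() on it; building with 'make vmlinux C=2', as suggested above, then lets sparse report places where such a pointer is still passed to plain memcpy().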