>
> I have mixed feelings about this.
>
> On one hand, this looks simple enough.
>
> But on the other hand we have other users of memcpy_fromio(), including
> SOF drivers, so what are the odds we have the same problems in other
> places? Wouldn't it be safer to either change this function so that its
> behavior is not ambiguous or compiler-dependent, or fix the compiler?
>

Hi Pierre and Amadeusz,

I have to admit that I didn't dig into clang's __builtin_memcpy to see
what's happening inside, so I don't have direct evidence that this is
clang's problem. What I know is that a kernel built with clang 10 works
fine, but the issue appears once I switch to clang 11.

At first I also suspected a timing issue, so I checked the command
transaction. The transaction is simple: the host writes a command to the
SST_IPCX register, then the DSP writes its reply to the SST_IPCD register
and triggers an interrupt. Finally, the irq thread sst_byt_irq_thread()
reads the SST_IPCD register to complete the transaction. I added some
debug messages to see whether anything goes wrong in the transaction,
but it all looks good.

I was also confused about why this only happens on BYT and not on BDW,
since they share the same register access code in sst-dsp.c. I checked
the code and realized that on BDW the irq thread (hsw_irq_thread)
performs a 32-bit register read, whereas BYT performs a 64-bit read.
I therefore changed the BYT code to use two readl() calls, and the
problem is gone.

My best guess is that it's related to the implementation of
__builtin_memcpy(), but I'm not sure whether it is the timing or the
implementation itself that causes the problem.

Regards,
Brent
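
For reference, the idea of replacing one 64-bit read with two 32-bit reads can be sketched in plain userspace C. This is only an illustration, not the actual driver change: fake_readl(), read64_as_two_reads() and demo_ipcd_read() are hypothetical names, fake_readl() merely stands in for the kernel's readl(), and the low word is assumed to sit at the lower offset.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's readl(): one 32-bit volatile load. */
static inline uint32_t fake_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Hypothetical helper: fetch a 64-bit register as two explicit 32-bit
 * accesses, so the access width does not depend on how the compiler
 * expands a memcpy-style copy. Assumes the low word is at offset 0 and
 * the high word at offset 4.
 */
static inline uint64_t read64_as_two_reads(const volatile void *addr)
{
	const volatile uint8_t *p = addr;
	uint32_t lo = fake_readl(p);
	uint32_t hi = fake_readl(p + 4);

	return ((uint64_t)hi << 32) | lo;
}

/* Small demo: a fake 64-bit register backed by two 32-bit words. */
static void demo_ipcd_read(void)
{
	volatile uint32_t fake_ipcd[2] = { 0x89abcdefu, 0x01234567u };

	assert(read64_as_two_reads(fake_ipcd) == 0x0123456789abcdefULL);
}
```

In the kernel the two loads would be real readl() calls on the mapped register, which guarantees two 32-bit accesses regardless of how a given compiler version implements __builtin_memcpy().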