On Mon Feb 19, 2024 at 4:59 PM AEST, Thomas Huth wrote:
> On 17/02/2024 11.43, Nicholas Piggin wrote:
> > On Sat Feb 17, 2024 at 12:02 AM AEST, Thomas Huth wrote:
> >> getchar() can currently only be called once on arm since the implementation
> >> is a little bit too naïve: After the first character has arrived, the
> >> data register never gets set to zero again. To properly check whether a
> >> byte is available, we need to check the "RX fifo empty" on the pl011 UART
> >> or the "RX data ready" bit on the ns16550a UART instead.
> >>
> >> With this proper check in place, we can finally also get rid of the
> >> ugly assert(count < 16) statement here.
> >>
> >> Signed-off-by: Thomas Huth <thuth@xxxxxxxxxx>
> >
> > Nice, thanks for fixing this up.
> >
> > I see what you mean about multi-migration not waiting. It seems
> > to be an arm issue, ppc works properly.
>
> Yes, it's an arm issue. s390x also works fine.
>
> > This patch changed things
> > so it works a bit better (or at least differently) now, but
> > still has some bugs. Maybe buggy uart migration?
>
> I'm also seeing hangs when running the arm migration-test multiple times,
> but also without my UART patch here - so I assume the problem is not really
> related to the UART?

Yeah, I ended up figuring it out. An 11-year-old TCG migration memory
corruption bug!

https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg03486.html

All the weirdness was just symptoms of that. The hang that arm usually
got was the target machine trying to lock the uart spinlock that was
already locked (because the unlock store got lost in migration).

powerpc and s390x were just luckier in avoiding the race; maybe the way
their translation blocks around the getchar code were constructed made
the problem not show up easily or at all. I did end up causing problems
for them by rearranging the code (test case is linked in that msg).

Thanks,
Nick
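
For reference, the RX check described in the quoted commit message could
look roughly like the sketch below. This is not the code from the patch.
The register offsets and bit positions are the standard ones for the
PL011 (UARTFR, RXFE) and the 16550 family (LSR, DR), but struct fake_uart
and sketch_getchar_pl011() are hypothetical names that stand in for real
MMIO access so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* PL011: flag register (UARTFR) at offset 0x18, bit 4 = RXFE (RX FIFO empty). */
#define PL011_UARTFR_OFFSET 0x18
#define PL011_FR_RXFE       (1u << 4)

/* 16550 family: line status register (LSR) is register 5, bit 0 = DR (data ready). */
#define NS16550A_LSR_REG    5
#define NS16550A_LSR_DR     (1u << 0)

/* A byte is available on the PL011 when the RX FIFO is NOT empty. */
static inline bool pl011_rx_ready(uint32_t fr)
{
    return !(fr & PL011_FR_RXFE);
}

/* A byte is available on the ns16550a when the data-ready bit is set. */
static inline bool ns16550a_rx_ready(uint8_t lsr)
{
    return (lsr & NS16550A_LSR_DR) != 0;
}

/* Hypothetical stand-in for the device: one status word and one data byte. */
struct fake_uart {
    uint32_t status;  /* models UARTFR */
    uint8_t data;     /* models the data register */
};

/*
 * Non-blocking getchar sketch: consult the status register instead of
 * testing the data register for a non-zero value. Returns -1 if no byte
 * is pending.
 */
int sketch_getchar_pl011(struct fake_uart *u)
{
    if (!pl011_rx_ready(u->status))
        return -1;

    int c = u->data;
    u->status |= PL011_FR_RXFE;  /* model the FIFO draining after the read */
    return c;
}
```

The point of keying off the status bit is that a second call correctly
reports "nothing pending" even though the data register still holds the
old byte, which is the case the old implementation got wrong.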