Hi everyone,

I'm in the process of trying to port a debugging tool (http://rr-project.org/) from x86 to various other architectures. This tool relies on noting every change made to the memory of the process being debugged. As such, it has a battery of tests for corner cases of copyin/copyout, and it is one of these that I saw behaving strangely when ported to non-x86 architectures.

This particular test exercises the behavior of process_vm_readv (and process_vm_writev, but for simplicity let's assume readv here) with short local buffers. On x86, if the buffer is short and the following page is unmapped, the syscall will fill the remainder of the page and then return however many bytes it actually wrote. However, on other architectures (I mostly looked at arm64, though the same applies elsewhere), the behavior can be quite different. In general, the behavior depends strongly on factors like how close to the start of the copy region the page break occurs, how many bytes were supposed to be left after the page break, and the total size of the region to be copied. In various situations, I'm seeing:

- writes that end many bytes before the page break,
- bytes being modified beyond what the syscall result would indicate,
- combinations thereof.

I can work around this in my port, but I thought it might be valuable to ask where the line is between "architecture-defined behavior" and a bug that should be reported to the appropriate architecture maintainers and eventually fixed. For example, I think it would be nice if the syscall result matched the actual number of bytes written in all cases.

I've written a small program [1] that sets up this situation for various parameter values and prints the results. I have access to arm64, powerpc and x86, so I included results for those architectures, but I suspect other architectures have similar issues. The program should be easy to run to get your own results for a different architecture.

[1] https://gist.github.com/Keno/b247bca85219c4e3bdde9f7d7ff36c77

Thanks,
Keno
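
For anyone who wants a quick feel for the setup without pulling the gist, the core of it is roughly the following. This is a simplified sketch, not the exact program from [1]: it checks a single parameter combination rather than sweeping them, the 16-byte overhang and the 0xaa/0x55 fill patterns are arbitrary choices for spotting modified bytes, and error handling is minimal.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t overhang = 16;	/* bytes the local iovec extends past the mapped page */

	/* Local buffer: two pages, with the second unmapped so the copy
	 * destination runs into an unmapped page partway through. */
	char *local = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* "Remote" source buffer in our own address space (pid = getpid()),
	 * filled with a different pattern so modified bytes stand out. */
	char *remote = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (local == MAP_FAILED || remote == MAP_FAILED)
		return 1;
	munmap(local + page, page);
	memset(local, 0xaa, page);
	memset(remote, 0x55, 2 * page);

	/* Request more bytes than fit before the unmapped page. */
	struct iovec liov = { .iov_base = local,  .iov_len = page + overhang };
	struct iovec riov = { .iov_base = remote, .iov_len = page + overhang };
	ssize_t ret = process_vm_readv(getpid(), &liov, 1, &riov, 1, 0);

	/* Count how many bytes of the still-mapped page now hold the source
	 * pattern, i.e. were actually written by the syscall, and compare
	 * that against the syscall's return value. */
	long modified = 0;
	for (long i = 0; i < page; i++)
		if (local[i] == 0x55)
			modified++;

	printf("requested %ld, syscall returned %zd, bytes modified %ld\n",
	       (long)(page + overhang), ret, modified);
	return 0;
}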