Hi all!

Some days ago I posted to #virtiofs:matrix.org, describing that I am observing what looks like corruption when executing programs from a virtiofs-based filesystem. Over the last few days I spent more time drilling into the problem. This is an attempt at summarizing my findings in order to see what other people think about them.

When running binaries mounted from virtiofs, they may either: fail with a segfault, fail with "badaddr", get stuck, or - sometimes - succeed.

Environment:
  Host: Fedora 38 running 6.4.11-200.fc38.x86_64
  Guest: Yocto-based image: 6.4.9-yocto-standard, aarch64
  virtiofsd: latest main + some debug prints [1]
  QEMU: built from recent git [2]

virtiofsd invocation:

  RUST_LOG="debug" ./virtiofsd --seccomp=none --sandbox=none \
      --socket-path "fs.sock0" --shared-dir $PWD/share-dir/ --cache=never

QEMU invocation:

  ~/projects/qemu/build/qemu-system-aarch64 -kernel Image -machine virt \
      -cpu cortex-a57 \
      -serial mon:stdio \
      -device virtio-net-pci,netdev=net0 \
      -netdev user,id=net0,hostfwd=tcp::2223-:22 \
      -display none -m 2048 -smp 4 \
      -object memory-backend-memfd,id=mem,size=2048M,share=on \
      -numa node,memdev=mem \
      -hda trs-overlay-guest.qcow2 \
      -chardev socket,id=char0,path="fs.sock0" \
      -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=/dev/root \
      -append 'root=/dev/vda2 ro log_buf_len=8M'

I found that launching virtiofsd with --cache=always masks the problem. Therefore, I set --cache=never, but I think I observed no difference compared to the default setting (auto).

Adding logging to virtiofsd and the kernel _felt_ like it made the problem harder to reproduce - leaving me with the impression that some race is happening somewhere.

To rule out that virtiofsd is returning corrupted data, I added some logging and hashsum calculation hacks to it [1]. The hashes check out across multiple accesses, and the order and kind of queued messages are exactly the same in both the good and the bad case. fio was also unable to find any errors with a naive job description [3].

Next, I tried to capture info on the guest side. This became a bit tricky, since the crashes became pretty rare once I followed a fixed pattern of starting log capture, running perf and trying to reproduce the problem. Ultimately, I had the most consistent results with immediately running a program twice:

  /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /; \
      /mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /

(/mnt being the virtiofs mount)

For collecting logs, I made a hack to the guest kernel in order to dump the page content after receiving the virtiofs responses [4].
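In essence, the hack hex-dumps the pages filled in by the FUSE_READ reply, so the guest-side view of the data can be diffed against what virtiofsd logged. A minimal sketch of the idea follows (the helper name and the exact hook point on the reply path are made up here; the real patch is [4]):

  #include <linux/mm.h>
  #include <linux/printk.h>

  /*
   * Illustrative only: dump every page of a completed read reply.
   * Called from somewhere on the virtiofs/FUSE request-completion path.
   */
  static void dump_reply_pages(struct page **pages, unsigned int num_pages)
  {
          unsigned int i;

          for (i = 0; i < num_pages; i++) {
                  pr_info("virtio_fs: page: %p\n", pages[i]);
                  /* 16 bytes per row plus an ASCII column; this is what
                   * produces the "virtio_fs rsp:" lines in the logs below. */
                  print_hex_dump(KERN_INFO, "virtio_fs rsp:", DUMP_PREFIX_NONE,
                                 16, 1, page_address(pages[i]), PAGE_SIZE, true);
          }
  }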
Reproducing the problem with this leaves me with logs that seem to suggest that virtiofsd is returning identical content, but the guest kernel seems to receive differing pages:

good-kernel [5]:

  kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x312 nodeid 0x1 in.len 56 out.len 104
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x314 nodeid 0x1 in.len 53 out.len 128
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x316 nodeid 0x29 in.len 56 out.len 104
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x318 nodeid 0x29 in.len 48 out.len 16
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x31a nodeid 0x29 in.len 80 out.len 832
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs: page: 000000006996d520
  kernel: virtio_fs: to: 00000000de590c14
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  [...]

bad-kernel [6]:

  kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x162 nodeid 0x1 in.len 56 out.len 104
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x164 nodeid 0x1 in.len 53 out.len 128
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x166 nodeid 0x16 in.len 56 out.len 104
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x168 nodeid 0x16 in.len 48 out.len 16
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x16a nodeid 0x16 in.len 80 out.len 832
  kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
  kernel: virtio_fs: page: 000000006ce9a559
  kernel: virtio_fs: to: 000000007ae8b946
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00 .@.......+b.....
  kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00 .N..............
  [...]

When looking at the corresponding output from virtiofsd, it claims to have returned identical data:

good-virtiofsd [7]:

  [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=41, unique=794, pid=481
  [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
      fh: 31,
      offset: 0,
      size: 832,
      read_flags: 2,
      lock_owner: 6838554705639967244,
      flags: 131072,
      padding: 0,
  }
  [src/file_traits.rs:161] hash = 2308490450751364994
  [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 794 }

bad-virtiofsd [8]:

  [DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=22, unique=362, pid=406
  [src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
      fh: 12,
      offset: 0,
      size: 832,
      read_flags: 2,
      lock_owner: 6181120926258395554,
      flags: 131072,
      padding: 0,
  }
  [src/file_traits.rs:161] hash = 2308490450751364994
  [DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 362 }

The "corruption" only seems to happen in this one page; all other pages are identical between runs (except that the bad run terminates earlier).

What do the experts think here? To me it feels a bit like some kind of corruption is going on. Or am I misinterpreting things? Which further analysis steps would you suggest?

Further notes:

After collecting the above results, I realized that running the guest with -smp 1 makes the problems a lot worse, so that may be a better choice when trying to reproduce this.

The repo with my scripts is available at:
https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/
The scripts are just quick and dirty implementations and are not particularly portable.

- Erik

[1] https://gitlab.com/ablu/virtiofsd/-/commit/18fd0c1849e15bc55fbdd6e1f169801b2b03da1f
[2] https://gitlab.com/qemu-project/qemu/-/commit/50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4
[3] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/397a6310dea35973025e3d61f46090bf0c092762/share-dir/write-and-verify-mmap.fio
[4] https://github.com/Ablu/linux/commit/3880b9f8affb01aeabb0a04fe76ad7701dc0bb95
[5] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-drop-bad-1.txt
[6] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-bad-1.txt
[7] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L2538-2549
[8] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L1052-1063