On Wed, Mar 24, 2021 at 6:09 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
>
> I was syncing a 100GB blockchain, which means it was frequently getting
> appended to, so COW was really killing my I/O (iowait > 50%). I had hoped
> that marking it nodatacow would be a 100% fix; however, iowait would be
> quite low but jump up regularly to 25%-50%, occasionally locking up the
> GUI briefly. It was worst when the blockchain was syncing and I was
> rm'ing the old COW version, even after rm returned. I assume there were
> quite a few background tasks still updating.
>

I assume a blockchain file starts small and just grows by appends. Append
writes are the same on overwriting and COW file systems. You might get
slightly higher iowait because datacow implies datasum, which means more
metadata to write, but that's it. There's no data to COW if it's just
appending to a file, and metadata writes are always COW.

You could install bcc-tools and run btrfsslower with the same (exclusive)
workload with datacow and nodatacow to see whether latency is meaningfully
higher with datacow, but I don't expect this is a factor.

iowait just means the CPU is idle waiting for IO to complete. It could do
other things, even IO, if that IO can be preempted by proper scheduling. So
the GUI freezes are probably because some other file on /home, alongside
this 100G file, needs to be accessed, and between the kernel scheduler, the
file system, the IO scheduler, and the drive, it's just reluctant to go do
that IO. Again, bcc-tools can help here in the form of fileslower, which
will show latency spikes regardless of the file system (it works at the VFS
layer and is thus closer to the application layer, which is where the GUI
stalls happen).

Any way this workload can be described in enough detail that anyone can
reproduce the setup will help, by making it possible for multiple other
people to collect the information we'd need to track down what's going on.
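A generic A/B reproducer for the append-only pattern might look like the
sketch below. It's a sketch under assumptions: dd stands in for the actual
blockchain client, /tmp stands in for whatever directory is on the
filesystem under test, and chattr +C is how I'd get the nodatacow case (it
only takes effect on an empty file, and only matters on btrfs). Run it once
with and once without the chattr line, with btrfsslower watching in another
terminal (on Fedora the bcc tools live in /usr/share/bcc/tools).

```shell
#!/bin/sh
# Minimal append-only workload sketch (assumes POSIX sh + GNU dd/stat).
# Point F at a directory on the filesystem you want to test.
F=/tmp/append-test.bin
rm -f "$F"
touch "$F"
# nodatacow case: +C only works on an empty file, btrfs only.
# Ignore failure on non-btrfs filesystems so the A/B runs stay comparable.
chattr +C "$F" 2>/dev/null || true

# Append 100 x 1 MiB chunks, mimicking a file that only ever grows.
i=0
while [ "$i" -lt 100 ]; do
    dd if=/dev/zero of="$F" bs=1M count=1 oflag=append conv=notrunc status=none
    i=$((i + 1))
done
stat -c '%s bytes' "$F"
```

Compare the two runs with something like
`sudo /usr/share/bcc/tools/btrfsslower 10` to see whether datacow actually
produces more slow (>10 ms) operations during the appends.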
And that also includes A/B testing: the exact same setup, merely running
the ~100G sync workload (presumably the exact size doesn't matter; it's the
workload while the sync is happening that does). Also, the more we can take
this from the specific case to the general case, including using generic
tools like xfs_io instead of a blockchain program, the more attention we
can give it, because people don't have to learn app-specific things and we
can apply the fix to all similar workloads.

>> > On a tangent, it took about 30 minutes to delete the old file... My
>> > system is a Ryzen 5 3600 w/ 16GB of memory, but it is a spinning disk.
>> > I use an NVMe for the system and the spinning disk for /home.
>>
>> filefrag 100G.file
>> What's the path to the file?
>
> $ filefrag /home/richard/.bitmonero/lmdb/data.mdb
> /home/richard/.bitmonero/lmdb/data.mdb: 1424 extents found

Just today I deleted a 100G Windows 10 raw file with over 6000 extents, and
it deleted in 3 seconds. So I'm not sure why the delay in your case. More
information is needed; I'm not sure what to use here, maybe btrfsslower
while also stracing the rm. There is only one syscall involved, unlinkat(),
and it does need to return before rm gives you a prompt back. But
unlinkat() does not imply sync, so it's not necessary for btrfs to write
the metadata change unless something else has issued fsync on the enclosing
directory, maybe. In that case the command would hang until all the dirty
metadata resulting from the delete is written out, and btrfsslower will
show this.

> However, I let a rebalance run overnight.

It shouldn't be necessary to run balance. If you've hit ENOSPC, it's a bug
and needs to be reported. A separate thread can be started on balance if
folks want more info on balance, maintenance, and ENOSPC things. I don't
ever worry about them anymore, not since the ticketed ENOSPC infrastructure
landed circa 2016, in kernel ~4.8.
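For the stracing-the-rm idea, a self-contained sketch could look like this.
It's an illustration, not a prescription: the file path and sizes are made
up, and the fallback plain rm is only there so the snippet cleans up after
itself when strace is unavailable or ptrace is not permitted.

```shell
#!/bin/sh
# Sketch: time the unlink itself (assumes GNU coreutils; strace optional).
F=/tmp/unlink-test.bin
dd if=/dev/zero of="$F" bs=1M count=10 status=none

# -T appends time spent in each syscall; -e trace=unlinkat filters noise.
# Fall back to plain rm if strace is missing or ptrace is blocked.
if command -v strace >/dev/null 2>&1; then
    strace -T -e trace=unlinkat rm "$F" || rm -f "$F"
else
    rm -f "$F"
fi
```

If unlinkat() itself shows a multi-second time there, the delay is in the
syscall; if rm returns fast but IO stays busy afterwards, btrfsslower is
the better tool for seeing the deferred metadata writes.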
--
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure