On Wed, Mar 24, 2021 at 6:09 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
>
> I was syncing a 100GB blockchain, which means it was frequently getting
> appended to, so COW was really killing my I/O (iowait > 50%). I had hoped
> that marking it nodatacow would be a 100% fix; however, iowait would be
> quite low but jump up regularly to 25%-50%, occasionally locking up the
> GUI briefly. It was worst when the blockchain was syncing and I was
> rm'ing the old COW version, even after rm returned. I assume there were
> quite a few background tasks still updating.
>

I assume a blockchain file starts small and just grows by appends. Append
writes are the same on overwriting and COW file systems. You might get
slightly higher iowait because datacow implies datasum, which means more
metadata to write, but that's it. There's no data to COW if it's just
appending to a file, and metadata writes are always COW.

You could install bcc-tools and run btrfsslower with the same (exclusive)
workload with datacow and nodatacow to see whether latency is meaningfully
higher with datacow, but I don't expect this is a factor.

iowait just means the CPU is idle waiting for IO to complete. It could do
other things, even IO, if that IO can be preempted by proper scheduling. So
the GUI freezes are probably because some other file on /home, alongside
this 100G file, needs to be accessed, and between the kernel scheduler, the
file system, the IO scheduler, and the drive, it's just reluctant to go do
that IO. Again, bcc-tools can help here in the form of fileslower, which
will show latency spikes regardless of the file system (it works at the VFS
layer and is thus closer to the application layer, which is where the GUI
stalls happen).

Any way this workload can be described in enough detail that anyone can
reproduce the setup will help, by making it possible for multiple other
people to collect the information we'd need to track down what's going on.
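A generic A/B reproducer for the append-only pattern might look like the
sketch below. It's a sketch under assumptions: dd stands in for the actual
blockchain client, /tmp stands in for whatever directory is on the
filesystem under test, and chattr +C is how I'd get the nodatacow case (it
only takes effect on an empty file, and only matters on btrfs). Run it once
with and once without the chattr line, with btrfsslower watching in another
terminal (on Fedora the bcc tools live in /usr/share/bcc/tools).

```shell
#!/bin/sh
# Minimal append-only workload sketch (assumes POSIX sh + GNU dd/stat).
# Point F at a directory on the filesystem you want to test.
F=/tmp/append-test.bin
rm -f "$F"
touch "$F"
# nodatacow case: +C only works on an empty file, btrfs only.
# Ignore failure on non-btrfs filesystems so the A/B runs stay comparable.
chattr +C "$F" 2>/dev/null || true

# Append 100 x 1 MiB chunks, mimicking a file that only ever grows.
i=0
while [ "$i" -lt 100 ]; do
    dd if=/dev/zero of="$F" bs=1M count=1 oflag=append conv=notrunc status=none
    i=$((i + 1))
done
stat -c '%s bytes' "$F"
```

Compare the two runs with something like
`sudo /usr/share/bcc/tools/btrfsslower 10` to see whether datacow actually
produces more slow (>10 ms) operations during the appends.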
And that also includes A/B testing: the exact same setup, merely running
the ~100G sync workload (presumably the exact size doesn't matter; it's the
workload while the sync is happening that does). Also, the more we can take
this from the specific case to the general case, including using generic
tools like xfs_io instead of a blockchain program, the more attention we
can give it, because people don't have to learn app-specific things and we
can apply the fix to all similar workloads.

>> > On a tangent, it took about 30 minutes to delete the old file... My
>> > system is a Ryzen 5 3600 w/ 16GB of memory, but it is a spinning disk.
>> > I use an NVMe for the system and the spinning disk for /home.
>>
>> filefrag 100G.file
>> What's the path to the file?
>
> $ filefrag /home/richard/.bitmonero/lmdb/data.mdb
> /home/richard/.bitmonero/lmdb/data.mdb: 1424 extents found

Just today I deleted a 100G Windows 10 raw file with over 6000 extents, and
it deleted in 3 seconds. So I'm not sure why the delay in your case. More
information is needed; I'm not sure what to use here, maybe btrfsslower
while also stracing the rm. There is only one syscall involved, unlinkat(),
and it does need to return before rm gives you a prompt back. But
unlinkat() does not imply sync, so it's not necessary for btrfs to write
the metadata change unless something else has issued fsync on the enclosing
directory, maybe. In that case the command would hang until all the dirty
metadata resulting from the delete is written out, and btrfsslower will
show this.

> However, I let a rebalance run overnight.

It shouldn't be necessary to run balance. If you've hit ENOSPC, it's a bug
and needs to be reported. A separate thread can be started on balance if
folks want more info on balance, maintenance, and ENOSPC things. I don't
ever worry about them anymore, not since the ticketed ENOSPC infrastructure
landed circa 2016, in kernel ~4.8.
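For the stracing-the-rm idea, a self-contained sketch could look like this.
It's an illustration, not a prescription: the file path and sizes are made
up, and the fallback plain rm is only there so the snippet cleans up after
itself when strace is unavailable or ptrace is not permitted.

```shell
#!/bin/sh
# Sketch: time the unlink itself (assumes GNU coreutils; strace optional).
F=/tmp/unlink-test.bin
dd if=/dev/zero of="$F" bs=1M count=10 status=none

# -T appends time spent in each syscall; -e trace=unlinkat filters noise.
# Fall back to plain rm if strace is missing or ptrace is blocked.
if command -v strace >/dev/null 2>&1; then
    strace -T -e trace=unlinkat rm "$F" || rm -f "$F"
else
    rm -f "$F"
fi
```

If unlinkat() itself shows a multi-second time there, the delay is in the
syscall; if rm returns fast but IO stays busy afterwards, btrfsslower is
the better tool for seeing the deferred metadata writes.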
--
Chris Murphy
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure