Re: btrfs system slow down with 100GB file

On Thu, Mar 25, 2021 at 6:39 AM Richard Shaw <hobbes1069@xxxxxxxxx> wrote:
>
> On Wed, Mar 24, 2021 at 11:05 PM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:

>> Append writes are the same on overwriting and cow file systems. You
>> might get slightly higher iowait because datacow means datasum which
>> means more metadata to write. But that's it. There's no data to COW if
>> it's just appending to a file. And metadata writes are always COW.
>
>
> Hmm... While still annoying (chrome locking up because it can't read/write to its cache in my /home), my desk chair benchmarking says that it was definitely better as nodatacow. Now that I think about it, during the initial sync I'm likely getting the blocks out of order, which would explain things a bit more. I'm not too worried about nodatasum for this file, as the nature of the blockchain is to be able to detect errors (intentional or accidental) already, and it should be self-correcting.

Is this information in a database? What kind? There are COW-friendly
databases (e.g. RocksDB, or SQLite with WAL enabled) and comparatively
COW-unfriendly ones, so it may be that setting the file to nodatacow
helps. If there are also multiple sources of frequent syncing, that
can exacerbate things.
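
If you want to test nodatacow on just that file, note that chattr +C
only takes effect on a new, empty file, so the existing file has to be
copied into one that already has the attribute set. A minimal sketch,
with whatever writes the file stopped first (using the data.mdb name
from later in this thread):

$ touch data.mdb.nocow
$ chattr +C data.mdb.nocow                    # mark the empty file nodatacow
$ lsattr data.mdb.nocow                       # should show the 'C' attribute
$ cp --reflink=never data.mdb data.mdb.nocow  # real copy, not a reflink
$ mv data.mdb.nocow data.mdb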

You can attach strace to both processes and see whether either or both
of them are doing any kind of sync(), and at what interval. I'm not
certain whether bcc-tools biosnoop shows all kinds of sync. It would
probably be useful to know both which sync-related syscalls are being
used by the two programs (chrome and whatever is writing to the large
file), and also their concurrent effect on bios, using biosnoop.
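
For example (the PIDs are placeholders, and on Fedora the bcc tools
live under /usr/share/bcc/tools):

$ sudo strace -f -tt -e trace=sync,fsync,fdatasync,syncfs,sync_file_range -p <pid>
$ sudo /usr/share/bcc/tools/biosnoop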

>
>> You could install bcc-tools and run btrfsslower with the same
>> (exclusive) workload with datacow and nodatacow to see if latency is
>> meaningfully higher with datacow but I don't expect that this is a
>> factor.
>
>
> That's an interesting tool. So I don't want to post all of it here as it could have some private info in it but I'd be willing to share it privately.

TIME(s)     COMM           PID    DISK    T SECTOR     BYTES  LAT(ms)

There's no file content displayed in any case.
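
If it helps keep the output short, btrfsslower can be told to log only
operations above a latency threshold (the 10 ms cutoff here is
arbitrary):

$ sudo /usr/share/bcc/tools/btrfsslower 10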

>
> One interesting output now is that the blockchain file is almost constantly getting written to, but since it's synced it's only getting appended to (my guess), and I'm not noticing any "chair benchmark" issues. However, one of the writes did take 1.8s, while most of them were a few hundred ms or less.

1.8s is quite a lot of latency. It could be the result of a flush
delay due to a lot of dirty data, and while that flush is happening
it's not going to be easily or quickly preempted by some other process
demanding that its data be written right now. Btrfs is quite adept at
taking multiple write streams from many processes and merging them
into sequential writes. Even when the writes are random, Btrfs tends
to make them sequential. This is thwarted by sync(), which is a demand
to write a specific file's outstanding data and metadata right now. It
sets up all kinds of seek behavior, as the data must be written, then
the metadata, then the superblock.
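
One way to see how much dirty data is pending, and to cap it so that
flushes happen in smaller, more frequent batches (the values below are
only illustrative, not a recommendation):

$ grep -E 'Dirty|Writeback' /proc/meminfo
$ sudo sysctl -w vm.dirty_background_bytes=268435456   # start background writeback around 256 MiB
$ sudo sysctl -w vm.dirty_bytes=1073741824             # throttle writers above roughly 1 GiB dirty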


>
> I'm pretty sure that's exactly what's happening. But is there a better I/O scheduler for traditional hard disks? Currently I have:
>
> $ cat /sys/block/sda/queue/scheduler
> mq-deadline kyber [bfq] none

I still don't know anything about the workload, so I can only
speculate. BFQ is biased toward reads and is targeted at the desktop
use case; mq-deadline is biased toward writes and is targeted at the
server use case. This workload is perhaps more server-like, in that
the Chrome writes, like Firefox's, go to SQLite databases. Firefox
enables WAL, but I don't see that Chrome does (not sure).

You could try mq-deadline.
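
The switch is non-persistent and can be made at runtime; afterwards
the selected scheduler shows up in brackets, something like:

$ echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none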


> $ ls -sh data.mdb
> 101G data.mdb
>
> A large bittorrent download should also be similar since you don't get the parts in order, but perhaps it's smart enough to allocate all the space on the front end?

That's up to the application that owns the file and is writing to it.
There's going to be a seek hit no matter what, because they're both
written and read out of order. And while they might be database files,
they aren't active databases, so it's a different write pattern.
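
If you're curious how fragmented the file actually ended up, filefrag
will report the extent count (the -v output for a 100GB file will be
long, hence the head):

$ filefrag data.mdb
$ filefrag -v data.mdb | head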



-- 
Chris Murphy