On Sat, Jul 4, 2020 at 3:29 PM Scott Schmit <i.grok@xxxxxxxxxxx> wrote:
>
> On Fri, Jul 03, 2020 at 10:37:43AM -0600, Chris Murphy wrote:
> > On Thu, Jul 2, 2020 at 10:29 PM Scott Schmit <i.grok@xxxxxxxxxxx> wrote:
> > >
> > > On Sun, Jun 28, 2020 at 03:40:11PM -0600, Chris Murphy wrote:
> > > > Databases and VM images are things btrfs is bad at out of the box.
> > > > Most of this has to do with fsync dependency of other file systems.
> > > > Btrfs is equipped to deal with an fsync heavy world out of the box,
> > > > using treelog enabled by default. But can still be slow for some
> > > > workloads.
> > >
> > > Does this also impact mariadb databases? I've noticed that since
> > > reinstalling my machine with mediawiki installed, the performance of the
> > > wiki has dropped noticeably when the cache is cold (just loading the
> > > pages, not editing them).
> >
> > Good question. A complete answer leads to a lot more questions.
> >
> > Mariadb has a couple older docs on this: one suggests using 'noatime'
> > mount option on all file systems [1] as an optimization, and
> > additionally for Btrfs to use 'nodatacow' [2]. It can be set per
> > directory or per file using 'chattr +C' before files are created - it
> > won't work after the fact. 'chattr +C' will make files behave like
> > it's on any other filesystem: all writes are overwrites instead of
> > copy-on-write, no checksums, no compression.
> >
> > Is this stale information? Is there something unrelated going on in
> > your case? Should databases setup these optimizations on behalf of
> > users? Does storage type make a difference? I'm just going to set
> > those aside for now.
>
> FWIW, neither /var/lib/mysql nor any of the files under it were set up
> with +C.

That's expected. There is precedent to optimize automatically, e.g.
systemd-journald sets chattr +C on /var/log/journal when it detects
that it's on Btrfs.

Rabbit hole sidebar: It's an open question if this is really needed on SSD.
The latency hit on HDD makes this optimization more useful. Also, when
rotating the journals, systemd submits the journal for defragmentation
by Btrfs. So we get some extra writes on SSDs because of this, and since
it's nodatacow, it can't be compressed. So lately I
'touch /etc/tmpfiles.d/journal-nocow.conf' to prevent journald from
setting /var/log/journal to nodatacow. With COW left on, the journals
are sometimes compressed as much as 10:1 using the *lowest* zstd
compression level.

Is there much optimization possible here? They aren't of significant
size or number. I don't think it really matters. But I think it's useful
to look into these issues for databases because COW is relevant. Btrfs
has it by default. But it also happens with reflink copies on XFS. And
following snapshots on dm-thin.

> I'm not using noatime, but I am using relatime. This isn't a terribly
> large wiki -- just a personal setup (about 19M if I'm measuring the
> right files). It's also more of a server use case than a workstation
> one. That said, I'd imagine it's on the order of what a developer might
> set up for testing.

Yeah I should have asked. I kinda doubt a database of this size would
show a meaningful performance difference between cow and nocow. You
could try letting it age for, say, a month, then go back to datacow and
let it age another month, and at the end of each month compare with the
'filefrag' command.

COW itself isn't the cause of overhead; even SSDs use COW internally.
But there is a fragmentation factor, and the extents have a cumulative
tracking cost, cpu and memory wise. But even with extreme fragmentation
of 19M, I don't know for sure without testing it, but I wouldn't be
surprised if it had no cost or a very low one.
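
The month-over-month 'filefrag' comparison suggested above could be
sketched roughly as follows. This is only an illustration: the helper
names sum_extents and dir_extents and the log path are hypothetical, and
it assumes filefrag's usual "<file>: <N> extents found" output lines.

```shell
#!/bin/sh
# Sum the extent counts from filefrag output read on stdin.
# Assumes each line ends in "<N> extent found" or "<N> extents found".
sum_extents() {
    awk 'NF >= 3 && $NF == "found" { sum += $(NF-2) } END { print sum + 0 }'
}

# Hypothetical helper: total extent count for all regular files under
# the directory given as $1. Needs read access to the files.
dir_extents() {
    find "$1" -type f -exec filefrag {} + 2>/dev/null | sum_extents
}

# Example use (as root, ideally with the database stopped); the log
# path is an assumption, not something from this thread:
#   echo "$(date +%F) $(dir_extents /var/lib/mysql)" >> /root/mysql-extents.log
```

Recording one number at the end of each aging period gives a crude
before/after figure for nocow versus datacow; it says nothing about
whether the fragmentation actually costs anything at this size.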
>
> > To give the nodatacow suggestion a try:
> > ## shutdown the database
> > # mkdir /var/lib/mysql2
> > # chattr +C /var/lib/mysql2
> > # cp -a /var/lib/mysql/. /var/lib/mysql2/
> > # rm -r /var/lib/mysql/
> > # mv /var/lib/mysql2/ /var/lib/mysql/
> > ## resume operation
>
> Doing the manipulations to make it nocow doesn't appear to have made a
> significant difference: I still see a delay between the raw page (sans
> CSS) loading and the CSS loading to make it look right. I thought it
> had lessened when I tried it last night, but when I tried again today,
> it was back just as long. When I was running on LVM+ext4, I remember no
> delay. Maybe the database has nothing to do with it?

Maybe. But then where is it coming from? Another rabbit hole is
performance troubleshooting. bcc-tools has file system tracing tools for
this purpose, but I haven't dug into any of that.

> Incidentally... how does one handle chattr +C as part of tar backups and
> the like?

My expectation is that, since it's a local optimization, it gets set
when the file is copied (created) locally by inheriting +C from a
directory. I'm not sure if there's a way to store/restore file
attributes with tar.

So what or who should set it? The distribution could do it at install
time, use an anaconda post-install script to set it on target
directories, or bake it into the rpm file. What about directories that
don't yet exist? Is it an application responsibility? These are the
questions.

I kinda like the systemd approach. If the recommendation changes, a
future update can cause it to be unset. This has its own drawback, as
you can see from my earlier command above, where I inhibit the setting
of +C on journals now.
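
On the tar question, one way to lean on directory inheritance rather
than on the archive is sketched below. It is a sketch under assumptions:
has_nocow is a hypothetical helper, backup.tar and /var/lib/mysql.new
are placeholder names, and it assumes lsattr prints the flags field
first with an uppercase 'C' for No_COW.

```shell
#!/bin/sh
# Succeed if the lsattr output read on stdin shows the No_COW flag.
# Assumes the flags field comes first; uppercase 'C' is No_COW, while
# lowercase 'c' (compressed) must not match.
has_nocow() {
    awk '$1 ~ /C/ { found = 1 } END { exit !found }'
}

# Hypothetical restore flow; set +C before any files are created in
# the directory so that the extracted files inherit it:
#   mkdir /var/lib/mysql.new
#   chattr +C /var/lib/mysql.new
#   tar -xpf backup.tar -C /var/lib/mysql.new
#   lsattr -d /var/lib/mysql.new | has_nocow && echo "nodatacow is set"
```

The check only confirms the flag on whatever path is passed to lsattr;
running lsattr on a few restored files afterwards would confirm that
inheritance worked as expected.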

--
Chris Murphy