Re: [Bug 199931] New: systemd/rtorrent file data corruption when using echo 3 >/proc/sys/vm/drop_caches

james harvey <jamespharvey20@xxxxxxxxx> · Wed, 6 Jun 2018 16:33:23 -0400

On Wed, Jun 6, 2018 at 3:06 PM, Marc Lehmann <schmorp@xxxxxxxxxx> wrote:
> On Tue, Jun 05, 2018 at 05:52:38PM -0400, james harvey <jamespharvey20@xxxxxxxxx> wrote:
>> >> This is not always reproducible, but when deleting our journal, creating log
>> >> messages for a few hours and then doing the above manually has a ~50% chance of
>> >> corrupting the journal.
>> ...
>>
>> My strong bet is you have a hardware issue.
>
> Strange, what kind of hardware bug would affect multiple very different
> computers in exactly the same way?

Oops.  I missed when you clearly said: "All of this is reproducible on
two different boxes, so is unlikely to be a hardware issue."  I ran
into all these problems ultimately because of a badly designed Marvell
SATA controller.  I thought I had ruled out hardware issues by having
2 identical systems, and reproducing the problem on both.  Certainly
makes a hardware issue for you much less likely, especially if "very
different computers" means different motherboards.

FWIW, I have dropped caches a lot lately (not nearly as much as your
crons) and haven't had it corrupt anything, even in proximity to heavy
I/O.
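
For anyone who wants to run a similar sanity check, here is a rough
Python sketch (not a reproducer for the reported bug): hash a file,
drop caches, hash it again, and compare.  The helper names are made up
for illustration, and writing to /proc/sys/vm/drop_caches needs root.
It only covers a quiescent file, not the "while heavy I/O is in
flight" case.

```python
import hashlib


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def drop_caches():
    """Equivalent of `echo 3 >/proc/sys/vm/drop_caches` (requires root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def survives_drop(path, do_drop=True):
    """Hash the file before and after dropping caches; True if unchanged.

    do_drop=False skips the privileged write, which is handy for a dry
    run as a normal user.
    """
    before = sha256_of(path)
    if do_drop:
        drop_caches()
    after = sha256_of(path)
    return before == after
```

Running survives_drop() in a loop over the journal or torrent files
would at least show whether the on-disk copy differs from what was in
the page cache.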

>> going bad, bad cables, bad port, etc.  My strong bet is you're also
>> using BTRFS mirroring.
>
> Not sure what exactly you mean with btrfs mirroring (there are many btrfs
> features this could refer to), but the closest thing to that that I use is
> dup for metadata (which is always checksummed), data is always single. All
> btrfs filesystems are on lvm (not mirrored), and most (but not all) are
> encrypted. One affected fs is on a hardware raid controller, one is on an
> ssd. I have a single btrfs fs in that box with raid1 for metadata, as an
> experiment, but I haven't used it for testing yet.

Was referring to any type of data mirroring.  Data dup, btrfs
RAID1/5/6/10.  But, I see that's not the case here.

>> You're describing intermittent data corruption on files that I'm
>> thinking all have NOCOW turned on.
>
> The systemd journal files are nocow (I re-enabled that after I turned it
> off for a while), but the rtorrent directory (and the files in it) are
> not.
>
> I did experiment (a year ago) with nocow for torrent files and, more
> importantly, vm images, but it didn't really solve the "millions of
> fragments slow down" problem with btrfs, so I figured I can keep them cow
> and regularly copy them to defragment them. That's why I am quite sure cow
> is switched on long before I booted my first 4.14 kernel (and it still
> is).

Yeah, with data single, you wouldn't be seeing intermittent problems
if it was related to the bugs I was talking about.

>> it's done writing to a journal file, but in a way that guarantees it
>> to fail.  This has been reported to systemd at
>> https://github.com/systemd/systemd/issues/9112 but poettering has
>
> I am aware that systemd tries to turn on nocow, and I think this is actually
> a bug, but this wouldn't have an effect on rtorrent, which has corruption
> problems on a different fs. And boy would it be wonderful if Debian switched
> away from systemd, I feel I personally ran into every single bug that
> exists...

systemd turning on NOCOW isn't a bug.  systemd 219 intentionally
turned on NOCOW for journal files, attempting to improve performance
on btrfs.  220 made it user-configurable, defaulting to turning on
NOCOW.  But, yeah, the bugs I was talking about wouldn't affect
rtorrent files on a different fs, since you have NOCOW off on them,
and since they're data single.
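
If it helps to double check which files actually carry the flag, the
following is a minimal Python sketch of what `lsattr` reports as "C".
It assumes Linux on a common architecture such as x86-64 (the ioctl
number is built with the generic _IOR encoding, which differs on a few
architectures), and the function names are only illustrative.

```python
import fcntl
import os
import struct

# From <linux/fs.h>: "Do not cow file".
FS_NOCOW_FL = 0x00800000

# FS_IOC_GETFLAGS is _IOR('f', 1, long); this uses the generic encoding
# (direction=read in bits 30-31, size in bits 16-29), an assumption that
# holds on x86 and ARM but not on every architecture.
_LONG = struct.calcsize("l")
FS_IOC_GETFLAGS = (2 << 30) | (_LONG << 16) | (ord("f") << 8) | 1


def has_nocow(flags):
    """True if the inode flag word has the NOCOW bit set."""
    return bool(flags & FS_NOCOW_FL)


def get_inode_flags(path):
    """Return the inode flags for path, or None if the ioctl fails
    (for example on a filesystem that does not support it)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(_LONG)
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        # The kernel stores the flags as a 32-bit int at the start
        # of the buffer.
        return struct.unpack_from("I", buf)[0]
    except OSError:
        return None
    finally:
        os.close(fd)
```

For example, has_nocow(get_inode_flags(p) or 0) on a journal file and
on a file in the rtorrent directory should differ if the setup is as
described.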

> However, no matter how much systemd plays with btrfs flags, it shouldn't
> corrupt data.

Yeah, it doesn't in itself.  It just makes those files susceptible to
single-disk corruption that btrfs would otherwise protect against with
data checksums, since NOCOW files get no data checksums.  And, if
using compression and btrfs replace on current kernels, it guarantees
them to be corrupted.