Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS

On Thu, Aug 27, 2020 at 06:02:42PM +0100, Christoph Hellwig wrote:
> On Thu, Aug 27, 2020 at 11:57:03AM -0400, Josef Bacik wrote:
> > This sort of brings up a good point, the whole point of DISCARD support in
> > log-writes was to expose problems where we may have been discarding real
> > data we cared about, hence adding the forced zero'ing stuff for devices that
> > didn't support discard.  But that made the incorrect assumption that a drive
> > with actual discard support would actually return 0's for discarded data.
> > That assumption was based on hardware that did actually do that, but now we
> > live in the brave new world of significantly shittier drives.  Does dm-thinp
> > reliably unmap the ranges we discard, and thus give us this zero'ing
> > behavior?  Because we might as well just use that for everything so
> > log-writes doesn't have to resort to pwrite()'ing zeros everywhere.  Thanks,
> 

That's pretty much what this series does. It only modifies the generic
tests because I didn't want to mess with the others (all btrfs, I think)
that might not have any issues, but I wouldn't be opposed to burying the
logic into the dmlogwrites bits so it just always creates a thin volume
behind the scenes. If we were going to do that, I'd prefer to do it as a
follow up to these patches (dropping patch 1, most likely) so at least
they can remain enabled on XFS for the time being.

OTOH, perhaps the thinp behavior could be internal, but conditional on
the filesystem being XFS. It's not really clear to me whether this
problem is more of an
XFS phenomenon or just that XFS happens to have some unique recovery
checking logic that explicitly detects it. It seems more like the
latter, but I don't know enough about ext4 or btrfs to say.

> We have a write zeroes operation in the block layer.  For some devices
> this is as efficient as discard, and that should (I think) apply to dm.
> 

Do you mean BLKZEROOUT? I see that it is more efficient than writing
zeroes from userspace, but I don't think it's efficient enough to solve
this problem. It takes about three minutes to manually zero my 15GB lvm
(dm-linear) scratch device on my test vm via dd using sync writes. A
'blkdiscard -z'
saves me about half that time, but IIRC this is an operation that would
occur every time the logwrites device is replayed to a particular
recovery point (which can happen many times per test).
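
FWIW, blkdiscard -z boils down to the BLKZEROOUT ioctl, as far as I
know. Purely as an illustration (not part of this series), a minimal
sketch of zeroing a whole device that way might look like the
following; the device path is just a placeholder and error handling is
kept to the bare minimum:

  /*
   * Minimal sketch (not from this series): zero an entire block device
   * via the BLKZEROOUT ioctl, roughly what 'blkdiscard -z' does. The
   * default device path is just an example.
   */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>         /* BLKGETSIZE64, BLKZEROOUT */

  int main(int argc, char **argv)
  {
          const char *dev = argc > 1 ? argv[1] : "/dev/mapper/scratch";
          uint64_t range[2], size;
          int fd = open(dev, O_WRONLY);

          if (fd < 0 || ioctl(fd, BLKGETSIZE64, &size) < 0) {
                  perror(dev);
                  return 1;
          }

          range[0] = 0;         /* start offset in bytes */
          range[1] = size;      /* length in bytes */

          /* the kernel uses WRITE ZEROES where the device supports it */
          if (ioctl(fd, BLKZEROOUT, range) < 0) {
                  perror("BLKZEROOUT");
                  return 1;
          }

          close(fd);
          return 0;
  }

Even with the ioctl doing the work in the kernel, we're still bound by
how quickly the device can actually zero the range, which is what the
timing numbers above are getting at.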

Brian



