Understanding of --integrity* parameters for LUKS-independent dm-integrity setup

asmqb7 <asmqb7@xxxxxxxxx> · Sun, 24 May 2020 21:45:56 +1000

Hi,

I'm looking to identify efficient and correct reference parameters for
integritysetup that will reasonably defend against hardware failure,
generally considering scenarios that won't necessarily benefit from
the affordances (and overheads) of FDE.

Approaching the dm-integrity/integritysetup documentation with a
(presently) poor/untrustworthy fundamental understanding of
cryptography, I'm not currently able to conclusively evaluate the
significance and semantic impact of the choices available for the
--integrity, --integrity-key-file and --integrity-key-size parameters.
Rather than copy-pasting random commands I've found on the Internet
that just happen to work, I would prefer not to proceed until I
comprehensively (if not fundamentally) understand the options
available and what scenarios each might be appropriate for.

A first-principles comparison of all available crypto algorithms would
of course be beyond the scope of the integritysetup documentation, but
because of cryptography's highly context- and use-case-specific
nature, I still think that a storage-integrity-focused overview/guide,
even if slightly biased, would be extremely useful. Cohesive and
authoritative reference documentation may also effectively mitigate
superstitious, catastrophically underinformed bikeshedding, and for
this reason my queries err toward a pedantic level of detail.

*** HMAC-SHA* key configuration

- Tracing from integritysetup.c through dm-integrity.c to
hmac_setkey() in /crypto/hmac.c in the kernel reveals that the
keyfiles used by dm-integrity appear to be used as initialization
seeds for the HMAC function. My current working theory/assumption for
why the user is allowed/required to supply this data is that the
uniqueness and secrecy associated with a user-specifiable HMAC seed
holds defensive cryptographic value. Is this view correct, and are
there any non-secrecy-related reasons I might want to specify my own
HMAC seed value?

- What pathological issues might arise from providing HMAC seed values
of all-0x00, all-0xFF, 0x00..0x63, a repeating pattern, etc? Could
such highly-deterministic, low/zero-entropy keyfiles be considered
universally sane defaults, including for the purposes of automated
system installation, in scenarios where good data integrity is
strictly the only consideration? (I would be particularly partial to
all-0x00, because then I could do --integrity-key-file=/dev/zero.)

- Apparently correct key lengths are 32 for SHA256 (256/8) and 64 for
SHA512 (512/8). Do I understand correctly that longer keys will be
truncated, while shorter keys will be zero-padded (specifically
-suffixed)? (Newly-formatted, untouched volumes using HMAC-SHA512 with
each of 1-, 32-, 64- and 2048-byte all-'\0' keys all present the same
checksum, suggesting this is true.)
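
The key-length question above can at least be checked against the
HMAC definition itself (RFC 2104). The following is a minimal
userspace sketch using Python's standard hmac module, not the kernel's
crypto/hmac.c or integritysetup, and it assumes the kernel follows the
same RFC 2104 rules. Under those rules, a key shorter than the hash's
block size (128 bytes for SHA-512, 64 bytes for SHA-256) is
zero-padded, while a longer key is hashed first rather than truncated.
MESSAGE is a hypothetical stand-in for sector data.

```python
import hashlib
import hmac

MESSAGE = b"hypothetical sector payload"  # stand-in data, not a real sector


def tag(key: bytes, digest: str = "sha512") -> str:
    """Return the hex HMAC of MESSAGE under the given key."""
    return hmac.new(key, MESSAGE, digest).hexdigest()


# Keys no longer than the SHA-512 block size (128 bytes) are zero-padded,
# so all-zero keys of 1, 32, 64 and 128 bytes are indistinguishable.
short_tags = {n: tag(b"\x00" * n) for n in (1, 32, 64, 128)}
print(len(set(short_tags.values())))  # 1 distinct tag

# A key longer than the block size is hashed first, not truncated.
long_key = b"\x00" * 2048
print(tag(long_key) == tag(b"\x00" * 128))                      # False
print(tag(long_key) == tag(hashlib.sha512(long_key).digest()))  # True
```

The identical checksum reported above for a 2048-byte key file would
be consistent with only a prefix of the file being read as the key,
but that is an assumption not checked here.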

*** Integrity algorithm selection

I was curious if the "crc32c" and "hmac-sha256" options noted in the
manpage (as of May 2020) represented the full list of algorithms
accepted by the integritysetup --integrity parameter, so after loading
all modules in /lib/modules/.../crypto/ (using the stock 4.19.0 kernel
in Debian 10.4), I iterated over all entries presented in
/proc/crypto. In my case, the following algorithms resulted in
successful volume creation (omitting --integrity-key-{file,size}):
tgr128, tgr160, tgr192, wp256, wp384, wp512, rmd128, rmd160, rmd256,
rmd320, poly1305, md4, md5, sha1, sha224, sha384, sha256, sha512,
crct10dif, crc32, and crc32c.
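
The enumeration step described above can be sketched as follows. This
is a minimal sketch, assuming the usual /proc/crypto layout of
blank-line-separated records made of "field : value" lines. The SAMPLE
text and the function names are illustrative, not output captured from
the system described. It only lists candidate names; whether
integritysetup accepts each one still has to be tried.

```python
from pathlib import Path

# Illustrative sample in the /proc/crypto layout (not real captured output).
SAMPLE = """\
name         : crc32c
driver       : crc32c-generic
module       : kernel
type         : shash
digestsize   : 4

name         : cbc(aes)
driver       : cbc-aes-aesni
module       : aesni_intel
type         : skcipher

name         : sha256
driver       : sha256-generic
module       : kernel
type         : shash
digestsize   : 32
"""


def parse_records(text: str) -> list:
    """Split /proc/crypto-style text into one dict per algorithm record."""
    records = []
    for chunk in text.strip().split("\n\n"):
        fields = {}
        for line in chunk.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def hash_algorithm_names(text: str) -> list:
    """Return sorted, de-duplicated names of shash/ahash entries."""
    wanted = {"shash", "ahash"}
    return sorted({r["name"] for r in parse_records(text)
                   if r.get("type") in wanted and "name" in r})


if __name__ == "__main__":
    proc = Path("/proc/crypto")
    text = proc.read_text() if proc.exists() else SAMPLE
    print(hash_algorithm_names(text))
```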

In the interests of disambiguation and thoroughness I offer an
**indeterminate/unverified** commentary on the algorithms my kernel
currently offers. Any critique/correction (NAK) where my understanding
is wrong, and agreement/consensus (ACK) where it is sound, will
equally be appreciated.

- CRC32 (which is weaker than CRC32C) and "crct10dif" (T10 DIF CRC16,
with half the checksum width of CRC32) are probably universally
unsuitable;
- CRC32C may be useful where a low/rudimentary level of protection is
wanted in low-power/embedded contexts (where acceleration may be
unavailable), and maybe even for many general-purpose COTS/mainstream
setups;
- SHA-1 (vulnerable to both the SHAttered collision attack from 2017
and the more recent chosen-prefix "SHA-1 is a Shambles" attack from
2020), MD5 (widely known to be vulnerable to collision attacks), and
MD4 (significantly compromised since 1995) all have well-known
practical attacks and ARE NOT cryptographically sound, but may offer a
stronger level of defense than CRC32C, with SHA-1 roughly an order of
magnitude stronger than MD5 and MD4;
- "wp"/Whirlpool (as used in the original TrueCrypt), "rmd"/RIPEMD
(specifically the strengthened versions released in 1996), and
"tgr"/Tiger (which OpenPGP abandoned for RIPEMD-160) are, broadly
speaking, cryptographically controversial ("uncertain") to varying
degrees but not concretely broken. They may offer a stronger level of
protection than SHA-1, MD5, et al. for the purposes of integrity
preservation, but may become subject to future catastrophic attacks
that break their current cryptographic guarantees;
- SHA-224 (the truncation of SHA-256) and SHA-384 (the truncation of
SHA-512) may offer adequate collision resistance despite limited
entropy in (admittedly hard to imagine) scenarios where there is
insufficient space for 32- or 64-byte hashes;
- HMAC-SHA1 may be more acceptable than it first appears, thanks to
the HMAC construction's resistance to the length-extension attacks
plain SHA* is susceptible to (see next point), and also because HMAC
does not depend on the collision resistance of the underlying hash
(https://security.stackexchange.com/questions/187866/why-aren-t-collisions-important-with-hmac,
although a comment mentions HMAC-MD4 is compromised);
- Plain SHA-256 and SHA-512 are probably acceptable for data integrity
protection, because the fixed-length interleaved blocks used by
dm-integrity render them immune to the length-extension attacks SHA*
is vulnerable to (https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data);
- Poly1305 (as used for message authentication in TLS 1.3),
HMAC-SHA256 and HMAC-SHA512 are probably good sane defaults.
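
The practical difference between the plain and HMAC variants in the
list above can be sketched in a few lines. This is a minimal sketch,
not dm-integrity's actual on-disk format: the way the sector number is
mixed in (an 8-byte little-endian prefix), the key, and the sector
contents are all assumptions made for illustration. It shows that an
unkeyed digest catches accidental corruption but can be recomputed by
anyone able to rewrite both data and tag, whereas a keyed tag cannot
be recomputed without the key.

```python
import hashlib
import hmac
import struct

SECTOR_SIZE = 512
KEY = bytes(range(32))  # hypothetical 32-byte key, for illustration only


def plain_tag(sector_no: int, data: bytes) -> bytes:
    """Unkeyed SHA-256 over an assumed (sector number || data) layout."""
    return hashlib.sha256(struct.pack("<Q", sector_no) + data).digest()


def keyed_tag(key: bytes, sector_no: int, data: bytes) -> bytes:
    """HMAC-SHA256 over the same assumed layout."""
    return hmac.new(key, struct.pack("<Q", sector_no) + data, "sha256").digest()


original = b"A" * SECTOR_SIZE
stored_plain = plain_tag(7, original)
stored_keyed = keyed_tag(KEY, 7, original)

# Accidental corruption: one flipped bit fails both checks.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
print(plain_tag(7, corrupted) == stored_plain)                          # False
print(hmac.compare_digest(keyed_tag(KEY, 7, corrupted), stored_keyed))  # False

# Deliberate tampering: without KEY, a valid plain tag for new data can
# still be computed, but the keyed tag can only be guessed at.
forged = b"B" * SECTOR_SIZE
forged_plain = plain_tag(7, forged)                # would verify as valid
forged_keyed = keyed_tag(b"\x00" * 32, 7, forged)  # wrong key guess
print(forged_keyed == keyed_tag(KEY, 7, forged))   # False
```

In this model an all-zero or otherwise publicly guessable key gives
the keyed tag no more tamper resistance than the plain digest, though
accidental corruption is detected either way.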

I also have some further general questions in addition to the above
purely cryptographically-focused analysis.

- How does Poly1305 generally compare with HMAC-SHA{1,256,512} within
an integrity-preserving checksumming context? Where/why might I pick
one over the other?
- In what real-world scenarios might I accept CRC32C? I've read a lot
of anecdata about its (poor) collision resistance, mostly consisting
of ambiguous (to me), handwavy/inconclusive field reports and
discussion like https://news.ycombinator.com/item?id=13853110 (root
article link: https://news.ycombinator.com/item?id=13851349)
- In what scenarios might I find plain SHA-* acceptable over HMAC-SHA*?
- Where might it be acceptable to choose HMAC-SHA1 over HMAC-SHA256 or
HMAC-SHA512?
- Am I correct in theorizing(/presuming) that dm-integrity's
fixed-length block structure does indeed make SHA-256 and SHA-512 safe
to use?
- What inputs exactly are provided to the hash functions? I found an
interesting HMAC-SHA512 vs Poly1305 comparison at
https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305,
which incidentally highlights the criticality of supplying correctly
unique inputs to Poly1305 to ensure secure output. I have no reason to
doubt that this detail was well known during dm-integrity's design and
that it was handled correctly; I just can't find any explicit
references in Authenticated and Resilient Disk Encryption (final.pdf).
(That paper does document in Table 4.5 on page 39 that "random"
integrity IVs are used for AEAD modes, but there is no specific
connection made to Poly1305, nor to HMAC-SHA* for that matter.)
- Very tangentially, could it be reasonable to propose "hmac-crc32c"
for data integrity protection? I found
https://news.ycombinator.com/item?id=16750767 which vaguely implies
such a construction would be silly, but the comment doesn't clarify
why. I also found
https://www.spinics.net/lists/linux-crypto/msg25086.html which notes
the removal of an apparently mistaken hmac-crc32 capability. I get the
impression this is an interesting but wrongheaded idea, but I have yet
to find an explanation of why.
- Do any modes other than HMAC-SHA* require a keyfile? (I tried all
entries in /proc/crypto with a keyfile, and all failed except
"digest_null" (heh).)

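On the Poly1305 question in the list above, the role of the key can be
illustrated with a small pure-Python sketch of the Poly1305 function
as specified in RFC 8439. It is for illustration only: it is not
constant-time, and it says nothing about how the kernel or
dm-integrity supplies keys, which is not established here. The 32-byte
key splits into r and s, and because s is simply added at the end, two
tags made with the same r differ in a predictable way. This linearity
is part of why a Poly1305 key is meant to be used only once. The key
and message values below are hypothetical.

```python
P = (1 << 130) - 5                          # the prime 2^130 - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF  # RFC 8439 clamp mask for r
MASK128 = (1 << 128) - 1


def poly1305(key: bytes, msg: bytes) -> bytes:
    """Compute a 16-byte Poly1305 tag (RFC 8439, section 2.5)."""
    if len(key) != 32:
        raise ValueError("Poly1305 needs a 32-byte one-time key")
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Each block gets a 0x01 byte appended before being read as an integer.
        block = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + block) * r % P
    return ((acc + s) & MASK128).to_bytes(16, "little")


# Same r and message, different s: the tags differ by exactly s2 - s1
# (mod 2^128), because the final addition of s is linear.
r_part = bytes(range(1, 17))
msg = b"hypothetical sector payload"
s1, s2 = 1000, 5000
t1 = poly1305(r_part + s1.to_bytes(16, "little"), msg)
t2 = poly1305(r_part + s2.to_bytes(16, "little"), msg)
diff = (int.from_bytes(t2, "little") - int.from_bytes(t1, "little")) & MASK128
print(diff == s2 - s1)  # True
```
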
*** Miscellaneous findings

integritysetup and dm-integrity currently report invalid algorithm
selections in a somewhat inscrutable way: integritysetup --debug
reports "device-mapper: reload ioctl on   failed: No such file or
directory" (with two spaces between "on" and "failed", as presented),
while dm-integrity squirrels "device-mapper: table: 254:0: integrity:
Invalid internal hash" or "device-mapper: table: 254:0: integrity:
Error setting internal hash key" into dmesg. I first ran into this
problem when experimenting with different integrity algorithms and
trying "aead" and "cmac-aes" after finding references to those
algorithms in integritysetup.c (I now realize these modes only make
sense for dm-crypt). I later found when checking all algorithm
possibilities listed in /proc/crypto that this was how invalid
algorithms were reported.
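
Until such failures are reported more directly, the two kernel
messages quoted above can be matched mechanically. The sketch below is
a hypothetical helper: the names HINTS and explain_integrity_errors
are invented here, and the hint texts are an interpretation of the
observations in this message rather than official wording. It assumes
the kernel log text has already been obtained, for example from dmesg.

```python
# Message fragments quoted from the dm-integrity kernel log lines above,
# mapped to hints that are an interpretation, not official wording.
HINTS = {
    "integrity: Invalid internal hash":
        "the --integrity algorithm was not accepted as an internal hash",
    "integrity: Error setting internal hash key":
        "the algorithm was found, but setting its key failed",
}


def explain_integrity_errors(log_text: str) -> list:
    """Return (log line, hint) pairs for known dm-integrity table errors."""
    findings = []
    for line in log_text.splitlines():
        for fragment, hint in HINTS.items():
            if fragment in line:
                findings.append((line.strip(), hint))
    return findings


# Illustrative log text built from the two messages quoted above.
sample_log = """\
device-mapper: table: 254:0: integrity: Invalid internal hash
some unrelated log line (placeholder)
device-mapper: table: 254:0: integrity: Error setting internal hash key
"""
for line, hint in explain_integrity_errors(sample_log):
    print(f"{line}\n  -> {hint}")
```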

I wonder how "integrity-only" and "integrity-with-encryption" might be
distinguished in the code, so integritysetup can properly bail out if
asked to use an irrelevant mode. FWIW, my first instinct on seeing
that reference to a zero-length filename was to momentarily fear
something had been deleted upon cleanup, until I remembered I really
wasn't dealing with a flaky bash script :)

As a small aside, the fact that AEAD is not usable for integrity-only
contexts makes logical sense; "Authenticated and Resilient Disk
Encryption" (final.pdf) describes how AEAD implements both encryption
and integrity protection (I am yet to understand how it does so in a
length-preserving manner :) ) so obviously trying to use that
algorithm in an integrity-only context would fail. I observe that
searching the same PDF for "cmac-aes" returns 0 results, perhaps
because this mode was implemented after the paper was published.

*** References consulted before emailing, in order of descending relevance:

- https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt
- https://gitlab.com/cryptsetup/cryptsetup/-/wikis/DMIntegrity
- https://is.muni.cz/th/vesfr/final.pdf (this was very interesting to read)
- https://archive.fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf
- https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305
- https://securitypitfalls.wordpress.com/2018/05/08/raid-doesnt-work/
- https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-integrity.c
- https://wiki.gentoo.org/wiki/Device-mapper#Integrity (this is
currently terribly out of date)
- https://dm-devel.redhat.narkive.com/3zjEiVPz/dmitry-kasatkin-huawei-com
- https://security.stackexchange.com/questions/190670/luks2-dm-integrity
- https://en.wikipedia.org/wiki/HMAC (this is very inscrutable and
doesn't enlighten much...)
- https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data/79581
- https://security.stackexchange.com/questions/135936/finding-hash-collision
- https://github.com/torvalds/linux/blob/master/crypto/hmac.c
- https://github.com/mbroz/cryptsetup/blob/master/src/integritysetup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/integrity/integrity.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils_crypt.c
- https://github.com/mbroz/cryptsetup/blob/master/src/utils_tools.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/setup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/libcryptsetup.h
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-crypt.c

***

Thanks very much for implementing the dm-integrity target, and for
introducing novel, universal silent data corruption detection to
Linux's block layer in a pluggable way and making it usable
independently of LUKS; it allows virtually all current and future
Linux filesystems, dm-raid and LVM configurations, and software that
works with block devices, to be informed of offline tampering and
hardware failure by simply checking for -EILSEQ. I look forward to the
day major distribution installers offer zero-effort opt-in to
integrity protection and enable mainstream, enterprise and embedded
Linux users everywhere to regularly contribute significantly to the
"storage media reports healthy but just returned wrong data"
statistics. Ideally, real and significant progress will be made in
this area within the next 10 years.
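
From userspace, the "simply checking for -EILSEQ" idea above looks
roughly like this. This is a minimal sketch: the function name
read_block_checked and the 4096-byte default are illustrative, and it
assumes, as the paragraph above states, that a failed integrity check
surfaces to the reader as EILSEQ. Some read paths may report a
different error such as EIO instead, which is an untested assumption
here. An ordinary temporary file stands in for a dm-integrity device,
so no mismatch can actually occur in the demonstration.

```python
import errno
import os
import tempfile


def read_block_checked(path: str, offset: int, length: int = 4096):
    """Read `length` bytes at `offset`; return (data, integrity_ok).

    integrity_ok is False when the read fails with EILSEQ, which this
    sketch assumes is how an integrity mismatch is reported.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset), True
    except OSError as exc:
        if exc.errno == errno.EILSEQ:
            return b"", False
        raise  # any other I/O error is not treated as an integrity failure
    finally:
        os.close(fd)


# Stand-in for a block device: a plain temporary file of two 4 KiB blocks.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 8192)
    tmp.flush()
    data, ok = read_block_checked(tmp.name, 4096)
    print(len(data), ok)  # 4096 True
```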

David Lindsay
_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
https://www.saout.de/mailman/listinfo/dm-crypt