Hi, I'm looking to identify efficient and correct reference parameters for integritysetup that will reasonably defend against hardware failure, generally considering scenarios that won't necessarily benefit from the affordances (and overheads) of FDE.

Approaching the dm-integrity/integritysetup documentation with a (presently) poor/untrustworthy fundamental understanding of cryptography, I'm not currently able to conclusively evaluate the significance and semantic impact of the choices available for the --integrity, --integrity-key-file and --integrity-key-size parameters. Rather than copy-paste random commands I've found on the Internet that just happen to work, I'd prefer not to proceed until I comprehensively (if not fundamentally) understand the options available and the scenarios each might be appropriate for.

A first-principles comparison of all available crypto algorithms would of course be beyond the scope of the integritysetup documentation, but because cryptography is so highly context- and use-case-specific, I still think a storage-integrity-focused overview/guide, even if slightly biased, would be extremely useful. Cohesive and authoritative reference documentation may also effectively mitigate superstitious, catastrophically underinformed bikeshedding, and for this reason my queries err toward a pedantic level of detail.

*** HMAC-SHA* key configuration

- Tracing from integritysetup.c through dm-integrity.c to hmac_setkey() in /crypto/hmac.c in the kernel reveals that the keyfiles used by dm-integrity appear to be used as initialization seeds for the HMAC function. My current working theory/assumption for why the user is allowed/required to supply this data is that the uniqueness and secrecy of a user-specifiable HMAC seed holds defensive cryptographic value. Is this view correct, and are there any non-secrecy-related reasons I might want to specify my own HMAC seed value?
- What pathological issues might arise from providing HMAC seed values of all-0x00, all-0xFF, 0x00..0x63, a repeating pattern, etc.? Could such highly deterministic, low/zero-entropy keyfiles be considered universally sane defaults, including for automated system installation, in scenarios where good data integrity is strictly the only consideration? (I would be particularly partial to all-0x00, because then I could use --integrity-key-file=/dev/zero.)
- Apparently the correct key lengths are 32 bytes for SHA-256 (256/8) and 64 bytes for SHA-512 (512/8). Do I understand correctly that longer keys will be truncated, while shorter keys will be zero-padded (specifically, suffixed with zeros)? (Newly formatted, untouched volumes using HMAC-SHA512 with each of 1-, 32-, 64- and 2048-byte all-'\0' keys all present the same checksum, suggesting this is true.)

*** Integrity algorithm selection

I was curious whether the "crc32c" and "hmac-sha256" options noted in the manpage (as of May 2020) represented the full list of algorithms accepted by the integritysetup --integrity parameter, so after loading all modules in /lib/modules/.../crypto/ (using the stock 4.19.0 kernel in Debian 10.4), I iterated over all entries presented in /proc/crypto. In my case, the following algorithms resulted in successful volume creation (omitting --integrity-key-{file,size}): tgr128, tgr160, tgr192, wp256, wp384, wp512, rmd128, rmd160, rmd256, rmd320, poly1305, md4, md5, sha1, sha224, sha384, sha256, sha512, crct10dif, crc32, and crc32c.

In the interests of disambiguation and thoroughness, I offer an **indeterminate/unverified** commentary on the algorithms my kernel currently offers. Any critique/correction (NAK) where my understanding is wrong, and agreement/consensus (ACK) where it is sound, will be equally appreciated.
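Since three of the names accepted above (crct10dif, crc32 and crc32c) are CRCs, a small Python sketch may make the collision-resistance question concrete. It implements CRC-32C bitwise (bit-reflected Castagnoli polynomial 0x82F63B78; the standard check value for "123456789" is 0xE3069283) and relies on the fact that a CRC is affine over XOR for equal-length inputs, so XORing a copy of the generator polynomial into any position of a sector leaves its CRC unchanged. The 512-byte sector, `collide()` and `hmac_crc32c()` are illustrative assumptions, not how dm-integrity lays out or computes tags; `hmac_crc32c()` is a hypothetical RFC 2104-style wrapper with a 64-byte block that only accepts keys of at most 64 bytes.

```python
"""CRC-32C is linear: collisions cost nothing to build (illustrative sketch)."""

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reflected


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# The generator polynomial itself, in the LSB-first bit order a reflected CRC
# consumes: the x^32 term comes first, followed by x^31..x^0 (the reflected
# constant shifted up by one bit). That is 33 bits, so it fits in 5 bytes.
COLLIDER = (1 | (CRC32C_POLY << 1)).to_bytes(5, "little")


def collide(sector: bytes, offset: int) -> bytes:
    """Return a different sector with the same CRC-32C (needs offset + 5 <= len)."""
    out = bytearray(sector)
    for i, b in enumerate(COLLIDER):
        out[offset + i] ^= b
    return bytes(out)


def hmac_crc32c(key: bytes, msg: bytes, block: int = 64) -> int:
    """Hypothetical RFC 2104-style HMAC using CRC-32C as the 'hash' (key <= block)."""
    k = key.ljust(block, b"\0")
    inner = crc32c(bytes(x ^ 0x36 for x in k) + msg).to_bytes(4, "little")
    return crc32c(bytes(x ^ 0x5C for x in k) + inner)


if __name__ == "__main__":
    sector = bytes(range(256)) * 2  # any 512-byte "sector" will do
    forged = collide(sector, 100)
    print(forged != sector, crc32c(forged) == crc32c(sector))  # True True
    key = b"secret key"
    print(hmac_crc32c(key, forged) == hmac_crc32c(key, sector))  # True
```

This does not make CRC-32C useless against hardware faults: it detects every single-bit error and every burst error of 32 bits or fewer, and a sector corrupted into effectively random contents keeps a matching CRC-32C with a probability of only about 1 in 2^32. What it shows is that CRC collisions take no effort to construct, and that the secret key in this sketch's HMAC-style wrapper does not change that, because the key only adds a fixed-length prefix to a computation that is still linear in the data.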
- CRC32 (which is weaker than CRC32C) and "crct10dif" (T10 DIF CRC16, basically half of CRC32) are probably universally unsuitable;
- CRC32C may be useful where a low/rudimentary level of protection is wanted in low-power/embedded contexts (where acceleration may be unavailable), and maybe even for many general-purpose COTS/mainstream setups;
- SHA-1 (vulnerable to both the SHAttered attack from 2017 and the more recent chosen-prefix SHA-mbles attack from 2019), MD5 (widely known to be vulnerable to collision attacks) and MD4 (significantly compromised since 1995) all suffer from significant, well-known vulnerabilities and attacks and ARE NOT cryptographically sound, but may offer a stronger level of defense than CRC32C, with SHA-1 a relative order of magnitude stronger than MD5 and MD4;
- "wp"/Whirlpool (as used in the original TrueCrypt), "rmd"/RIPEMD (specifically the strengthened versions released in 1996) and "tgr"/Tiger (which OpenPGP abandoned for RIPEMD-160) are (broadly) variously cryptographically controversial ("uncertain") but not concretely broken, may offer a stronger level of protection than SHA-1, MD5 et al. for the purposes of integrity preservation, and may become subject to future catastrophic attacks that break their current cryptographic guarantees;
- SHA-224 (a truncated variant of SHA-256) and SHA-384 (a truncated variant of SHA-512) may offer adequate collision resistance despite their shorter digests, in (admittedly hard to imagine) scenarios where there is insufficient space for 32- or 64-byte hashes;
- HMAC-SHA1 may be more acceptable than it first appears, thanks to the HMAC construction's resistance to the length-extension attacks plain SHA* is susceptible to (see next point), and also because HMAC's security does not rely on the collision resistance of the underlying hash (https://security.stackexchange.com/questions/187866/why-aren-t-collisions-important-with-hmac, although a comment mentions HMAC-MD4 is compromised);
- Plain SHA-256 and SHA-512 are probably acceptable for data integrity protection, because the fixed-length interleaved blocks used by dm-integrity render it immune to the length-extension attacks SHA* is vulnerable to (https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data);
- Poly1305 (as used for message authentication in TLS 1.3), HMAC-SHA256 and HMAC-SHA512 are probably good sane defaults.

I also have some further general questions in addition to the above purely cryptographically focused analysis.

- How does Poly1305 generally compare with HMAC-SHA{1,256,512} within an integrity-preserving checksumming context? Where/why might I pick one over the other?
- In what real-world scenarios might I accept CRC32C? I've read a lot of anecdata about its (poor) collision resistance, mostly consisting of ambiguous (to me), handwavy/inconclusive field reports and discussion like https://news.ycombinator.com/item?id=13853110 (root article link: https://news.ycombinator.com/item?id=13851349).
- In what scenarios might I find plain SHA-* acceptable over HMAC-SHA*?
- Where might it be acceptable to choose HMAC-SHA1 over HMAC-SHA256 or HMAC-SHA512?
- Am I correct in theorizing (or presuming) that dm-integrity's fixed-length block structure does indeed make SHA-256 and SHA-512 safe to use?
- What inputs exactly are provided to the hash functions? I found an interesting HMAC-SHA512 vs Poly1305 comparison at https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305, which incidentally highlights how critical it is to supply correctly unique inputs to Poly1305 to ensure secure output. I have no reason to believe this detail was not well known during dm-integrity's design, or that it was not handled correctly; I just can't find any explicit references to it in Authenticated and Resilient Disk Encryption (final.pdf).
(That paper does document in Table 4.5 on page 39 that "random" integrity IVs are used for AEAD modes, but it makes no specific connection to Poly1305, nor to HMAC-SHA* for that matter.)
- Very tangentially, could it be reasonable to propose "hmac-crc32c" for data integrity protection? I found https://news.ycombinator.com/item?id=16750767, which vaguely implies such a construction would be silly, but the comment doesn't clarify why. I also found https://www.spinics.net/lists/linux-crypto/msg25086.html, which notes the removal of an apparently mistaken hmac-crc32 capability. I get the impression this is an interesting but wrongheaded idea, but I have yet to find an explanation.
- Do any modes other than HMAC-SHA* require a keyfile? (I tried all entries in /proc/crypto with a keyfile, and all failed except "digest_null" (heh).)

*** Miscellaneous findings

integritysetup and dm-integrity currently report invalid algorithm selections in a somewhat inscrutable way: integritysetup --debug reports "device-mapper: reload ioctl on  failed: No such file or directory" (with two spaces between "on" and "failed", as presented), while dm-integrity squirrels "device-mapper: table: 254:0: integrity: Invalid internal hash" or "device-mapper: table: 254:0: integrity: Error setting internal hash key" away into dmesg.

I first ran into this problem when experimenting with different integrity algorithms and trying "aead" and "cmac-aes" after finding references to those algorithms in integritysetup.c (I now realize these modes only make sense for dm-crypt). When I later checked all the algorithm possibilities listed in /proc/crypto, I found that this was how invalid algorithms were reported. I wonder how "integrity-only" and "integrity-with-encryption" modes might be distinguished in the code, so that integritysetup can properly bail out if asked to use an irrelevant mode.
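One hedged way a tool could bail out earlier is to check the requested name against /proc/crypto before loading a table. The Python sketch below assumes, from my reading of dm-integrity.c rather than from any documentation, that the internal hash is allocated as a synchronous hash, so only entries whose "type" field is "shash" are treated as plausible --integrity candidates. `parse_proc_crypto()` and `check_integrity_alg()` are hypothetical helpers, not part of cryptsetup. The check cannot tell whether an algorithm needs a key, and a name missing from /proc/crypto may still be loadable on demand (template instances such as hmac(sha256) typically only show up after first use), so "unknown" means "not currently registered" rather than "unsupported".

```python
"""Hypothetical pre-flight check of an --integrity name against /proc/crypto."""
import os
from typing import Dict, List


def parse_proc_crypto(text: str) -> List[Dict[str, str]]:
    """Split /proc/crypto text into one dict per blank-line-separated entry."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def check_integrity_alg(alg: str, entries: List[Dict[str, str]]) -> str:
    """Classify a candidate name, assuming dm-integrity needs a synchronous hash."""
    types = {e.get("type") for e in entries if e.get("name") == alg}
    if "shash" in types:
        return "shash"       # registered as a synchronous hash: plausible
    if types:
        return "wrong-type"  # registered, but only as aead/skcipher/ahash/...
    return "unknown"         # not registered right now; might still autoload


if __name__ == "__main__":
    if os.path.exists("/proc/crypto"):
        with open("/proc/crypto") as f:
            registered = parse_proc_crypto(f.read())
        for name in ("crc32c", "sha256", "hmac(sha256)", "aead"):
            print(f"{name}: {check_integrity_alg(name, registered)}")
```

With a check like this, a name such as "aead" would presumably be flagged before any device-mapper table is loaded, instead of surfacing only as an ioctl error plus a dmesg line.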
FWIW, my first instinct on seeing that reference to a zero-length filename was to momentarily fear something had been deleted upon cleanup, until I remembered I really wasn't dealing with a flaky bash script :)

As a small aside, the fact that AEAD is not usable in integrity-only contexts makes logical sense; "Authenticated and Resilient Disk Encryption" (final.pdf) describes how AEAD implements both encryption and integrity protection (I have yet to understand how it does so in a length-preserving manner :) ), so trying to use that algorithm in an integrity-only context would obviously fail. I observe that searching the same PDF for "cmac-aes" returns 0 results, perhaps because this mode was implemented after the paper was published.

*** References consulted before emailing, in order of descending relevance:

- https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt
- https://gitlab.com/cryptsetup/cryptsetup/-/wikis/DMIntegrity
- https://is.muni.cz/th/vesfr/final.pdf (this was very interesting to read)
- https://archive.fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf
- https://crypto.stackexchange.com/questions/56429/which-algorithm-has-better-performance-hmac-umac-and-poly1305
- https://securitypitfalls.wordpress.com/2018/05/08/raid-doesnt-work/
- https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-integrity.c
- https://wiki.gentoo.org/wiki/Device-mapper#Integrity (this is currently terribly out of date)
- https://dm-devel.redhat.narkive.com/3zjEiVPz/dmitry-kasatkin-huawei-com
- https://security.stackexchange.com/questions/190670/luks2-dm-integrity
- https://en.wikipedia.org/wiki/HMAC (this is very inscrutable and doesn't enlighten much...)
- https://security.stackexchange.com/questions/79577/whats-the-difference-between-hmac-sha256key-data-and-sha256key-data/79581
- https://security.stackexchange.com/questions/135936/finding-hash-collision
- https://github.com/torvalds/linux/blob/master/crypto/hmac.c
- https://github.com/mbroz/cryptsetup/blob/master/src/integritysetup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/integrity/integrity.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/utils_crypt.c
- https://github.com/mbroz/cryptsetup/blob/master/src/utils_tools.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/setup.c
- https://github.com/mbroz/cryptsetup/blob/master/lib/libcryptsetup.h
- https://github.com/torvalds/linux/blob/master/drivers/md/dm-crypt.c

***

Thanks very much for implementing the dm-integrity target, and for introducing novel, universal silent data corruption detection to Linux's block layer in a pluggable way and making it usable independently of LUKS; it allows virtually all current and future Linux filesystems, dm-raid and LVM configurations, and software that works with block devices, to be informed of offline tampering and hardware failure by simply checking for -EILSEQ.

I look forward to the day major distribution installers offer zero-effort opt-in to integrity protection and enable mainstream, enterprise and embedded Linux users everywhere to regularly contribute significantly to the "storage media reports healthy but just returned wrong data" statistics. Ideally, real and significant progress will be made in this area within the next 10 years.

David Lindsay

_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
https://www.saout.de/mailman/listinfo/dm-crypt