On 22.04.2017 15:45, Arno Wagner wrote:
> On Sat, Apr 22, 2017 at 15:33:28 CEST, Robert Nichols wrote:
>> On 04/21/2017 07:25 PM, Arno Wagner wrote:
>>> Assume 1 bit has been corrupted in a random place.
>>> A key-slot is 256 kB, i.e. 2 Mbit. That means trying it
>>> out (flip one bit, do an unlock attempt) would take
>>> 2 million seconds on the original PC, i.e. 23 days.
>>> This can maybe be brought down by a factor of 5 or so
>>> with the fastest available CPU (the iteration count of
>>> 150k is pretty low), i.e. still roughly 5 days.
>>>
>>> This may be worth a try, but it requires some
>>> serious coding with libcryptsetup and it will only
>>> help on a single bit-error.
>>>
>>> It may of course be a more complex error, especially
>>> when ECC in the disk has corrected an error to the
>>> wrong value, because the original was too corrupted.
>>
>> The drive would almost certainly have detected and corrected a
>> single-bit error.
>
> Only when the error happened in FLASH. It can happen in
> RAM and on a bus and there it would not have been corrected.
> Can even be a transient error (charged cosmic particle
> impacting a RAM cell, e.g.), these things happen.

I agree: there is at least the remote possibility that data wasn't adequately protected "in flight", for example during repair operations. If the device really is telling the truth about its internal operations and there has only been a single sector reallocation event on the physical layer, the chances of this happening within our relevant 2 Mbit area on a 240 GB disk are very slim to begin with, and the lack of writes anywhere near this area makes flash wear-out extra unlikely, as described before. Yet if there was such an error, it would fit our scenario perfectly, and it is not impossible that many more errors have happened unnoticed and unreported in other, less critical parts of the disk.

One thing to mention, perhaps for later readers with similar recovery problems: if we knew the exact "logical" sector that was reallocated and is thought to contain a flaw, we could limit our keyslot-bruteforcing efforts to this area and reduce the work by a factor of 500 for a master key size of 512 bits (64 bytes x 4000 AF stripes = 256,000 bytes of key material, i.e. 500 sectors of 512 bytes; a small sketch of the arithmetic follows below). Unfortunately, I don't see a way to get the positions of the reallocated sector(s) out of the drive. There is no known serial debug header for low-level firmware access, as is sometimes the case with HDDs. Perhaps there is an unofficial Intel vendor tool with verbose debugging output that could accomplish this, but the available documentation for the Intel SSD Toolbox doesn't mention anything beyond the SMART values (which only include error counts), which is why I haven't bothered with this proprietary tool yet.

If I had actual bad blocks, the "long" SMART scan would have shown their positions, and there is the option of hammering the drive with the badblocks utility to find the position of (further) defective spots, similar to the memtest86+ utility for main memory. Given the current health readings, I don't expect such a treatment to reveal any "helpful" clues about misbehavior in the first 2 MB of the drive. The March 2015 techreport.com SSD endurance test suggests this drive model and capacity can take around 600 TB of writes before the reserved sectors for wear leveling run out, so I doubt heavy write loads will provoke the issue we're looking for.

-----

I've spent some time researching likely bruteforce performance on available machines and possible shortcuts.
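But first, to make the sector-restriction idea above concrete, here is a minimal sketch (plain C, toy numbers only) of how the factor 500 falls out and how a known reallocated LBA would translate into a small range of bit positions to test. The key material offset and the reallocated LBA below are hypothetical placeholders (we don't actually have the latter); the 512-bit master key matches our setup and 4000 AF stripes is the LUKS1 default.

/* sector_window.c -- toy calculation only; key_material_lba and
 * realloc_lba are hypothetical placeholders, not real values.     */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t mk_bytes   = 64;                    /* 512-bit master key */
    const uint64_t af_stripes = 4000;                  /* LUKS1 default      */
    const uint64_t slot_bytes = mk_bytes * af_stripes; /* 256000 bytes       */
    const uint64_t sector     = 512;

    printf("key material: %llu bytes = %llu sectors = %llu bit positions\n",
           (unsigned long long)slot_bytes,
           (unsigned long long)(slot_bytes / sector),   /* 500     */
           (unsigned long long)(slot_bytes * 8));       /* 2048000 */

    /* If the bad sector were known, only 512*8 = 4096 of those bit
     * positions would have to be tried.                            */
    const uint64_t key_material_lba = 8;   /* hypothetical: key slot area start */
    const uint64_t realloc_lba      = 123; /* hypothetical: the reallocated LBA */
    uint64_t off = (realloc_lba - key_material_lba) * sector;
    printf("would test bit positions %llu .. %llu\n",
           (unsigned long long)(off * 8),
           (unsigned long long)((off + sector) * 8 - 1));
    return 0;
}

So with a known sector, a single-bit search would shrink from 2,048,000 candidate positions to 4,096.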
Our bruteforce efforts are generally limited by the PBKDF2-sha1 iteration speed, which is hard to optimize for, just as designed. On an Intel Ivy Bridge i7 and cryptsetup 1.7.3 (Debian Stretch), "cryptsetup benchmark" counts about 1.25M iterations/s:

PBKDF2-sha1 1248304 iterations per second for 256-bit key

I've manually compiled https://github.com/mbroz/pbkdf2_cryptsetup_test as well as cryptsetup itself to look for possible improvements with different crypto backends and gcc optimizations such as -Ofast and -march=native, but I haven't been able to improve on the 1.25M/s number so far. OpenSSL beats gcrypt, but the default kernel backend still seems faster. (As a side note: what's up with the nine-year-old INSTALL file? It's no fun scraping together the necessary libraries while deciphering autogen.sh/configure errors!)

There might still be some side-channel mitigations in the actual implementation that could be omitted, since we don't really care about those here, but from what I read this is much less of a low-hanging fruit for SHA1 than it would be for AES. Also, the amount of data handled during these iterations is small, which decreases the impact of optimized memory access methods (CPU cache level hints) and negates others (hugepages mmap).

Now, according to my current understanding, bruteforcing any meta-header values or the actual key slot password requires us to iterate through the following:

* the key slot data (AF stripes) is kept fixed
1) perform 144306 PBKDF2-sha1 iterations on the password
2) "decrypt" the AF sections using the resulting hash
3) do an AF_merge() to get the raw masterKeyCandidate
4) perform 35125 PBKDF2-sha1 iterations (at a 160-bit 'key' size) to derive the digest of the masterKeyCandidate
5) do a memcmp() against the digest stored in the header to see if we've got a valid master key

See https://gitlab.com/cryptsetup/cryptsetup/blob/master/docs/on-disk-format.pdf for algorithm pseudocode.

Please correct me if I'm wrong, but looking at the second case, bruteforcing some bit changes in the key slot data, I've noticed that things are likely 5 times faster:

1) the key slot data (AF stripes) is changed in cache/RAM
* no meta-header value has changed, so a pre-computed result of the 144306 PBKDF2-sha1 iterations can be reused; the keyslot iteration count is therefore largely irrelevant here
2) "decrypt" the AF sections using the pre-computed hash
3) do an AF_merge() to get the raw masterKeyCandidate
4) perform 35125 PBKDF2-sha1 iterations (at a 160-bit 'key' size) to derive the digest of the masterKeyCandidate
5) do a memcmp() against the digest stored in the header to see if we've got a valid master key

This general speedup in the second case is irrelevant for attacks against the password or salt of a key slot, but might help significantly in our case: performance might increase by a factor of five, to something in the order of 35 tries per second per core for pure sha1 iteration performance (disregarding the cost of all other computations). If this scales well to all four real cores, or even all eight logical ones with the available hyperthreading, then the overall runtime for a full single-bit search might be well within a day.
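To make the second procedure a bit more concrete, here is a rough structural sketch of the bit-flip loop in C. It is not a working tool: the PBKDF2 calls are real OpenSSL functions, but decrypt_keyslot_area() and af_merge() are only hypothetical prototypes standing in for cryptsetup's internal keyslot decryption and AF_merge() routines (lib/luks1/keyencryption.c and lib/luks1/af.c), which would have to be pulled in or reimplemented, and all header values (salts, digest, iteration counts) are assumed to have been read from the LUKS header beforehand. The point is merely where the cost sits: the expensive 144306-iteration PBKDF2 runs exactly once, outside the loop.

/* bitflip_sketch.c -- structure only, not a finished tool.
 * Real calls: PKCS5_PBKDF2_HMAC_SHA1() from OpenSSL (openssl/evp.h).
 * HYPOTHETICAL: decrypt_keyslot_area() and af_merge() stand in for
 * cryptsetup's internal keyslot decryption (lib/luks1/keyencryption.c)
 * and AF_merge() (lib/luks1/af.c); they are not implemented here.
 * Salts, digest and iteration counts are assumed to come from the
 * on-disk LUKS header.                                              */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>

#define MK_BYTES     64                      /* 512-bit master key      */
#define AF_STRIPES   4000                    /* LUKS1 default           */
#define SLOT_BYTES   (MK_BYTES * AF_STRIPES) /* 256000 bytes of stripes */
#define DIGEST_BYTES 20                      /* LUKS_DIGESTSIZE         */
#define SLOT_ITER    144306                  /* key slot iterations     */
#define DIGEST_ITER  35125                   /* MK digest iterations    */

/* HYPOTHETICAL helpers, see above */
int decrypt_keyslot_area(const uint8_t *slot_key, const uint8_t *enc,
                         uint8_t *plain, size_t len);
int af_merge(const uint8_t *split, uint8_t *mk, size_t mk_len, int stripes);

int main(void)
{
    /* assumed to be filled from the device / header beforehand */
    static uint8_t enc_slot[SLOT_BYTES];    /* raw on-disk key material */
    uint8_t slot_salt[32], mk_digest_salt[32], mk_digest[DIGEST_BYTES];
    const char *password = "the known passphrase";

    /* Step 1 runs exactly ONCE: password and salt are assumed good,
     * so the expensive 144306-iteration PBKDF2 result never changes. */
    uint8_t slot_key[MK_BYTES];
    PKCS5_PBKDF2_HMAC_SHA1(password, (int)strlen(password),
                           slot_salt, (int)sizeof(slot_salt),
                           SLOT_ITER, MK_BYTES, slot_key);

    static uint8_t trial[SLOT_BYTES], plain[SLOT_BYTES];
    uint8_t mk[MK_BYTES], cand_digest[DIGEST_BYTES];

    for (size_t bit = 0; bit < (size_t)SLOT_BYTES * 8; bit++) {
        memcpy(trial, enc_slot, SLOT_BYTES);
        trial[bit / 8] ^= (uint8_t)(1u << (bit % 8));   /* flip one bit */

        /* steps 2) and 3): cheap compared to the PBKDF2 runs */
        decrypt_keyslot_area(slot_key, trial, plain, SLOT_BYTES);
        af_merge(plain, mk, MK_BYTES, AF_STRIPES);

        /* step 4): the only per-candidate PBKDF2 cost, 35125 iterations */
        PKCS5_PBKDF2_HMAC_SHA1((const char *)mk, MK_BYTES,
                               mk_digest_salt, (int)sizeof(mk_digest_salt),
                               DIGEST_ITER, DIGEST_BYTES, cand_digest);

        /* step 5) */
        if (memcmp(cand_digest, mk_digest, DIGEST_BYTES) == 0) {
            printf("candidate found: bit %zu (byte %zu)\n", bit, bit / 8);
            return 0;
        }
    }
    return 1;
}

At ~1.25M PBKDF2-sha1 iterations per second, the remaining per-candidate cost of roughly 35k iterations is what yields the ~35 tries per second per core mentioned above, and since the candidates are independent, spreading the loop over all cores is trivial.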
One could go even further and replace the time-consuming steps 4) and 5) with a routine that decrypts an encrypted part of the disk with the masterKeyCandidate and compares it to a known plaintext (or applies a more elaborate heuristic, such as several entropy checks looking for "correctly" decryptable areas on disk leading to low-entropy output). That might be a lot faster given AES-NI support and AES-XTS throughput speeds, but since we don't know much about the actual disk content, it seems too much of a gamble to be worth the effort at this point: it is quite conceivable that the test areas happen to contain a well-compressed movie or similarly high-entropy data, leading to a false negative and missing the only correct master key.

There is also another shortcut I can think of: checking for multi-byte errors in the MK digest is as easy as doing one default cryptsetup run with the correct password and unmodified key slot and comparing the computed MK digest to the value on disk. For this, one can patch the LUKS_verify_volume_key() function in lib/luks1/keymanage.c to print the buffer values before the memcmp() call:

LUKS_verify_volume_key checkHashBuf: e6 8e 79 bf a5 51 cc fe 7d 12 4c 4c 8d 46 d3 6c ae 30 0d 28
LUKS_verify_volume_key hdr->mkDigest: ff 5c 64 48 bc 1f b2 f2 66 23 d3 66 38 41 c9 60 8a 7e de 0a

As one can see, hdr->mkDigest is the MK digest as stored on disk, matching the luksDump output noted in my last post. The master key digest of the candidate derived from the current keyslot data and the supposedly correct password does not match in even one byte position (!), which rules out a partially corrupted MK digest field in the meta-header data.

Regards,
protagonist
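P.S. for later readers who want to reproduce the digest comparison: such a patch can look roughly like the sketch below. The names checkHashBuf, hdr->mkDigest and LUKS_DIGESTSIZE are the ones used in the LUKS1 code, but the exact context may differ between cryptsetup versions, and a #include <stdio.h> may be needed if keymanage.c doesn't already pull it in.

/* sketch: hex-dump both digests directly before the existing memcmp()
 * in LUKS_verify_volume_key(), lib/luks1/keymanage.c                 */
printf("LUKS_verify_volume_key checkHashBuf: ");
for (int i = 0; i < LUKS_DIGESTSIZE; i++)
        printf("%02x ", (unsigned char)checkHashBuf[i]);
printf("\nLUKS_verify_volume_key hdr->mkDigest: ");
for (int i = 0; i < LUKS_DIGESTSIZE; i++)
        printf("%02x ", (unsigned char)hdr->mkDigest[i]);
printf("\n");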