Re: [BUG/PATCH] kernel RNG and its secrets

Stephan Mueller <smueller@xxxxxxxxxx> · Fri, 10 Apr 2015 15:25:44 +0200

Am Mittwoch, 18. März 2015, 12:09:45 schrieb Stephan Mueller:

Hi,

>Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
>
>Hi Daniel,
>
>>On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
>>> On Wed, Mar 18, 2015, at 10:53, mancha wrote:
>>>> Hi.
>>>> 
>>>> The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
>>>> protect
>>>> 
>>>> memory cleansing against things like dead store optimization:
>>>>     void memzero_explicit(void *s, size_t count)
>>>>     {
>>>>     
>>>>             memset(s, 0, count);
>>>>             OPTIMIZER_HIDE_VAR(s);
>>>>     
>>>>     }
>>>> 
>>>> OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
>>>> crypto_memneq>>
>>>> 
>>>> against timing analysis, is defined when using gcc as:
>>>>     #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0"
>>>>     (var))
>>>> 
>>>> My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
>>>> from optimizing out memset (i.e. secrets remain in memory).
>>>> 
>>>> Two things that do work:
>>>>     __asm__ __volatile__ ("" : "=r" (var) : "0" (var))
>>> 
>>> You are correct, volatile signature should be added to
>>> OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is
>>> allowed to check if it is needed and may remove the asm statement.
>>> Another option would be to just use var as an input variable - asm
>>> blocks without output variables are always considered being volatile
>>> by gcc.
>>> 
>>> Can you send a patch?
>>> 
>>> I don't think it is security critical, as Daniel pointed out, the
>>> call
>>> will happen because the function is an external call to the crypto
>>> functions, thus the compiler has to flush memory on return.
>>
>>Just had a look.
>>
>>$ gdb vmlinux
>>(gdb) disassemble memzero_explicit
>>
>>Dump of assembler code for function memzero_explicit:
>>    0xffffffff813a18b0 <+0>:	push   %rbp
>>    0xffffffff813a18b1 <+1>:	mov    %rsi,%rdx
>>    0xffffffff813a18b4 <+4>:	xor    %esi,%esi
>>    0xffffffff813a18b6 <+6>:	mov    %rsp,%rbp
>>    0xffffffff813a18b9 <+9>:	callq  0xffffffff813a7120 <memset>
>>    0xffffffff813a18be <+14>:	pop    %rbp
>>    0xffffffff813a18bf <+15>:	retq
>>
>>End of assembler dump.
>>
>>(gdb) disassemble extract_entropy
>>[...]
>>
>>    0xffffffff814a5000 <+304>:	sub    %r15,%rbx
>>    0xffffffff814a5003 <+307>:	jne    0xffffffff814a4f80
>>
>><extract_entropy+176> 0xffffffff814a5009 <+313>:	mov    %r12,%rdi
>>
>>    0xffffffff814a500c <+316>:	mov    $0xa,%esi
>>    0xffffffff814a5011 <+321>:	callq  0xffffffff813a18b0
>>
>><memzero_explicit> 0xffffffff814a5016 <+326>:	mov    -0x48(%rbp),%rax
>>[...]
>>
>>I would be fine with __volatile__.
>
>Are we sure that simply adding a __volatile__ works in any case? I just
>did a test with a simple user space app:
>
>static inline void memset_secure(void *s, int c, size_t n)
>{
>        memset(s, c, n);
>        //__asm__ __volatile__("": : :"memory");
>        __asm__ __volatile__("" : "=r" (s) : "0" (s));
>}
>
>int main(int argc, char *argv[])
>{
>#define BUFLEN 20
>        char buf[BUFLEN];
>
>        snprintf(buf, (BUFLEN - 1), "teststring\n");
>        printf("%s", buf);
>
>        memset_secure(buf, 0, BUFLEN);
>}
>
>When using the discussed code of __asm__ __volatile__("" : "=r" (s) :
>"0" (s));  I do not find the code implementing memset(0) in objdump.
>Only when I enable the memory barrier, I see the following (when
>compiling with -O2):
>
>objdump -d memset_secure:
>...
>0000000000400440 <main>:
>...
>  400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
>  400470:       00
>  400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
>  400478:       00 00
>  40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
>  400481:       00
>...

I would like to bring up that topic again as I did some more analyses:

For testing I used the following code:

static inline void memset_secure(void *s, int c, size_t n)
{
        memset(s, c, n);
	BARRIER
}

where BARRIER is defined as:

(1) __asm__ __volatile__("" : "=r" (s) : "0" (s));

(2) __asm__ __volatile__("": : :"memory");

(3) __asm__ __volatile__("" : "=r" (s) : "0" (s) : "memory");

I tested the code with gcc and clang, considering that there is effort 
underway to compile the kernel with clang too.

The following table marks an X when the aforementioned movq/movl code is 
present (or an invocation of memset@plt) in the object code (i.e. the code we 
want). Contrary the table marks - where the code is not present (i.e. the code 
we do not want):

         | BARRIER  | (1) | (2) | (3)
---------+----------+     |     |
Compiler |          |     |     |
=========+==========+==================
                    |     |     |
gcc -O0             |  X  |  X  |  X
                    |     |     |
gcc -O2             |  -  |  X  |  X
                    |     |     |
gcc -O3             |  -  |  X  |  X
                    |     |     |
clang -00           |  X  |  X  |  X
                    |     |     |
clang -02           |  X  |  -  |  X
                    |     |     |
clang -03           |  -  |  -  |  X

As the kernel is compiled with -O2, clang folks would still be left uncovered 
with the current solution (i.e. BARRIER option (2)).

Thus, may I propose to update the patch to use option (3) instead as (i) it 
does not cost anything extra on gcc and (ii) it covers clang too?

Ciao
Stephan 
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html