Re: [PATCH pahole 2/5] btf_loader: adjust negative bitfield offsets early on

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Sat, 16 Mar 2019 21:41:35 -0700

On Thu, Mar 14, 2019 at 2:01 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Mar 14, 2019 at 1:22 PM Mark Wielaard <mark@xxxxxxxxx> wrote:
> >
> > On Thu, 2019-03-14 at 12:56 -0700, Andrii Nakryiko wrote:
> > > On Thu, Mar 14, 2019 at 12:44 PM Arnaldo Carvalho de Melo
> > > <arnaldo.melo@xxxxxxxxx> wrote:
> > > >
> > > > But, in http://dwarfstd.org/doc/Dwarf3.pdf, page 75, we have:
> > > >
> > > > <quote>
> > > >
> > > > If the data member entry describes a bit field, then that entry has the
> > > > following attributes:
> > > >
> > > > - A DW_AT_byte_size attribute whose value (see Section 2.19) is the
> > > >   number of bytes that contain an instance of the bit field and any
> > > >   padding bits.  The byte size attribute may be omitted if the size of
> > > >   the object containing the bit field can be inferred from the type
> > > >   attribute of the data member containing the bit field.
> > > >
> > > > - A DW_AT_bit_offset attribute whose value (see Section 2.19) is the
> > > >   number of bits to the left of the leftmost (most significant) bit of
> > > >   the bit field value.
> > > >
> > > > - A DW_AT_bit_size attribute whose value (see Section 2.19) is the
> > > >   number of bits occupied by the bit field value.  The location
> > > > >   description for a bit field calculates the address of an anonymous
> > > > >   object containing the bit field. The address is relative to the
> > > > >   structure, union, or class that most closely encloses the bit field
> > > > >   declaration. The number of bytes in this anonymous object is the value
> > > > >   of the byte size attribute of the bit field. The offset (in bits)
> > > > >   from the most significant bit of the anonymous object to the most
> > > >   significant bit of the bit field is the value of the bit offset
> > > >   attribute.
> > > >
> > > > And following it there is an example with some tables, I'll read this
> > > > > more thoroughly later.
> > >
> > >
> > > Thanks! I'll meditate on that as well later today :)
> >
> > I haven't meditated on it yet, but would warn about using the now
> > ancient DWARF3 spec for this. See in particular the following DWARF
> > issue "Packed unaligned bit fields" resolved for DWARF4:
> > http://dwarfstd.org/ShowIssue.php?issue=081130.1
> >
> > You might even just want to see what DWARF5 says about it:
> > http://dwarfstd.org/doc/DWARF5.pdf
>
> Thanks, Mark! The newer standard is indeed a bit clearer:
>
> <quote>
>
> This Standard uses the following bit numbering and direction
> conventions in examples. These conventions are for illustrative
> purposes and other conventions may apply on particular architectures.
>
> - For big-endian architectures, bit offsets are counted from
>   high-order to low-order bits within a byte (or larger storage
>   unit); in this case, the bit offset identifies the high-order bit
>   of the object.
> - For little-endian architectures, bit offsets are counted from
>   low-order to high-order bits within a byte (or larger storage
>   unit); in this case, the bit offset identifies the low-order bit
>   of the object.
>
> In either case, the bit so identified is defined as the beginning of the object.
>
> </quote>
>
> Will go over all those calculations again today or tomorrow, while I
> have all the context from yesterday's debugging session still fresh in
> my head.

There is a lot to meditate on :) The DWARF 4/5 standards are pretty
clear about this example:

struct S {
        int j:5;
        int k:6;
        int m:5;
        int n:8;
};

According to the DWARF standard (p. 90 of DWARF4), both little-endian
and big-endian archs should have the following bit offsets:
j:0
k:5
m:11
n:16

In practice, a big-endian aarch64 binary emitted by gcc has exactly
these offsets (j:0, k:5, m:11, n:16).

For little-endian x86_64, both clang and gcc emit the following bit offsets:
j:27
k:21
m:16
n:8

The same is emitted by gcc for a little-endian aarch64 target. So for
little-endian the emitted value is
<size of base type in bits> - <real bit offset> - <bit size>,
e.g. 32 - 5 - 6 = 21 for k. This means that pahole has to care about
the endianness of the DWARF data and make the corresponding corrections.
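
A minimal sketch of that correction, assuming a 32-bit int base type as
in struct S. The helper name flip_bit_offset is hypothetical and is not
taken from pahole. Because the formula is its own inverse, the same
function maps the emitted little-endian value to the offset counted from
the least significant bit, and back again.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of pahole): converts between the
 * DW_AT_bit_offset value emitted for little-endian targets and the
 * offset counted from the least significant bit of the storage unit.
 *
 * storage_bits: size of the base type in bits (32 for int here)
 * bit_offset:   offset in one of the two conventions
 * bit_size:     width of the bitfield in bits
 */
static unsigned int flip_bit_offset(unsigned int storage_bits,
                                    unsigned int bit_offset,
                                    unsigned int bit_size)
{
        /* The field has to fit inside its storage unit. */
        assert(bit_offset + bit_size <= storage_bits);
        return storage_bits - bit_offset - bit_size;
}
```

For the offsets listed above, flip_bit_offset(32, 27, 5) gives 0 for j and
flip_bit_offset(32, 8, 8) gives 16 for n.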

I also compiled and disassembled this test program to check that j
actually takes the 5 lowest bits. And it does:

$ cat dwarf_test.c
struct S {
        int j : 5;
        int k : 6;
        int m : 5;
        int n : 8;
};

int main() {
        struct S s;
        s.j = 1;
        s.k = 2;
        s.m = 3;
        s.n = 4;
        return 0;
}
$ gcc -g dwarf_test.c -o dwarf_test.gcc
$ objdump -S dwarf_test.gcc
<snip>
int main() {
  4004b2:       55                      push   %rbp
  4004b3:       48 89 e5                mov    %rsp,%rbp
        struct S s;
        s.j = 1;
  4004b6:       0f b6 45 fc             movzbl -0x4(%rbp),%eax
  4004ba:       83 e0 e0                and    $0xffffffe0,%eax   <------ clear out 5 lowest bits
  4004bd:       83 c8 01                or     $0x1,%eax
  4004c0:       88 45 fc                mov    %al,-0x4(%rbp)
        s.k = 2;
  4004c3:       0f b7 45 fc             movzwl -0x4(%rbp),%eax
  4004c7:       66 25 1f f8             and    $0xf81f,%ax   <------ clear bits 5-10 (0xf81f = 1111 1000 0001 1111)
  4004cb:       83 c8 40                or     $0x40,%eax   <------ 0x40 = 2 << 5
  4004ce:       66 89 45 fc             mov    %ax,-0x4(%rbp)
        s.m = 3;
  4004d2:       0f b6 45 fd             movzbl -0x3(%rbp),%eax
  4004d6:       83 e0 07                and    $0x7,%eax
  4004d9:       83 c8 18                or     $0x18,%eax
  4004dc:       88 45 fd                mov    %al,-0x3(%rbp)
        s.n = 4;
  4004df:       c6 45 fe 04             movb   $0x4,-0x2(%rbp)
        return 0;
  4004e3:       b8 00 00 00 00          mov    $0x0,%eax
}
  4004e8:       5d                      pop    %rbp
  4004e9:       c3                      retq
  4004ea:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
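
The same check can be sketched in C instead of reading the disassembly.
This is only an illustration: expected_word and raw_word are
hypothetical names, and bitfield layout is implementation-defined, so
the comparison is only expected to hold for gcc or clang on a
little-endian target like the x86_64 build above. expected_word builds
the 32-bit value with j, k, m, n at bits 0, 5, 11 and 16 counted from
the least significant bit; raw_word copies the struct's storage out so
the two can be compared.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S {
        int j : 5;
        int k : 6;
        int m : 5;
        int n : 8;
};

/* Word with j, k, m, n placed at bits 0, 5, 11, 16 counted from the LSB. */
static uint32_t expected_word(uint32_t j, uint32_t k, uint32_t m, uint32_t n)
{
        return (j & 0x1f) | ((k & 0x3f) << 5) |
               ((m & 0x1f) << 11) | ((n & 0xff) << 16);
}

/* Raw storage of s as a 32-bit word; assumes struct S occupies 4 bytes. */
static uint32_t raw_word(struct S s)
{
        uint32_t w;

        assert(sizeof(s) == sizeof(w));
        memcpy(&w, &s, sizeof(w));
        return w;
}
```

To use it, zero the struct with memset first (the top 8 bits are
padding), assign s.j = 1, s.k = 2, s.m = 3, s.n = 4 as in the test
program, and compare raw_word(s) against expected_word(1, 2, 3, 4).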