On 1/14/19 10:13 AM, Arnaldo Carvalho de Melo wrote: > Em Mon, Jan 14, 2019 at 04:55:42PM +0000, Yonghong Song escreveu: >> On 1/14/19 6:27 AM, Arnaldo Carvalho de Melo wrote: >>> Em Sat, Jan 12, 2019 at 12:00:42AM +0000, Yonghong Song escreveu: >>>> On 1/10/19 3:03 PM, Andrii Nakryiko wrote: >>>>> There appears to be problems with bitfields in both DWARF data and >>>>> pahole's conversion from DWARF to BTF (see test file source code and >>>>> pahole output at the bottom). >>> >>>>> 1. pahole believes unpacked.y1 has offset 7, which is incorrect >>>>> (should be 32?). >>>>> unpacked.y2 has offset 32, which also doesn't make sense. >>>>> DWARF info for unpacked.y1 and unpacked.y2 also seem wrong (how >>>>> do we get offset 18 for y1?). >>> >>>>> <2><63>: Abbrev Number: 3 (DW_TAG_member) >>>>> <64> DW_AT_name : y1 >>>>> <67> DW_AT_decl_file : 1 >>>>> <68> DW_AT_decl_line : 5 >>>>> <69> DW_AT_type : <0x87> >>>>> <6d> DW_AT_byte_size : 4 >>>>> <6e> DW_AT_bit_size : 7 >>>>> <6f> DW_AT_bit_offset : 18 >>>>> <70> DW_AT_data_member_location: 0 >>>>> <2><71>: Abbrev Number: 3 (DW_TAG_member) >>>>> <72> DW_AT_name : y2 >>>>> <75> DW_AT_decl_file : 1 >>>>> <76> DW_AT_decl_line : 6 >>>>> <77> DW_AT_type : <0x87> >>>>> <7b> DW_AT_byte_size : 4 >>>>> <7c> DW_AT_bit_size : 20 >>>>> <7d> DW_AT_bit_offset : 12 >>>>> <7e> DW_AT_data_member_location: 4 >>>>> >>>>> >>>>> 2. Similar problems with packed.y1 and packed.y2. packed.y2 offset >>>>> is negative value even in DWARF: >>>>> >>>>> <2><d2>: Abbrev Number: 6 (DW_TAG_member) >>>>> <d3> DW_AT_name : y2 >>>>> <d6> DW_AT_decl_file : 1 >>>>> <d7> DW_AT_decl_line : 14 >>>>> <d8> DW_AT_type : <0x87> >>>>> <dc> DW_AT_byte_size : 4 >>>>> <dd> DW_AT_bit_size : 20 >>>>> <de> DW_AT_bit_offset : -2 >>>>> <df> DW_AT_data_member_location: 0 >>>> >>>> Did some research on why we got -2 as bit_offset with the above packed >>>> data structure. >>>> >>>> First, the compiler generates correct thing. the negative >>>> DW_AT_bit_offset is allowed for dwarf2, but not dwarf4. >>> >>>> The "-2" here means (1). the member is crossing two >>>> adjacent "4" bytes boundary (DW_AT_byte_size = 4), and >>> >>> Ok, but I would expect that the DW_AT_data_member_location would be 4, >>> meaning, it is mostly on this one but starts two bits before. So when >> >> Unfortunately, this is not dwarf interpretation for little endian. >> (I wished it be as easy as big endian...). >> >> -bash-4.4$ cat t.c >> struct t { >> int a:3; >> int b:29; >> } g; >> -bash-4.4$ gcc -O2 -g -c t.c >> >> -bash-4.4$ llvm-dwarfdump t.o >> ... >> >> 0x00000027: DW_TAG_member >> DW_AT_name ("a") >> DW_AT_decl_file ("/home/yhs/tmp2/t.c") >> DW_AT_decl_line (2) >> DW_AT_type (0x00000042 "int") >> DW_AT_byte_size (0x04) >> DW_AT_bit_size (0x03) >> DW_AT_bit_offset (0x1d) >> DW_AT_data_member_location (0x00) >> >> 0x00000034: DW_TAG_member >> DW_AT_name ("b") >> DW_AT_decl_file ("/home/yhs/tmp2/t.c") >> DW_AT_decl_line (3) >> DW_AT_type (0x00000042 "int") >> DW_AT_byte_size (0x04) >> DW_AT_bit_size (0x1d) >> DW_AT_bit_offset (0x00) >> DW_AT_data_member_location (0x00) >> >> The DW_AT_data_member_location is 0 for both members. >> The first member offset 0x1d is from the byte_size boundary (4 bytes). >> The second member offset 0x0 is also from the byte_size boundary (4 bytes). >> >> For the above packed case, it merely mean that >> the offset -2 from the byte_size_boundary (4 bytes), but instead go into >> the byte, rather going opposite directions with 2 bits. >> >> An alternative way to represent with data_member_location 4, but >> bit_offset will have to change. >> >> old way: >> >>> <dc> DW_AT_byte_size : 4 >> >>> <dd> DW_AT_bit_size : 20 >> >>> <de> DW_AT_bit_offset : -2 >> >>> <df> DW_AT_data_member_location: 0 >> new way: >> <dc> DW_AT_byte_size : 4 >> <dd> DW_AT_bit_size : 20 >> <de> DW_AT_bit_offset : 30 >> <df> DW_AT_data_member_location: 4 >> >> The compiler did not generate the new way. > > Ok, but what do you think should be the best way to represent this in > pahole's output, keep it like a negative number so that users, curious > about what that means go the man page and look up what negative bit > offsets mean, or show it as 30? I.e. apply this patch: > > > diff --git a/dwarves_fprintf.c b/dwarves_fprintf.c > index bd46f97916fc..cdb8bef56f72 100644 > --- a/dwarves_fprintf.c > +++ b/dwarves_fprintf.c > @@ -807,9 +807,14 @@ static size_t class_member__fprintf(struct class_member *member, bool union_memb > offset); > > if (member->bitfield_size != 0) { > + unsigned int bitfield_offset = member->bitfield_offset; > + > + if (member->bitfield_offset < 0) > + bitfield_offset = member->byte_size * 8 + member->bitfield_offset; > + > printed += fprintf(fp, sconf.hex_fmt ? > ":%#2x" : ":%2u", > - member->bitfield_offset); > + bitfield_offset); > size_spacing -= 3; > } > > Which ends up producing this output, that still has stuff to do wrt the > hole/padding calculations, but has the offset adjusted to that '30' > value instead of showing -1: > > [acme@quaco pahole]$ pahole examples/yonghong/packed_bitfield.o > struct packed { > char x1:1; /* 0: 7 1 */ > char x2:3; /* 0: 4 1 */ > char x3:3; /* 0: 1 1 */ > int y1:7; /* 0:18 4 */ > int y2:20; /* 0:30 4 */ The y2 should be "4:30"? This is how I interpret the above: > char x1:1; /* 0: 7 1 */ ==> this will convert to big endian offset 0, size 1 ==> offset = sizeof(char) - 7 - 1 > char x2:3; /* 0: 4 1 */ ===> this will convert to big endian offset 1, size 3 ===> offset = sizeof(char) - 4 - 3 > char x3:3; /* 0: 1 1 */ ===> this will convert to big endian offset 4, size 3 ===> offset = sizeof(char) - 1 - 3 > int y1:7; /* 0:18 4 */ ===> this will convert to big endian offset 7, size 7 ===> offset = sizeof(int) - 18 - 7 = 7 > int y2:20; /* 4:30 4 */ ===> this will convert to big endian offset 14, size 20 ===> offset = 4 * 8 + sizeof(int) - 30 - 20 = 14 > > /* size: 5, cachelines: 1, members: 5 */ > /* padding: 1 */ > /* bit_padding: 254 bits */ > /* last cacheline: 5 bytes */ > > /* BRAIN FART ALERT! 5 != 1 + 0(holes), diff = 4 */ > }; > [acme@quaco pahole]$ > > The following drawing _tries_ to paint this picture according to the > description you gave for negative bit_offsets, butthen it would be 8 > bytes, the size of the struct is 5 bytes, sizeof(char) + sizeof(int), > and since the sum of the bit sizes is 34 bits, we need 5 bytes. > > x x > + 3 + + 2 + x1 +---- y2 ---+ +-- y2 --+ > | | | | | | | | | > V V V V V V V V V > [0 1 2 3 4 5 6 7 ... 30 31][0 .. 17 18 ..... 24 .. 31] With packing, the elements layout may not be similar to the above. It probably likes below (I have a C program later): => memory increasing bits 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ... 30 31 32 33 34 ... x1 | x2 | | x3 | | y1 | | y2 | In little representation, the following looks like the case: x1: byte0, bit 7 x2: byte0, bit 4 - 6 x3: byte0, bit 1 - 3 y1: byte0, bit 0 and byte1, bit 2 - 7. y2: byte1, bit 0 - 1, and byte 2, 3, and byte4 6 -7. So there is no wasted bits. In the above your diagram, there are some wasted bits. I didn't check exactly how the little endian DW attributes are computed inside the compiler for the packed case. But one way to do it is - do a big endian bit offset/size assignment first. This should be straight forward. - convert to the little endian offset based on the corresponding field type and big endian field offset/type. -bash-4.4$ cat t2.c #include <stdio.h> struct packed { char x1: 1; char x2: 3; char x3: 3; int y1: 7; int y2: 20; } __attribute__((packed)); int main() { struct packed p; int i; p.x1 = 1; p.x2 = 0; p.x3 = 2; p.y1 = 7 << 2; p.y2 = (1 << 18) + 3; for (i = 0; i < sizeof(p); i++) printf("i = %x\n", *(((unsigned char *)&p) + i)); return 0; } -bash-4.4$ gcc t2.c -bash-4.4$ ./a.out i = 21 i = ce i = 0 i = 0 i = fd -bash-4.4$ > > Doesn't make sense, we don't have 64 bits, just 40. > > Can you hand hold me here? :-) > >>> seeing negative DW_AT_bit_offsets we should think it starts in this >>> location (0 in this case), but straddles to the next, so I should say >>> that it is at bit offset (DW_AT_byte_size * 8 - DW_AT_bit_offset), the >>> user, looking at this new pahole output will realize that it has 2 bits >>> at the end of the current DW_AT_data_member_location (0) but has 18 bits >>> in the next. >>> >>> >>>> (2). the -2 means the the starting bit is at the "4" byte boundary - 2 >>>> bits. So basically it covers: >>>> >>>> | | >>>> V V >>>> [0 1 .....30 31][0 .. 17 18 ... 31] >>>> >>>> totally 20 bits. >>>> >>>> The following patch fixed the issue: >>>> -bash-4.4$ git diff >>>> diff --git a/dwarves.h b/dwarves.h >>>> index e6bffe8..e5f8347 100644 >>>> --- a/dwarves.h >>>> +++ b/dwarves.h >>>> @@ -831,7 +831,7 @@ struct class_member { >>>> uint32_t bit_size; >>>> uint32_t byte_offset; >>>> size_t byte_size; >>>> - uint8_t bitfield_offset; >>>> + int8_t bitfield_offset; >>>> uint8_t bitfield_size; >>>> uint8_t bit_hole; >>>> uint8_t bitfield_end:1; >>>> -bash-4.4$ >>>> >>>> With the above change: >>>> -bash-4.4$ ~/work/github/pahole/build/pahole -JV bitfields.o >>>> File bitfields.o: >>>> [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED >>>> [2] STRUCT packed kind_flag=1 size=5 vlen=5 >>>> x1 type_id=3 bitfield_size=1 bits_offset=0 >>>> x2 type_id=3 bitfield_size=3 bits_offset=1 >>>> x3 type_id=3 bitfield_size=3 bits_offset=4 >>>> y1 type_id=1 bitfield_size=7 bits_offset=7 >>>> y2 type_id=1 bitfield_size=20 bits_offset=14 >>>> [3] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none) >>>> [4] STRUCT unpacked kind_flag=1 size=8 vlen=5 >>>> x1 type_id=3 bitfield_size=1 bits_offset=0 >>>> x2 type_id=3 bitfield_size=3 bits_offset=1 >>>> x3 type_id=3 bitfield_size=3 bits_offset=4 >>>> y1 type_id=1 bitfield_size=7 bits_offset=7 >>>> y2 type_id=1 bitfield_size=20 bits_offset=32 >>>> [5] UNION packed_union kind_flag=0 size=8 vlen=4 >>>> x1 type_id=3 bits_offset=0 >>>> y2 type_id=1 bits_offset=0 >>>> s1 type_id=2 bits_offset=0 >>>> s2 type_id=4 bits_offset=0 >>>> [6] UNION unpacked_union kind_flag=0 size=8 vlen=3 >>>> x1 type_id=3 bits_offset=0 >>>> y2 type_id=1 bits_offset=0 >>>> s type_id=4 bits_offset=0 >>>> -bash-4.4$ >>>> >>>> Will send out a patch shortly. >>>> >>>>> >>>>> andriin@devvm$ cat bitfields.c >>>>> struct unpacked{ >>>>> char x1: 1; >>>>> char x2: 3; >>>>> char x3: 3; >>>>> int y1: 7; >>>>> int y2: 20; >>>>> }; >>>>> >>>>> struct packed{ >>>>> char x1: 1; >>>>> char x2: 3; >>>>> char x3: 3; >>>>> int y1: 7; >>>>> int y2: 20; >>>>> } __attribute__((packed)); >>>>> >>>>> union packed_union { >>>>> char x1; >>>>> int y2; >>>>> struct packed s1; >>>>> struct unpacked s2; >>>>> }; >>>>> >>>>> union unpacked_union { >>>>> char x1; >>>>> int y2; >>>>> struct unpacked s; >>>>> } __attribute__((packed)); >>>>> >>>>> int main() { >>>>> struct packed s1; >>>>> struct unpacked s2; >>>>> union packed_union u1; >>>>> union unpacked_union u2; >>>>> return 0; >>>>> } >>>>> andriin@devvm$ cc -g bitfields.c -o bitfields >>>>> andriin@devvm$ LLVM_OBJCOPY=objcopy ~/local/pahole/build/pahole -JV bitfields >>>>> File bitfields: >>>>> [1] STRUCT unpacked kind_flag=1 size=8 vlen=5 >>>>> x1 type_id=2 bitfield_size=1 bits_offset=0 >>>>> x2 type_id=2 bitfield_size=3 bits_offset=1 >>>>> x3 type_id=2 bitfield_size=3 bits_offset=4 >>>>> y1 type_id=3 bitfield_size=7 bits_offset=7 >>>>> y2 type_id=3 bitfield_size=20 bits_offset=32 >>>>> [2] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none) >>>>> [3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED >>>>> [4] STRUCT packed kind_flag=1 size=5 vlen=5 >>>>> x1 type_id=2 bitfield_size=1 bits_offset=0 >>>>> x2 type_id=2 bitfield_size=3 bits_offset=1 >>>>> x3 type_id=2 bitfield_size=3 bits_offset=4 >>>>> y1 type_id=3 bitfield_size=7 bits_offset=7 >>>>> y2 type_id=3 bitfield_size=255 bits_offset=16776974 >>>>> [5] UNION packed_union kind_flag=0 size=8 vlen=4 >>>>> x1 type_id=2 bits_offset=0 >>>>> y2 type_id=3 bits_offset=0 >>>>> s1 type_id=4 bits_offset=0 >>>>> s2 type_id=1 bits_offset=0 >>>>> [6] UNION unpacked_union kind_flag=0 size=8 vlen=3 >>>>> x1 type_id=2 bits_offset=0 >>>>> y2 type_id=3 bits_offset=0 >>>>> s type_id=1 bits_offset=0 >>>>> andriin@devvm$ readelf -wi bitfields >>>>> Contents of the .debug_info section: >>>>> >>>>> ... >>>>> >>>>> <1><2d>: Abbrev Number: 2 (DW_TAG_structure_type) >>>>> <2e> DW_AT_name : (indirect string, offset: 0x4e): unpacked >>>>> <32> DW_AT_byte_size : 8 >>>>> <33> DW_AT_decl_file : 1 >>>>> <34> DW_AT_decl_line : 1 >>>>> <35> DW_AT_sibling : <0x80> >>>>> <2><39>: Abbrev Number: 3 (DW_TAG_member) >>>>> <3a> DW_AT_name : x1 >>>>> <3d> DW_AT_decl_file : 1 >>>>> <3e> DW_AT_decl_line : 2 >>>>> <3f> DW_AT_type : <0x80> >>>>> <43> DW_AT_byte_size : 1 >>>>> <44> DW_AT_bit_size : 1 >>>>> <45> DW_AT_bit_offset : 7 >>>>> <46> DW_AT_data_member_location: 0 >>>>> <2><47>: Abbrev Number: 3 (DW_TAG_member) >>>>> <48> DW_AT_name : x2 >>>>> <4b> DW_AT_decl_file : 1 >>>>> <4c> DW_AT_decl_line : 3 >>>>> <4d> DW_AT_type : <0x80> >>>>> <51> DW_AT_byte_size : 1 >>>>> <52> DW_AT_bit_size : 3 >>>>> <53> DW_AT_bit_offset : 4 >>>>> <54> DW_AT_data_member_location: 0 >>>>> <2><55>: Abbrev Number: 3 (DW_TAG_member) >>>>> <56> DW_AT_name : x3 >>>>> <59> DW_AT_decl_file : 1 >>>>> <5a> DW_AT_decl_line : 4 >>>>> <5b> DW_AT_type : <0x80> >>>>> <5f> DW_AT_byte_size : 1 >>>>> <60> DW_AT_bit_size : 3 >>>>> <61> DW_AT_bit_offset : 1 >>>>> <62> DW_AT_data_member_location: 0 >>>>> <2><63>: Abbrev Number: 3 (DW_TAG_member) >>>>> <64> DW_AT_name : y1 >>>>> <67> DW_AT_decl_file : 1 >>>>> <68> DW_AT_decl_line : 5 >>>>> <69> DW_AT_type : <0x87> >>>>> <6d> DW_AT_byte_size : 4 >>>>> <6e> DW_AT_bit_size : 7 >>>>> <6f> DW_AT_bit_offset : 18 >>>>> <70> DW_AT_data_member_location: 0 >>>>> <2><71>: Abbrev Number: 3 (DW_TAG_member) >>>>> <72> DW_AT_name : y2 >>>>> <75> DW_AT_decl_file : 1 >>>>> <76> DW_AT_decl_line : 6 >>>>> <77> DW_AT_type : <0x87> >>>>> <7b> DW_AT_byte_size : 4 >>>>> <7c> DW_AT_bit_size : 20 >>>>> <7d> DW_AT_bit_offset : 12 >>>>> <7e> DW_AT_data_member_location: 4 >>>>> <2><7f>: Abbrev Number: 0 >>>>> <1><80>: Abbrev Number: 4 (DW_TAG_base_type) >>>>> <81> DW_AT_byte_size : 1 >>>>> <82> DW_AT_encoding : 6 (signed char) >>>>> <83> DW_AT_name : (indirect string, offset: 0xb3): char >>>>> <1><87>: Abbrev Number: 5 (DW_TAG_base_type) >>>>> <88> DW_AT_byte_size : 4 >>>>> <89> DW_AT_encoding : 5 (signed) >>>>> <8a> DW_AT_name : int >>>>> <1><8e>: Abbrev Number: 2 (DW_TAG_structure_type) >>>>> <8f> DW_AT_name : (indirect string, offset: 0x50): packed >>>>> <93> DW_AT_byte_size : 5 >>>>> <94> DW_AT_decl_file : 1 >>>>> <95> DW_AT_decl_line : 9 >>>>> <96> DW_AT_sibling : <0xe1> >>>>> <2><9a>: Abbrev Number: 3 (DW_TAG_member) >>>>> <9b> DW_AT_name : x1 >>>>> <9e> DW_AT_decl_file : 1 >>>>> <9f> DW_AT_decl_line : 10 >>>>> <a0> DW_AT_type : <0x80> >>>>> <a4> DW_AT_byte_size : 1 >>>>> <a5> DW_AT_bit_size : 1 >>>>> <a6> DW_AT_bit_offset : 7 >>>>> <a7> DW_AT_data_member_location: 0 >>>>> <2><a8>: Abbrev Number: 3 (DW_TAG_member) >>>>> <a9> DW_AT_name : x2 >>>>> <ac> DW_AT_decl_file : 1 >>>>> <ad> DW_AT_decl_line : 11 >>>>> <ae> DW_AT_type : <0x80> >>>>> <b2> DW_AT_byte_size : 1 >>>>> <b3> DW_AT_bit_size : 3 >>>>> <b4> DW_AT_bit_offset : 4 >>>>> <b5> DW_AT_data_member_location: 0 >>>>> <2><b6>: Abbrev Number: 3 (DW_TAG_member) >>>>> <b7> DW_AT_name : x3 >>>>> <ba> DW_AT_decl_file : 1 >>>>> <bb> DW_AT_decl_line : 12 >>>>> <bc> DW_AT_type : <0x80> >>>>> <c0> DW_AT_byte_size : 1 >>>>> <c1> DW_AT_bit_size : 3 >>>>> <c2> DW_AT_bit_offset : 1 >>>>> <c3> DW_AT_data_member_location: 0 >>>>> <2><c4>: Abbrev Number: 3 (DW_TAG_member) >>>>> <c5> DW_AT_name : y1 >>>>> <c8> DW_AT_decl_file : 1 >>>>> <c9> DW_AT_decl_line : 13 >>>>> <ca> DW_AT_type : <0x87> >>>>> <ce> DW_AT_byte_size : 4 >>>>> <cf> DW_AT_bit_size : 7 >>>>> <d0> DW_AT_bit_offset : 18 >>>>> <d1> DW_AT_data_member_location: 0 >>>>> <2><d2>: Abbrev Number: 6 (DW_TAG_member) >>>>> <d3> DW_AT_name : y2 >>>>> <d6> DW_AT_decl_file : 1 >>>>> <d7> DW_AT_decl_line : 14 >>>>> <d8> DW_AT_type : <0x87> >>>>> <dc> DW_AT_byte_size : 4 >>>>> <dd> DW_AT_bit_size : 20 >>>>> <de> DW_AT_bit_offset : -2 >>>>> <df> DW_AT_data_member_location: 0 >>>>> <2><e0>: Abbrev Number: 0 >>>>> >>>>> ... >>>>> >>> >