Use uniform bit offset scheme as described in DWARF standard (though apparently not really followed by major compilers), in which bit offset is a natural extension of byte offset in both big- and little-endian architectures. BEFORE: 1. Bit offsets for little-endian are output as offsets from highest-order bit of underlying int to highest-order bit of bitfield, so double-backwards for little-endian arch and counter to how byte offsets are used, which point to lowest-order bit of underlying type. This makes first bitfield to have bit offset 27, instead of natural 0. 2. Bit offsets for big-endian are output as expected, by referencing highest-order bit offset from highest-order bit of underlying int. This is natural for big-endian platform, e.g., first bitfield has bit offset of 0. 3. Big-endian target also has problem with determining bit holes, because bit positions have to be calculated differently for little- and big-endian platforms and previous commit changed pahole logic to follow little-endian semantics. 4. BTF encoder outputs uniform bit offset for both little- and big-endian format (following DWARF's recommended bit offset scheme) 5. BTF loader, though, follows DWARF loader's format and outputs little-endian bit offsets "double-backwards". $ gcc -g dwarf_test.c -o dwarf_test $ pahole -F dwarf dwarf_test struct S { int j:5; /* 0:27 4 */ int k:6; /* 0:21 4 */ int m:5; /* 0:16 4 */ int n:8; /* 0: 8 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; $ pahole -JV dwarf_test File dwarf_test: [1] STRUCT S kind_flag=1 size=4 vlen=4 j type_id=2 bitfield_size=5 bits_offset=0 k type_id=2 bitfield_size=6 bits_offset=5 m type_id=2 bitfield_size=5 bits_offset=11 n type_id=2 bitfield_size=8 bits_offset=16 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED $ pahole -F btf dwarf_test struct S { int j:5; /* 0:27 4 */ int k:6; /* 0:21 4 */ int m:5; /* 0:16 4 */ int n:8; /* 0: 8 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; $ aarch64-linux-gnu-gcc -mbig-endian -g -c dwarf_test.c -o dwarf_test.be $ pahole -F dwarf dwarf_test.be struct S { /* XXX 27 bits hole, try to pack */ int j:5; /* 0: 0 4 */ /* XXX 245 bits hole, try to pack */ int k:6; /* 0: 5 4 */ /* XXX 245 bits hole, try to pack */ int m:5; /* 0:11 4 */ /* XXX 243 bits hole, try to pack */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit holes: 4, sum bit holes: 760 bits */ /* bit_padding: 16 bits */ /* last cacheline: 4 bytes */ /* BRAIN FART ALERT! 4 bytes != 24 (member bits) + 0 (byte holes) + 760 (bit holes), diff = -768 bits */ }; $ pahole -JV dwarf_test.be File dwarf_test.be: [1] STRUCT S kind_flag=1 size=4 vlen=4 j type_id=2 bitfield_size=5 bits_offset=0 k type_id=2 bitfield_size=6 bits_offset=5 m type_id=2 bitfield_size=5 bits_offset=11 n type_id=2 bitfield_size=8 bits_offset=16 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED $ pahole -F btf dwarf_test.be struct S { /* XXX 27 bits hole, try to pack */ int j:5; /* 0: 0 4 */ /* XXX 245 bits hole, try to pack */ int k:6; /* 0: 5 4 */ /* XXX 245 bits hole, try to pack */ int m:5; /* 0:11 4 */ /* XXX 243 bits hole, try to pack */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit holes: 4, sum bit holes: 760 bits */ /* bit_padding: 16 bits */ /* last cacheline: 4 bytes */ /* BRAIN FART ALERT! 4 bytes != 24 (member bits) + 0 (byte holes) + 760 (bit holes), diff = -768 bits */ }; AFTER: 1. Same output for little- and big-endian binaries, both for BTF and DWARF loader. 2. For little-endian target, bit offsets are natural extensions of byte offset, counting from lowest-order bit of underlying int to lowest-order bit of a bitfield. 3. BTF encoder still emits correct and natural bit offsets (for both binaries). 4. No more BRAIN FART ALERTs for big-endian. $ pahole -F dwarf dwarf_test struct S { int j:5; /* 0: 0 4 */ int k:6; /* 0: 5 4 */ int m:5; /* 0:11 4 */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; $ pahole -JV dwarf_test File dwarf_test: [1] STRUCT S kind_flag=1 size=4 vlen=4 j type_id=2 bitfield_size=5 bits_offset=0 k type_id=2 bitfield_size=6 bits_offset=5 m type_id=2 bitfield_size=5 bits_offset=11 n type_id=2 bitfield_size=8 bits_offset=16 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED $ pahole -F btf dwarf_test struct S { int j:5; /* 0: 0 4 */ int k:6; /* 0: 5 4 */ int m:5; /* 0:11 4 */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; $ pahole -F dwarf dwarf_test.be struct S { int j:5; /* 0: 0 4 */ int k:6; /* 0: 5 4 */ int m:5; /* 0:11 4 */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; $ pahole -JV dwarf_test.be File dwarf_test.be: [1] STRUCT S kind_flag=1 size=4 vlen=4 j type_id=2 bitfield_size=5 bits_offset=0 k type_id=2 bitfield_size=6 bits_offset=5 m type_id=2 bitfield_size=5 bits_offset=11 n type_id=2 bitfield_size=8 bits_offset=16 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED $ pahole -F btf dwarf_test.be struct S { int j:5; /* 0: 0 4 */ int k:6; /* 0: 5 4 */ int m:5; /* 0:11 4 */ int n:8; /* 0:16 4 */ /* size: 4, cachelines: 1, members: 4 */ /* bit_padding: 8 bits */ /* last cacheline: 4 bytes */ }; FOR REFERENCE. Relevant parts of DWARF output from GCC (clang outputs exactly the same data) for both little- and big-endian binaries: $ readelf -wi dwarf_test Contents of the .debug_info section: <snip> <1><2d>: Abbrev Number: 2 (DW_TAG_structure_type) <2e> DW_AT_name : S <30> DW_AT_byte_size : 4 <31> DW_AT_decl_file : 1 <32> DW_AT_decl_line : 1 <33> DW_AT_decl_column : 8 <34> DW_AT_sibling : <0x71> <2><38>: Abbrev Number: 3 (DW_TAG_member) <39> DW_AT_name : j <3b> DW_AT_decl_file : 1 <3c> DW_AT_decl_line : 2 <3d> DW_AT_decl_column : 6 <3e> DW_AT_type : <0x71> <42> DW_AT_byte_size : 4 <43> DW_AT_bit_size : 5 <44> DW_AT_bit_offset : 27 <45> DW_AT_data_member_location: 0 <2><46>: Abbrev Number: 3 (DW_TAG_member) <47> DW_AT_name : k <49> DW_AT_decl_file : 1 <4a> DW_AT_decl_line : 3 <4b> DW_AT_decl_column : 6 <4c> DW_AT_type : <0x71> <50> DW_AT_byte_size : 4 <51> DW_AT_bit_size : 6 <52> DW_AT_bit_offset : 21 <53> DW_AT_data_member_location: 0 <2><54>: Abbrev Number: 3 (DW_TAG_member) <55> DW_AT_name : m <57> DW_AT_decl_file : 1 <58> DW_AT_decl_line : 4 <59> DW_AT_decl_column : 6 <5a> DW_AT_type : <0x71> <5e> DW_AT_byte_size : 4 <5f> DW_AT_bit_size : 5 <60> DW_AT_bit_offset : 16 <61> DW_AT_data_member_location: 0 <2><62>: Abbrev Number: 3 (DW_TAG_member) <63> DW_AT_name : n <65> DW_AT_decl_file : 1 <66> DW_AT_decl_line : 5 <67> DW_AT_decl_column : 6 <68> DW_AT_type : <0x71> <6c> DW_AT_byte_size : 4 <6d> DW_AT_bit_size : 8 <6e> DW_AT_bit_offset : 8 <6f> DW_AT_data_member_location: 0 <2><70>: Abbrev Number: 0 <1><71>: Abbrev Number: 4 (DW_TAG_base_type) <72> DW_AT_byte_size : 4 <73> DW_AT_encoding : 5 (signed) <74> DW_AT_name : int <snip> $ readelf -wi dwarf_test.be Contents of the .debug_info section: <snip> <1><2d>: Abbrev Number: 2 (DW_TAG_structure_type) <2e> DW_AT_name : S <30> DW_AT_byte_size : 4 <31> DW_AT_decl_file : 1 <32> DW_AT_decl_line : 1 <33> DW_AT_sibling : <0x6c> <2><37>: Abbrev Number: 3 (DW_TAG_member) <38> DW_AT_name : j <3a> DW_AT_decl_file : 1 <3b> DW_AT_decl_line : 2 <3c> DW_AT_type : <0x6c> <40> DW_AT_byte_size : 4 <41> DW_AT_bit_size : 5 <42> DW_AT_bit_offset : 0 <43> DW_AT_data_member_location: 0 <2><44>: Abbrev Number: 3 (DW_TAG_member) <45> DW_AT_name : k <47> DW_AT_decl_file : 1 <48> DW_AT_decl_line : 3 <49> DW_AT_type : <0x6c> <4d> DW_AT_byte_size : 4 <4e> DW_AT_bit_size : 6 <4f> DW_AT_bit_offset : 5 <50> DW_AT_data_member_location: 0 <2><51>: Abbrev Number: 3 (DW_TAG_member) <52> DW_AT_name : m <54> DW_AT_decl_file : 1 <55> DW_AT_decl_line : 4 <56> DW_AT_type : <0x6c> <5a> DW_AT_byte_size : 4 <5b> DW_AT_bit_size : 5 <5c> DW_AT_bit_offset : 11 <5d> DW_AT_data_member_location: 0 <2><5e>: Abbrev Number: 3 (DW_TAG_member) <5f> DW_AT_name : n <61> DW_AT_decl_file : 1 <62> DW_AT_decl_line : 5 <63> DW_AT_type : <0x6c> <67> DW_AT_byte_size : 4 <68> DW_AT_bit_size : 8 <69> DW_AT_bit_offset : 16 <6a> DW_AT_data_member_location: 0 <snip> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> --- btf_encoder.c | 29 ++---------- btf_loader.c | 4 +- dwarf_loader.c | 121 +++++++++++++++++++++++++++++++------------------ dwarves.c | 16 +++++-- 4 files changed, 95 insertions(+), 75 deletions(-) diff --git a/btf_encoder.c b/btf_encoder.c index d648c47..f85b978 100644 --- a/btf_encoder.c +++ b/btf_encoder.c @@ -67,31 +67,12 @@ static int32_t structure_type__encode(struct btf_elf *btfe, struct tag *tag, uin return type_id; type__for_each_data_member(type, pos) { - uint32_t bit_offset; - - /* calculate member bits_offset. - * - * for big endian or non-bitfield little endian, - * use pos->bit_offset computed by - * dwarf_loader which conforms to BTF requirement. - * - * for little endian bitfield member, if we have a field like - * pos->byte_size = 2, - * pos->bitfield_offset = 12, - * pos->bitfield_size = 2, - * This field occupy bits 12-13 by a 2-byte value, - * which corresponds to bits 2-3 from big endian - * perspective. + /* + * dwarf_loader uses DWARF's recommended bit offset addressing + * scheme, which conforms to BTF requirement, so no conversion + * is required. */ - if (btfe->is_big_endian || !pos->bitfield_size) - bit_offset = pos->bit_offset; - else - bit_offset = pos->byte_offset * 8 + - pos->byte_size * 8 - - pos->bitfield_offset - - pos->bitfield_size; - - if (btf_elf__add_member(btfe, pos->name, type_id_off + pos->tag.type, kind_flag, pos->bitfield_size, bit_offset)) + if (btf_elf__add_member(btfe, pos->name, type_id_off + pos->tag.type, kind_flag, pos->bitfield_size, pos->bit_offset)) return -1; } diff --git a/btf_loader.c b/btf_loader.c index 06fd037..f5e866d 100644 --- a/btf_loader.c +++ b/btf_loader.c @@ -480,12 +480,10 @@ static int class__fixup_btf_bitfields(struct tag *tag, struct cu *cu, struct btf /* bitfields seem to be always aligned, no matter the packing */ pos->byte_offset = pos->bit_offset / pos->bit_size * pos->bit_size / 8; pos->bitfield_offset = pos->bit_offset - pos->byte_offset * 8; - if (!btfe->is_big_endian) - pos->bitfield_offset = pos->bit_size - pos->bitfield_offset - pos->bitfield_size; /* re-adjust bitfield offset if it is negative */ if (pos->bitfield_offset < 0) { pos->bitfield_offset += pos->bit_size; - pos->byte_offset += pos->byte_size; + pos->byte_offset -= pos->byte_size; pos->bit_offset = pos->byte_offset * 8 + pos->bitfield_offset; } } else { diff --git a/dwarf_loader.c b/dwarf_loader.c index 888a054..f36fcf0 100644 --- a/dwarf_loader.c +++ b/dwarf_loader.c @@ -749,12 +749,21 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu, member->const_value = attr_numeric(die, DW_AT_const_value); member->byte_offset = attr_offset(die, DW_AT_data_member_location); /* - * Will be cached later, in class_member__cache_byte_size + * Bit offset calculated here is valid only for byte-aligned + * fields. For bitfields on little-endian archs we need to + * adjust them taking into account byte size of the field, + * which might not be yet known. So we'll re-calculate bit + * offset later, in class_member__cache_byte_size. */ - member->byte_size = 0; + member->bit_offset = member->byte_offset * 8; + /* + * If DW_AT_byte_size is not present, byte size will be + * determined later in class_member__cache_byte_size using + * base integer/enum type + */ + member->byte_size = attr_numeric(die, DW_AT_byte_size); member->bitfield_offset = attr_numeric(die, DW_AT_bit_offset); member->bitfield_size = attr_numeric(die, DW_AT_bit_size); - member->bit_offset = member->byte_offset * 8 + member->bitfield_offset; member->bit_hole = 0; member->bitfield_end = 0; member->visited = 0; @@ -2109,15 +2118,6 @@ static int die__process_and_recode(Dwarf_Die *die, struct cu *cu) return cu__recode_dwarf_types(cu); } -static void class_member__adjust_bitfield(struct class_member *member) -{ - if (member->bitfield_offset < 0) { - member->bitfield_offset += member->bit_size; - member->byte_offset += member->byte_size; - member->bit_offset = member->byte_offset * 8 + member->bitfield_offset; - } -} - static int class_member__cache_byte_size(struct tag *tag, struct cu *cu, void *cookie) { @@ -2131,49 +2131,80 @@ static int class_member__cache_byte_size(struct tag *tag, struct cu *cu, return 0; } - if (member->bitfield_size != 0) { - struct tag *type = tag__strip_typedefs_and_modifiers(&member->tag, cu); + if (member->bitfield_size == 0) { + member->byte_size = tag__size(tag, cu); + member->bit_size = member->byte_size * 8; + return 0; + } + /* + * Try to figure out byte size, if it's not directly provided in DWARF + */ + if (member->byte_size == 0) { + struct tag *type = tag__strip_typedefs_and_modifiers(&member->tag, cu); member->byte_size = tag__size(type, cu); if (member->byte_size == 0) { + int bit_size; if (tag__is_enumeration(type)) { - member->byte_size = (tag__type(type)->size + 7) / 8 * 8; + bit_size = tag__type(type)->size; } else { struct base_type *bt = tag__base_type(type); - int bit_size = bt->bit_size ? bt->bit_size : base_type__name_to_size(bt, cu); - member->byte_size = (bit_size + 7) / 8 * 8; + bit_size = bt->bit_size ? bt->bit_size : base_type__name_to_size(bt, cu); } + member->byte_size = (bit_size + 7) / 8 * 8; } - member->bit_size = member->byte_size * 8; + } + member->bit_size = member->byte_size * 8; - /* - * XXX: integral_bit_size can be zero if - * base_type__name_to_size doesn't know about the base_type - * name, so one has to add there when such base_type isn't - * found. pahole will put zero on the struct output so it - * should be easy to spot the name when such unlikely thing - * happens. - */ - if (member->byte_size == 0) { - member->bitfield_offset = 0; - return 0; - } + /* + * XXX: after all the attemps to determine byte size, we might still + * be unsuccessful, because base_type__name_to_size doesn't know about + * the base_type name, so one has to add there when such base_type + * isn't found. pahole will put zero on the struct output so it should + * be easy to spot the name when such unlikely thing happens. + */ + if (member->byte_size == 0) { + member->bitfield_offset = 0; + return 0; + } - /* make sure bitfield offset is non-negative */ - class_member__adjust_bitfield(member); - member->bitfield_offset -= (member->byte_offset % member->byte_size) * 8; - member->byte_offset = member->bit_offset / member->bit_size * member->bit_size / 8; - /* we might have shifted bitfield_offset, re-adjust */ - class_member__adjust_bitfield(member); - - if (conf_load && conf_load->fixup_silly_bitfields && - member->byte_size == 8 * member->bitfield_size) { - member->bitfield_size = 0; - member->bitfield_offset = 0; - } - } else { - member->byte_size = tag__size(tag, cu); - member->bit_size = member->byte_size * 8; + /* + * For little-endian architectures, DWARF data emitted by gcc/clang + * specifies bitfield offset as an offset from the highest-order bit + * of an underlying integral type (e.g., int) to a highest-order bit + * of a bitfield. E.g., for bitfield taking first 5 bits of int-backed + * bitfield, bit offset will be 27 (sizeof(int) - 0 offset - 5 bit + * size), which is very counter-intuitive and isn't a natural + * extension of byte offset, which on little-endian points to + * lowest-order byte. So here we re-adjust bitfield offset to be an + * offset from lowest-order bit of underlying integral type to + * a lowest-order bit of a bitfield. This makes bitfield offset + * a natural extension of byte offset for bitfields and is uniform + * with how big-endian bit offsets work. + */ + if (cu->little_endian) { + member->bitfield_offset = member->bit_size - member->bitfield_offset - member->bitfield_size; + } + member->bit_offset = member->byte_offset * 8 + member->bitfield_offset; + + /* make sure bitfield offset is non-negative */ + if (member->bitfield_offset < 0) { + member->bitfield_offset += member->bit_size; + member->byte_offset -= member->byte_size; + member->bit_offset = member->byte_offset * 8 + member->bitfield_offset; + } + /* align on underlying base type natural alignment boundary */ + member->bitfield_offset += (member->byte_offset % member->byte_size) * 8; + member->byte_offset = member->bit_offset / member->bit_size * member->bit_size / 8; + if (member->bitfield_offset >= member->bit_size) { + member->bitfield_offset -= member->bit_size; + member->byte_offset += member->byte_size; + } + + if (conf_load && conf_load->fixup_silly_bitfields && + member->byte_size == 8 * member->bitfield_size) { + member->bitfield_size = 0; + member->bitfield_offset = 0; } return 0; diff --git a/dwarves.c b/dwarves.c index 3a5e988..619dcb3 100644 --- a/dwarves.c +++ b/dwarves.c @@ -1205,11 +1205,10 @@ void class__find_holes(struct class *class) if (pos->is_static) continue; + bit_start = pos->bit_offset; if (pos->bitfield_size) { - bit_start = pos->byte_offset * 8 + pos->bit_size - pos->bitfield_offset - pos->bitfield_size; bit_end = bit_start + pos->bitfield_size; } else { - bit_start = pos->byte_offset * 8; bit_end = bit_start + pos->byte_size * 8; } @@ -1240,9 +1239,20 @@ void class__find_holes(struct class *class) in_bitfield = true; /* if it's a new bitfield set or same, but with * bigger-sized type, readjust size and end bit */ - if (pos->byte_offset * 8 >= cur_bitfield_end || pos->bit_size > cur_bitfield_size) { + if (bit_end > cur_bitfield_end || pos->bit_size > cur_bitfield_size) { cur_bitfield_size = pos->bit_size; cur_bitfield_end = pos->byte_offset * 8 + cur_bitfield_size; + /* + * if current bitfield "borrowed" bits from + * previous bitfield, it will have byte_offset + * of previous bitfield's backing integral + * type, but its end bit will be in a new + * bitfield "area", so we need to adjust + * bitfield end appropriately + */ + if (bit_end > cur_bitfield_end) { + cur_bitfield_end += cur_bitfield_size; + } } } else { in_bitfield = false; -- 2.17.1