> > Has this been tested on actual problematic hardware, and improved > performance? I remember last time this was discussed, the outcome was > not always what was expected. > > Christophe > I don't remember old discussions, there were actually 2 different ones which were confusing as one was referring an old embedded ARM version and another a more recent. I remember a memcpy version was proposed. I don't own an ARM to test. Was tested with x64 with no regressions introduced. Knowing the machine instruction, which one fault is causing the issue and the different platforms I can try and verify the assembly output cross compiling. Using an Ubuntu machine (looks like Fedora cross compiler is only for compiling the kernel) and a cross compiler I got 00004284 <parse_msg_disconnecting>: 4284: f100 020c add.w r2, r0, #12 4288: 428a cmp r2, r1 428a: d817 bhi.n 42bc <parse_msg_disconnecting+0x38> 428c: b5f8 push {r3, r4, r5, r6, r7, lr} 428e: 4604 mov r4, r0 4290: 2010 movs r0, #16 4292: 461d mov r5, r3 4294: f7ff fffe bl 0 <malloc> 4298: b170 cbz r0, 42b8 <parse_msg_disconnecting+0x34> 429a: 6827 ldr r7, [r4, #0] 429c: 2310 movs r3, #16 429e: 6866 ldr r6, [r4, #4] 42a0: f240 0200 movw r2, #0 42a4: 68a1 ldr r1, [r4, #8] 42a6: f2c0 0200 movt r2, #0 42aa: 602b str r3, [r5, #0] 42ac: 9b06 ldr r3, [sp, #24] 42ae: 6007 str r7, [r0, #0] 42b0: 6046 str r6, [r0, #4] 42b2: 6081 str r1, [r0, #8] 42b4: 601a str r2, [r3, #0] 42b6: bdf8 pop {r3, r4, r5, r6, r7, pc} 42b8: 2000 movs r0, #0 42ba: bdf8 pop {r3, r4, r5, r6, r7, pc} 42bc: 2000 movs r0, #0 42be: 4770 bx lr instead of 0000427c <parse_msg_disconnecting>: 427c: f100 020c add.w r2, r0, #12 4280: 428a cmp r2, r1 4282: d817 bhi.n 42b4 <parse_msg_disconnecting+0x38> 4284: b570 push {r4, r5, r6, lr} 4286: 4604 mov r4, r0 4288: 2010 movs r0, #16 428a: 461d mov r5, r3 428c: f7ff fffe bl 0 <malloc> 4290: b170 cbz r0, 42b0 <parse_msg_disconnecting+0x34> 4292: e9d4 2300 ldrd r2, r3, [r4] 4296: 2610 movs r6, #16 4298: 68a4 ldr r4, [r4, #8] 429a: f240 0100 movw r1, #0 429e: 602e str r6, [r5, #0] 42a0: f2c0 0100 movt r1, #0 42a4: e9c0 2300 strd r2, r3, [r0] 42a8: 9b04 ldr r3, [sp, #16] 42aa: 6084 str r4, [r0, #8] 42ac: 6019 str r1, [r3, #0] 42ae: bd70 pop {r4, r5, r6, pc} 42b0: 2000 movs r0, #0 42b2: bd70 pop {r4, r5, r6, pc} 42b4: 2000 movs r0, #0 42b6: 4770 bx lr The platform is gnueabihf. As you can see the first assembly is using only ldr/str while second one, assuming alignment of 64 bit, uses ldrd/strd. On this platform ldr/str can do unaligned access while ldrd/strd cannot (note that ldr is using 32 bit access while ldrd does 64 bit access) Frediano > On Mon, Sep 18, 2017 at 02:52:57AM -0400, Frediano Ziglio wrote: > > ping > > > > I think I sent a reply after the new path which could be confusing. > > > > > > > > Instead of assuming that the system can safely do unaligned access > > > to memory use packed structures to allow the compiler generate > > > best code possible. > > > A packed structure tells the compiler to not leave padding inside it > > > and that the structure can be unaligned so any field can be unaligned > > > having to generate proper access code based on architecture. > > > For instance ARM7 can use unaligned access but not for 64 bit > > > numbers (currently these accesses are emulated by Linux kernel > > > with obvious performance consequences). > > > > > > This changes the current methods from: > > > > > > #ifdef WORDS_BIGENDIAN > > > #define read_uint32(ptr) ((uint32_t)SPICE_BYTESWAP32(*((uint32_t > > > *)(ptr)))) > > > #define write_uint32(ptr, val) *(uint32_t *)(ptr) = > > > SPICE_BYTESWAP32((uint32_t)val) > > > #else > > > #define read_uint32(ptr) (*((uint32_t *)(ptr))) > > > #define write_uint32(ptr, val) (*((uint32_t *)(ptr))) = val > > > #endif > > > > > > to: > > > #include <spice/start-packed.h> > > > typedef struct SPICE_ATTR_PACKED { > > > uint32_t v; > > > } uint32_unaligned_t; > > > #include <spice/end-packed.h> > > > > > > #ifdef WORDS_BIGENDIAN > > > #define read_uint32(ptr) ((uint32_t)SPICE_BYTESWAP32(((uint32_unaligned_t > > > *)(ptr))->v)) > > > #define write_uint32(ptr, val) ((uint32_unaligned_t *)(ptr))->v = > > > SPICE_BYTESWAP32((uint32_t)val) > > > #else > > > #define read_uint32(ptr) (((uint32_unaligned_t *)(ptr))->v) > > > #define write_uint32(ptr, val) (((uint32_unaligned_t *)(ptr))->v) = val > > > #endif > > > > > > Signed-off-by: Frediano Ziglio <fziglio@xxxxxxxxxx> > > > --- > > > python_modules/demarshal.py | 33 +++++++++++++++++++++++---------- > > > 1 file changed, 23 insertions(+), 10 deletions(-) > > > > > > Changes since v1: > > > - improved commit message; > > > - added SPICE_ATTR_PACKED to structure definition for compatibility; > > > - use intX_unaligned_t instead of intX_unaligned_p. > > > > > > diff --git a/python_modules/demarshal.py b/python_modules/demarshal.py > > > index 1ea131d..da87d44 100644 > > > --- a/python_modules/demarshal.py > > > +++ b/python_modules/demarshal.py > > > @@ -40,24 +40,37 @@ def write_parser_helpers(writer): > > > > > > writer = writer.function_helper() > > > > > > + writer.writeln("#include <spice/start-packed.h>") > > > + for size in [16, 32, 64]: > > > + for sign in ["", "u"]: > > > + type = "%sint%d" % (sign, size) > > > + writer.begin_block("typedef struct SPICE_ATTR_PACKED") > > > + writer.variable_def("%s_t" % type, "v") > > > + writer.end_block(newline=False) > > > + writer.writeln(" %s_unaligned_t;" % type) > > > + writer.writeln("#include <spice/end-packed.h>") > > > + writer.newline() > > > + > > > + for sign in ["", "u"]: > > > + type = "%sint8" % sign > > > + writer.macro("read_%s" % type, "ptr", "(*((%s_t *)(ptr)))" % > > > type) > > > + writer.macro("write_%s" % type, "ptr, val", "*(%s_t *)(ptr) = > > > val" % > > > (type)) > > > + writer.newline() > > > + > > > writer.writeln("#ifdef WORDS_BIGENDIAN") > > > - for size in [8, 16, 32, 64]: > > > + for size in [16, 32, 64]: > > > for sign in ["", "u"]: > > > utype = "uint%d" % (size) > > > type = "%sint%d" % (sign, size) > > > swap = "SPICE_BYTESWAP%d" % size > > > - if size == 8: > > > - writer.macro("read_%s" % type, "ptr", "(*((%s_t > > > *)(ptr)))" % > > > type) > > > - writer.macro("write_%s" % type, "ptr, val", "*(%s_t > > > *)(ptr) > > > = val" % (type)) > > > - else: > > > - writer.macro("read_%s" % type, "ptr", "((%s_t)%s(*((%s_t > > > *)(ptr))))" % (type, swap, utype)) > > > - writer.macro("write_%s" % type, "ptr, val", "*(%s_t > > > *)(ptr) > > > = %s((%s_t)val)" % (utype, swap, utype)) > > > + writer.macro("read_%s" % type, "ptr", > > > "((%s_t)%s(((%s_unaligned_t *)(ptr))->v))" % (type, swap, utype)) > > > + writer.macro("write_%s" % type, "ptr, val", > > > "((%s_unaligned_t > > > *)(ptr))->v = %s((%s_t)val)" % (utype, swap, utype)) > > > writer.writeln("#else") > > > - for size in [8, 16, 32, 64]: > > > + for size in [16, 32, 64]: > > > for sign in ["", "u"]: > > > type = "%sint%d" % (sign, size) > > > - writer.macro("read_%s" % type, "ptr", "(*((%s_t *)(ptr)))" % > > > type) > > > - writer.macro("write_%s" % type, "ptr, val", "(*((%s_t > > > *)(ptr))) > > > = val" % type) > > > + writer.macro("read_%s" % type, "ptr", "(((%s_unaligned_t > > > *)(ptr))->v)" % type) > > > + writer.macro("write_%s" % type, "ptr, val", > > > "(((%s_unaligned_t > > > *)(ptr))->v) = val" % type) > > > writer.writeln("#endif") > > > > > > for size in [8, 16, 32, 64]: _______________________________________________ Spice-devel mailing list Spice-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/spice-devel