On 17 May 2017, at 09:44, Frediano Ziglio <fziglio@xxxxxxxxxx> wrote:
From: Christophe de Dinechin <dinechin@xxxxxxxxxx>
For example, something like this: uint8_t *p8; uint32_t *p32 = (uint32_t *) p8;
generates a warning like this: spice-channel.c:1350:10: error: cast from 'uint8_t *' (aka 'unsigned char *') to 'uint32_t *' (aka 'unsigned int *') increases required alignment from 1 to 4 [-Werror,-Wcast-align]
The warning indicates that we end up with a pointer to data that should be 4-byte aligned, but its value may be misaligned. On x86, this does not make much of a difference, except a relatively minor performance penalty. However, on platforms such as ARM, misaligned accesses are emulated by the kernel, and support for them is optional. So we may end up with a fault.
The intent of the fix here is to make it easy to identify and rework places where actual mis-alignment occurs. Wherever casts raise the warning, they are replaced with a macro:
- SPICE_ALIGNED_CAST(type, value) casts value to type, and indicates that we believe the resulting pointer is aligned. If it is not, a warning will be issued.
- SPICE_UNALIGNED_CAST(type, value) casts value to type, and indicates that we believe the resulting pointer is not always aligned
Any code using SPICE_UNALIGNED_CAST may need to be revisited in order to improve performance, e.g. by using memcpy.
There are normally no warnings for SPICE_UNALIGNED_CAST, but it is possible to emit debug messages for mis-alignment in SPICE_UNALIGNED_CAST. Set environment variable SPICE_DEBUG_ALIGNMENT to activate this check.
Signed-off-by: Christophe de Dinechin <dinechin@xxxxxxxxxx> --- spice-common | 2 +- src/channel-cursor.c | 6 +++--- src/channel-display-mjpeg.c | 2 +- src/decode-glz-tmpl.c | 2 +- src/spice-channel.c | 14 +++++++++----- 5 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/spice-common b/spice-common index af682b1..1239c82 160000 --- a/spice-common +++ b/spice-common @@ -1 +1 @@ -Subproject commit af682b1b06dea55007d9aa7c37cd443e4349e43f +Subproject commit 1239c82c54dee0244cca262cca0aa21071b23e24 diff --git a/src/channel-cursor.c b/src/channel-cursor.c index 609243b..50de5ce 100644 --- a/src/channel-cursor.c +++ b/src/channel-cursor.c @@ -340,7 +340,7 @@ static display_cursor *set_cursor(SpiceChannel *channel, SpiceCursor *scursor) memcpy(cursor->data, data, size); for (i = 0; i < hdr->width * hdr->height; i++) { pix_mask = get_pix_mask(data, size, i); - if (pix_mask && *((guint32*)data + i) == 0xffffff) { + if (pix_mask && *(SPICE_ALIGNED_CAST(guint32*, data) + i) == 0xffffff) { cursor->data[i] = get_pix_hack(i, hdr->width); } else { cursor->data[i] |= (pix_mask ? 0 : 0xff000000); @@ -350,7 +350,7 @@ static display_cursor *set_cursor(SpiceChannel *channel, SpiceCursor *scursor) case SPICE_CURSOR_TYPE_COLOR16: for (i = 0; i < hdr->width * hdr->height; i++) { pix_mask = get_pix_mask(data, size, i); - pix = *((guint16*)data + i); + pix = *(SPICE_ALIGNED_CAST(guint16*,data) + i); if (pix_mask && pix == 0x7fff) { cursor->data[i] = get_pix_hack(i, hdr->width); } else {
If these 2 hits you are going to have a message for every single pixel in the image, potentially millions.
No, you will get one message only when you enter the image.
static const char *last_loc = NULL; if (loc != last_loc) { last_loc = loc; spice_log(SPICE_LOG_DOMAIN, SPICE_LOG_LEVEL_WARNING, loc, __FUNCTION__, "Misaligned access at %p, alignment %u", p, sz); }
If it’s the same source code location, you get it once. But if you have two such warnings, then yes, it may become verbose.
But if you get millions of calls in case of mis-alignment, it means that on ARM, you get millions of faults into the kernel to correct the alignment, which is surely a lot worse performance-wise than a call.
I remember someone (Pavel?) recently saying that Spice performance on ARM was terrible. Something like this could easily explain it.
I'm still convinced, like in this case, that if we are sure that data should be aligned no check and warning should be done.
Would you be OK with a check and warning under #ifndef NDEBUG (like an assert).
I'm really sure no architecture on hearth can do check and print in a single machine instruction.
This is not what I said. I said that the cost in the likely case was one extra instruction.
Actually I don't know any architecture where a single check like this can be done in a single instruction.
Reduced source code: https://pastebin.com/ibZEVThA.
Machine code for x86 with clang: https://pastebin.com/Vmtwpkhf
The test follows the call to malloc. It’s a single “testb” instruction followed by a never-taken conditional jump to cold-code. I am not positive, but I’m pretty sure all recent x86 architectures execute this combination in a single cycle under nominal conditions. So two additional instructions in the hot code path, one of them (the jump) normally never executed.
Yes, 1+1 == 2 so that's 2 instructions not counting the print.The jump is always executed, just not taken.
Not many processors implements speculative execution so there's a penalty on the jump preventing following instructions to be executed.
There, I disagree. All recent CPUs have branch prediction, and in that case, we have a “statically not taken” branch if there is no history, and a strongly not taken if there is history. Beside these quite paranoid consideration my point is simple that in different cases we know this is SURELY to never happen and I don't see the point of doing the check at all.
Our current platforms don't have much issues with unaligned memory
I actually checked that. It’s true. x86 seems to have zero penalty. It looks like recent ARM chips also support misaligned. I have old chips at home ;-) but the compiler is currently helping porting spice-gtkto platforms with more strongly alignment issues so I don'tthink is wasted time to fix properly these problems to avoidalignment issues in the possible future. movl $27, %edi callq _malloc ## A single instruction to test testb $7, %al
did you notice the bug in the macro?You are aligning on the pointer, not to the int type.
No, I had not noticed. Very good catch. So no-ack ;-)
Christophe ## Fall through jump not taken, cost presumably 0 cycles jne LBB2_1 LBB2_2:
Christophe
@@ -364,7 +364,7 @@ static display_cursor *set_cursor(SpiceChannel *channel, SpiceCursor *scursor) for (i = 0; i < hdr->width * hdr->height; i++) { pix_mask = get_pix_mask(data, size + (sizeof(uint32_t) << 4), i); int idx = (i & 1) ? (data[i >> 1] & 0x0f) : ((data[i >> 1] & 0xf0) >> 4); - pix = *((uint32_t*)(data + size) + idx); + pix = *(SPICE_UNALIGNED_CAST(uint32_t*,(data + size)) + idx); if (pix_mask && pix == 0xffffff) { cursor->data[i] = get_pix_hack(i, hdr->width); } else { diff --git a/src/channel-display-mjpeg.c b/src/channel-display-mjpeg.c index 3ae9d21..17c0f4f 100644 --- a/src/channel-display-mjpeg.c +++ b/src/channel-display-mjpeg.c @@ -151,7 +151,7 @@ static gboolean mjpeg_decoder_decode_frame(gpointer video_decoder) #ifndef JCS_EXTENSIONS { uint8_t *s = lines[0]; - uint32_t *d = (uint32_t *)s; + uint32_t *d = SPICE_ALIGNED_CAST(uint32_t *,s);
if (back_compat) { for (unsigned int j = lines_read * width; j > 0; ) { diff --git a/src/decode-glz-tmpl.c b/src/decode-glz-tmpl.c index b337a8b..7695a28 100644 --- a/src/decode-glz-tmpl.c +++ b/src/decode-glz-tmpl.c @@ -178,7 +178,7 @@ static size_t FNAME(decode)(SpiceGlzDecoderWindow *window, uint64_t image_id, SpicePalette *plt) { uint8_t *ip = in_buf; - OUT_PIXEL *out_pix_buf = (OUT_PIXEL *)out_buf; + OUT_PIXEL *out_pix_buf = SPICE_ALIGNED_CAST(OUT_PIXEL *,out_buf); OUT_PIXEL *op = out_pix_buf; OUT_PIXEL *op_limit = out_pix_buf + size;
This is safe too. 16 byte and 32 byte images must be aligned and 24 and 8 bit are surely aligned (to byte).
diff --git a/src/spice-channel.c b/src/spice-channel.c index 3b6231e..88d0567 100644 --- a/src/spice-channel.c +++ b/src/spice-channel.c @@ -1312,6 +1312,7 @@ static void spice_channel_send_link(SpiceChannel *channel) { SpiceChannelPrivate *c = channel->priv; uint8_t *buffer, *p; + uint32_t *p32; int protocol, i;
c->link_hdr.magic = SPICE_MAGIC; @@ -1356,14 +1357,15 @@ static void spice_channel_send_link(SpiceChannel *channel) memcpy(p, &c->link_hdr, sizeof(c->link_hdr)); p += sizeof(c->link_hdr); memcpy(p, &c->link_msg, sizeof(c->link_msg)); p += sizeof(c->link_msg);
+ // Need this to avoid error with -Wcast-align + p32 = SPICE_UNALIGNED_CAST(uint32_t *,p); for (i = 0; i < c->common_caps->len; i++) { - *(uint32_t *)p = GUINT32_TO_LE(g_array_index(c->common_caps, uint32_t, i)); - p += sizeof(uint32_t); + *p32++ = GUINT32_TO_LE(g_array_index(c->common_caps, uint32_t, i)); } for (i = 0; i < c->caps->len; i++) { - *(uint32_t *)p = GUINT32_TO_LE(g_array_index(c->caps, uint32_t, i)); - p += sizeof(uint32_t); + *p32++ = GUINT32_TO_LE(g_array_index(c->caps, uint32_t, i)); } + p = (uint8_t *) p32; CHANNEL_DEBUG(channel, "channel type %d id %d num common caps %u num caps %u", c->channel_type, c->channel_id, @@ -1887,6 +1889,7 @@ static gboolean spice_channel_recv_link_msg(SpiceChannel *channel) SpiceChannelPrivate *c; int rc, num_caps, i; uint32_t *caps, num_channel_caps, num_common_caps; + uint8_t *caps_src; SpiceChannelEvent event = SPICE_CHANNEL_ERROR_LINK;
g_return_val_if_fail(channel != NULL, FALSE); @@ -1926,7 +1929,8 @@ static gboolean spice_channel_recv_link_msg(SpiceChannel *channel) /* see original spice/client code: */ /* g_return_if_fail(c->peer_msg + c->peer_msg->caps_offset * sizeof(uint32_t) > c->peer_msg + c->peer_hdr.size); */
- caps = (uint32_t *)((uint8_t *)c->peer_msg + GUINT32_FROM_LE(c->peer_msg->caps_offset)); + caps_src = (uint8_t *)c->peer_msg + GUINT32_FROM_LE(c->peer_msg->caps_offset); + caps = SPICE_UNALIGNED_CAST(uint32_t *, caps_src);
g_array_set_size(c->remote_common_caps, num_common_caps); for (i = 0; i < num_common_caps; i++, caps++) {
Also you should merge style changes from 5/5.
|