Re: [PATCH spice-gtk v3 1/5] Avoid clang warnings on casts with stricter alignment requirements

Christophe de Dinechin <christophe@xxxxxxxxxxxx> · Wed, 17 May 2017 16:17:29 +0200

On 17 May 2017, at 12:08, Frediano Ziglio <fziglio@xxxxxxxxxx> wrote:



On 17 May 2017, at 09:44, Frediano Ziglio <fziglio@xxxxxxxxxx> wrote:


From: Christophe de Dinechin <dinechin@xxxxxxxxxx>

For example, something like this:
  uint8_t  *p8;
  uint32_t *p32 = (uint32_t *) p8;

generates a warning like this:
spice-channel.c:1350:10: error: cast from 'uint8_t *' (aka 'unsigned char
*') to
    'uint32_t *' (aka 'unsigned int *') increases required alignment from
    1
    to
    4 [-Werror,-Wcast-align]

The warning indicates that we end up with a pointer to data that
should be 4-byte aligned, but its value may be misaligned. On x86,
this does not make much of a difference, except a relatively minor
performance penalty. However, on platforms such as ARM, misaligned
accesses are emulated by the kernel, and support for them is optional.
So we may end up with a fault.

The intent of the fix here is to make it easy to identify and rework
places where actual mis-alignment occurs. Wherever casts raise the
warning,
they are replaced with a macro:

- SPICE_ALIGNED_CAST(type, value) casts value to type, and indicates that
we believe the resulting pointer is aligned. If it is not, a warning will
be issued.

- SPICE_UNALIGNED_CAST(type, value) casts value to type, and indicates
that
we believe the resulting pointer is not always aligned

Any code using SPICE_UNALIGNED_CAST may need to be revisited in order
to improve performance, e.g. by using memcpy.

There are normally no warnings for SPICE_UNALIGNED_CAST, but it is
possible
to emit debug messages for mis-alignment in SPICE_UNALIGNED_CAST.
Set environment variable SPICE_DEBUG_ALIGNMENT to activate this check.

Signed-off-by: Christophe de Dinechin <dinechin@xxxxxxxxxx>
---
spice-common                |  2 +-
src/channel-cursor.c        |  6 +++---
src/channel-display-mjpeg.c |  2 +-
src/decode-glz-tmpl.c       |  2 +-
src/spice-channel.c         | 14 +++++++++-----
5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/spice-common b/spice-common
index af682b1..1239c82 160000
--- a/spice-common
+++ b/spice-common
@@ -1 +1 @@
-Subproject commit af682b1b06dea55007d9aa7c37cd443e4349e43f
+Subproject commit 1239c82c54dee0244cca262cca0aa21071b23e24
diff --git a/src/channel-cursor.c b/src/channel-cursor.c
index 609243b..50de5ce 100644
--- a/src/channel-cursor.c
+++ b/src/channel-cursor.c
@@ -340,7 +340,7 @@ static display_cursor *set_cursor(SpiceChannel
*channel,
SpiceCursor *scursor)
       memcpy(cursor->data, data, size);
       for (i = 0; i < hdr->width * hdr->height; i++) {
           pix_mask = get_pix_mask(data, size, i);
-            if (pix_mask && *((guint32*)data + i) == 0xffffff) {
+            if (pix_mask && *(SPICE_ALIGNED_CAST(guint32*, data) + i) ==
0xffffff) {
               cursor->data[i] = get_pix_hack(i, hdr->width);
           } else {
               cursor->data[i] |= (pix_mask ? 0 : 0xff000000);
@@ -350,7 +350,7 @@ static display_cursor *set_cursor(SpiceChannel
*channel,
SpiceCursor *scursor)
   case SPICE_CURSOR_TYPE_COLOR16:
       for (i = 0; i < hdr->width * hdr->height; i++) {
           pix_mask = get_pix_mask(data, size, i);
-            pix = *((guint16*)data + i);
+            pix = *(SPICE_ALIGNED_CAST(guint16*,data) + i);
           if (pix_mask && pix == 0x7fff) {
               cursor->data[i] = get_pix_hack(i, hdr->width);
           } else {

If these 2 hits you are going to have a message for every single pixel
in the image, potentially millions.

No, you will get one message only when you enter the image.

   static const char *last_loc = NULL;
   if (loc != last_loc) {
       last_loc = loc;
       spice_log(SPICE_LOG_DOMAIN, SPICE_LOG_LEVEL_WARNING, loc,
       __FUNCTION__,
                 "Misaligned access at %p, alignment %u", p, sz);
   }

If it’s the same source code location, you get it once. But if you have two
such warnings, then yes, it may become verbose.

But if you get millions of calls in case of mis-alignment, it means that on
ARM, you get millions of faults into the kernel to correct the alignment,
which is surely a lot worse performance-wise than a call.

I remember someone (Pavel?) recently saying that Spice performance on ARM was
terrible. Something like this could easily explain it.


I'm still convinced, like in this case, that if we are sure that data
should be aligned no check and warning should be done.

Would you be OK with a check and warning under #ifndef NDEBUG (like an
assert).

I'm really sure no architecture on hearth can do check and print in
a single machine instruction.

This is not what I said. I said that the cost in the likely case was one
extra instruction.

Actually I don't know any architecture
where a single check like this can be done in a single instruction.

Reduced source code: https://pastebin.com/ibZEVThA.

Machine code for x86 with clang: https://pastebin.com/Vmtwpkhf

The test follows the call to malloc. It’s a single “testb” instruction
followed by a never-taken conditional jump to cold-code. I am not positive,
but I’m pretty sure all recent x86 architectures execute this combination in
a single cycle under nominal conditions. So two additional instructions in
the hot code path, one of them (the jump) normally never executed.


Yes, 1+1 == 2 so that's 2 instructions not counting the print.
The jump is always executed, just not taken.

Not many processors implements speculative execution so there's
a penalty on the jump preventing following instructions to be
executed.

There, I disagree. All recent CPUs have branch prediction, and in that case, we have a “statically not taken” branch if there is no history, and a strongly not taken if there is history.

Beside these quite paranoid consideration my point is simple
that in different cases we know this is SURELY to never happen
and I don't see the point of doing the check at all.



Our current platforms don't have much issues with unaligned
memory 

I actually checked that. It’s true. x86 seems to have zero penalty. It looks like recent ARM chips also support misaligned. I have old chips at home ;-)

but the compiler is currently helping porting spice-gtk
to platforms with more strongly alignment issues so I don't
think is wasted time to fix properly these problems to avoid
alignment issues in the possible future.

	     movl    $27, %edi
	     callq   _malloc
	  
	 ## A single instruction to test
	     testb   $7, %al
	  

did you notice the bug in the macro?
You are aligning on the pointer, not to the int type.

No, I had not noticed. Very good catch. So no-ack ;-)

Christophe


	 ## Fall through jump not taken, cost presumably 0 cycles
	     jne LBB2_1
	 LBB2_2:


Christophe


@@ -364,7 +364,7 @@ static display_cursor *set_cursor(SpiceChannel
*channel,
SpiceCursor *scursor)
       for (i = 0; i < hdr->width * hdr->height; i++) {
           pix_mask = get_pix_mask(data, size + (sizeof(uint32_t) << 4),
           i);
           int idx = (i & 1) ? (data[i >> 1] & 0x0f) : ((data[i >> 1] &
           0xf0) >> 4);
-            pix = *((uint32_t*)(data + size) + idx);
+            pix = *(SPICE_UNALIGNED_CAST(uint32_t*,(data + size)) + idx);
           if (pix_mask && pix == 0xffffff) {
               cursor->data[i] = get_pix_hack(i, hdr->width);
           } else {
diff --git a/src/channel-display-mjpeg.c b/src/channel-display-mjpeg.c
index 3ae9d21..17c0f4f 100644
--- a/src/channel-display-mjpeg.c
+++ b/src/channel-display-mjpeg.c
@@ -151,7 +151,7 @@ static gboolean mjpeg_decoder_decode_frame(gpointer
video_decoder)
#ifndef JCS_EXTENSIONS
       {
           uint8_t *s = lines[0];
-            uint32_t *d = (uint32_t *)s;
+            uint32_t *d = SPICE_ALIGNED_CAST(uint32_t *,s);

           if (back_compat) {
               for (unsigned int j = lines_read * width; j > 0; ) {
diff --git a/src/decode-glz-tmpl.c b/src/decode-glz-tmpl.c
index b337a8b..7695a28 100644
--- a/src/decode-glz-tmpl.c
+++ b/src/decode-glz-tmpl.c
@@ -178,7 +178,7 @@ static size_t FNAME(decode)(SpiceGlzDecoderWindow
*window,
                           uint64_t image_id, SpicePalette *plt)
{
   uint8_t      *ip = in_buf;
-    OUT_PIXEL    *out_pix_buf = (OUT_PIXEL *)out_buf;
+    OUT_PIXEL    *out_pix_buf = SPICE_ALIGNED_CAST(OUT_PIXEL *,out_buf);
   OUT_PIXEL    *op = out_pix_buf;
   OUT_PIXEL    *op_limit = out_pix_buf + size;


This is safe too. 16 byte and 32 byte images must be aligned and 24 and 8
bit
are surely aligned (to byte).

diff --git a/src/spice-channel.c b/src/spice-channel.c
index 3b6231e..88d0567 100644
--- a/src/spice-channel.c
+++ b/src/spice-channel.c
@@ -1312,6 +1312,7 @@ static void spice_channel_send_link(SpiceChannel
*channel)
{
   SpiceChannelPrivate *c = channel->priv;
   uint8_t *buffer, *p;
+    uint32_t *p32;
   int protocol, i;

   c->link_hdr.magic = SPICE_MAGIC;
@@ -1356,14 +1357,15 @@ static void spice_channel_send_link(SpiceChannel
*channel)
   memcpy(p, &c->link_hdr, sizeof(c->link_hdr)); p +=
   sizeof(c->link_hdr);
   memcpy(p, &c->link_msg, sizeof(c->link_msg)); p +=
   sizeof(c->link_msg);

+    // Need this to avoid error with -Wcast-align
+    p32 = SPICE_UNALIGNED_CAST(uint32_t *,p);
   for (i = 0; i < c->common_caps->len; i++) {
-        *(uint32_t *)p = GUINT32_TO_LE(g_array_index(c->common_caps,
uint32_t, i));
-        p += sizeof(uint32_t);
+        *p32++ = GUINT32_TO_LE(g_array_index(c->common_caps, uint32_t,
i));
   }
   for (i = 0; i < c->caps->len; i++) {
-        *(uint32_t *)p = GUINT32_TO_LE(g_array_index(c->caps, uint32_t,
i));
-        p += sizeof(uint32_t);
+        *p32++ = GUINT32_TO_LE(g_array_index(c->caps, uint32_t, i));
   }
+    p = (uint8_t *) p32;
   CHANNEL_DEBUG(channel, "channel type %d id %d num common caps %u num
   caps %u",
                 c->channel_type,
                 c->channel_id,
@@ -1887,6 +1889,7 @@ static gboolean
spice_channel_recv_link_msg(SpiceChannel *channel)
   SpiceChannelPrivate *c;
   int rc, num_caps, i;
   uint32_t *caps, num_channel_caps, num_common_caps;
+    uint8_t *caps_src;
   SpiceChannelEvent event = SPICE_CHANNEL_ERROR_LINK;

   g_return_val_if_fail(channel != NULL, FALSE);
@@ -1926,7 +1929,8 @@ static gboolean
spice_channel_recv_link_msg(SpiceChannel *channel)
   /* see original spice/client code: */
   /* g_return_if_fail(c->peer_msg + c->peer_msg->caps_offset *
   sizeof(uint32_t) > c->peer_msg + c->peer_hdr.size); */

-    caps = (uint32_t *)((uint8_t *)c->peer_msg +
GUINT32_FROM_LE(c->peer_msg->caps_offset));
+    caps_src = (uint8_t *)c->peer_msg +
GUINT32_FROM_LE(c->peer_msg->caps_offset);
+    caps = SPICE_UNALIGNED_CAST(uint32_t *, caps_src);

   g_array_set_size(c->remote_common_caps, num_common_caps);
   for (i = 0; i < num_common_caps; i++, caps++) {

Also you should merge style changes from 5/5.

_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel