* David Miller (davem@xxxxxxxxxxxxx) wrote: > From: David Miller <davem@xxxxxxxxxxxxx> > Date: Tue, 18 Jan 2011 22:32:47 -0800 (PST) > > > As far as GCC can see, the object is static and also not part of an > > array or any other C construct for which things like this could matter > > as long as the alignment it chooses meets the minimum alignment > > requirements of the ABI. > > > > So it doesn't let us do this trick where we put the individual event > > markers into a special section, yet mark it __used and static, then > > later access them as if they were part of a globally visible array. > > > > If you look these static objects, they are emitted with a leading > > ".align 32" directive. This is what screws everything up. > > > > When the linker sees this, it aligns the start of every individual > > "_ftrace_events" section, and that's where the "gaps" come from and > > the crashes. > > I've come up with one possible way to deal with this. > > Put pointers to the trace objects into the special section, and > interate over those instead. > > I was wondering why this x86-64 weird alignment behavior doesn't bite > us for our init funcs. And the reason is that all of these weird > alignment cases only trigger for aggregates (ie. structs and unions). > > So we could do: > > static struct ftace_event_call foo_event = { ... }; > static struct ftrace_event_call * __used > __attribute__((section("_ftrace_event_ptrs"))) > foo_event_ptr = &foo_event; > > and > > extern struct ftrace_event_call *__start_ftrace_event_ptrs[]; > extern struct ftrace_event_call *__end_ftrace_event_ptrs[]; > > struct ftrace_event_call **p = __start_ftrace_event_ptrs; > while (p < &__end_ftrace_event_ptrs[0]) { > struct ftrace_event_call *event = *p++; > > __trace_add_event_call(event, ...); > } > > you get the idea. > > And we could mark this entire point section as "initdata" and thus > free'able after bootup and post module load. There are three downsides to this approach compared to forcing both the type and variable alignments with attributes: 1) It consumes extra space: This approach lets gcc align foo_event on 32-byte boundaries, which is unneeded in this case. For structures repeated very often, this is a bad thing. 2) It mixes .data and struct ftrace_event_call definitions, thus polluting .data cache-lines (actually, the 32-byte alignment will leave some of these cachelines partly unused). This would be fixable by specifying yet another section name to hold the struct ftace_event_call definitions. 3) Freeing _ftrace_event_ptrs is only possible after init here because Ftrace data layout is sub-optimal. The linked-list created at init time by Ftrace kind of duplicates the arrays already implicit to the section. If we look at Tracepoints for example, which present the exact same section alignment problem, we iterate on the tracepoint section each time a tracepoint is activated or deactivated. So we need to keep the section there after init. Therefore, your indirection approach would grow the data footprint. The trade-off here is that walking the table is a very rare operation that does not need to be fast, so we accept the performance hit of walking the tracepoint table for each enabled tracepoint to shrink the memory footprint. Especially for reasons (1) and (2), I'd be tempted to go for the __long_long_aligned type and variable attribute instead. Thoughts ? I'm still unsure that __long_long_aligned is needed over __long_aligned though. AFAIK, the only requirement we have for, e.g. tracepoints, is to align on the pointer size (sizeof(long)), so RCU pointer updates are performed atomically. Aligning on the pointer size also allows the architecture to efficiently read the field content. What does aligning on sizeof(long long) bring to us ? Is it that you are concerned about the fact that the "aligned" type attribute, when applied to a structure, is only used as a lower-bound by the compiler ? In that case, we might want to consider using "packed" too: #define __long_packed_aligned __attribute__((__packed__, __aligned__(__alignof__(long)))) I would just like to know if this attribute causes gcc to emit slower memory access instructions on ppc/sparc/mips (I remember that at least one of these emit slower instructions for unaligned read/writes, so I wonder if the compiler uses them as soon as it sees the "packed" attribute, or if it is more clever than that). Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html