Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: > > And with conservative I mean "everything is a source of a dependency, and > > hence can't be removed, reordered or otherwise fiddled with", and that > > includes code sequences where no atomic objects are anywhere in sight [1]. > > In the light of that the only realistic way (meaning to not have to > > disable optimization everywhere) to implement consume as currently > > specified is to map it to acquire. At which point it becomes pointless. > > No, only memory_order_consume loads and [[carries_dependency]] > function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the "carries a dependency" relation is not the sources, but rather where it stops. It's transitively closed over "value of evaluation A is used as operand in evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load then that evaluation must be compiled in a way to not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument can impossibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain breaking optimization (which are many). > > [1] Simple example of what type of transformations would be disallowed: > > > > int getzero (int i) { return i - i; } > > This needs to be as follows: > > [[carries_dependency]] int getzero(int i [[carries_dependency]]) > { > return i - i; > } > > Otherwise dependencies won't get carried through it. So, with the above do you agree that in absense of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into "return 0;" because of the C11 rules for "carries-a-dependency"? If so, do you then also agree that the specification of "carries a dependency" is somewhat, err, shall we say, overbroad? > > depchains don't matter, could _then_ optmize it to zero. But that's > > insane, especially considering that it's hard to detect if a given context > > doesn't care for depchains, after all the depchain relation is constructed > > exactly so that it bleeds into nearly everywhere. So we would most of > > the time have to assume that the ultimate context will be depchain-aware > > and therefore disable many transformations. > > Any function that does not contain a memory_order_consume load and that > doesn't have any arguments marked [[carries_dependency]] can be > optimized just as before. And as such marker doesn't exist we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. > > Then inlining getzero would merely add another "# j.dep = i.dep" > > relation, so depchains are still there but the value optimization can > > happen before inlining. Having to do something like that I'd find > > disgusting, and rather rewrite consume into acquire :) Or make the > > depchain relation somehow realistically implementable. > > I was actually OK with arithmetic cancellation breaking the dependency > chains. Others on the committee felt otherwise, and I figured that (1) > I wouldn't be writing that kind of function anyway and (2) they knew > more about writing compilers than I. I would still be OK saying that > things like "i-i", "i*0", "i%1", "i&0", "i|~0" and so on just break the > dependency chain. Exactly. I can see the problem that people had with that, though. There are very many ways to write conceiled zeros (or generally neutral elements of the function in question). My getzero() function is one (it could e.g. be an assembler implementation). The allowance to break dependency chains would have to apply to such cancellation as well, and so can't simply itemize all cases in which cancellation is allowed. Rather it would have had to argue about something like "value dependency", ala "evaluation B depends on A, if there exist at least two different values A1 and A2 (results from A), for which evaluation B (with otherwise same operands) yields different values B1 and B2". Alas, it doesn't, except if you want to understand the term "the value of A is used as an operand of B" in that way. Even then you'd still have the second case of the depchain definition, via intermediate not even atomic memory stores and loads to make two evaluations be ordered per carries-a-dependency. And even that understanding of "is used" wouldn't be enough, because there are cases where the cancellation happens in steps, and where it interacts with the third clause (transitiveness): Assume this: a = something() // evaluation A b = 1 - a // evaluation B c = a - 1 + b // evaluation C Now, clearly B depends on A. Also C depends on B (because with otherwise same operands changing just B also changes C), because of transitiveness C then also depends on A. But equally cleary C was just an elaborate way to write "0", and so depends on nothing. The problem was of course that A and B weren't independent when determining the dependencies of C. But allowing cancellation to break dependency chains would have to allow for these cases as well. So, now, that leaves us basically with depchains forcing us to disable many useful transformation or finding some other magic. One would be to just regard all consume loads as acquire loads and be done (and effectively remove the ill-advised "carries a dependency" relation from consideration). You say downthread that it'd also be possible to just emit barriers before all function calls (I say "all" because the compiler will generally have applied some transformation that broke depchains if they existed). That seems to me to be a bigger hammer than just ignoring depchains and emit acquires instead of consumes (because the latter changes only exactly where atomics are used, the former seems to me to have unbounded effect). So, am still missing something or is my understanding of the carries-a-dependency relation correct and my conclusions are merely too pessimistic? Ciao, Michael. -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html