> Hrm, on such systems
> - *large* number of CPUs
> - no synchronized TSCs
>
> What would be the best approach to order events?

There isn't a perfect solution for this. My feeling is that your best hope is per-CPU buffers logged with the local TSC, together with some fancy heuristics to post-process the logs into the best approximation of the actual ordering.

If you have a tight upper bound on the error in converting per-CPU TSC values to "global system time", then the post-processing tool will be able to identify the events whose order is uncertain.

> Do you think we should consider using HPET, even though it's
> painfully slow? Would it be faster than cache-line bouncing
> on such large boxes? With a frequency around 10MHz, that
> would give a 100ns precision, which should be enough
> to order events.

This sounds like a poor choice: it would make all tracing very slow. And 100ns precision isn't all that good ... we can probably do almost as well by estimating the delta between the TSCs on different CPUs.

-Tony
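A minimal Python sketch of the post-processing step Tony describes, assuming each CPU's log is already in local TSC order and that some earlier calibration has produced, for each CPU, a constant offset to a common timebase plus an upper bound on that offset's error (no drift is modelled). The names Event, CpuClock, global_time, and merge are hypothetical. The sketch merges the logs by estimated global time and reports pairs of events from different CPUs whose error intervals overlap, i.e. events whose relative order the timestamps cannot decide.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    cpu: int
    tsc: int   # raw TSC value read on `cpu` when the event was logged
    name: str


@dataclass(frozen=True)
class CpuClock:
    # Hypothetical calibration result, assumed constant (no TSC drift):
    offset: int  # estimated (global time - local TSC) for this CPU, in cycles
    error: int   # upper bound on |true offset - estimated offset|, in cycles


def global_time(ev, clocks):
    """Estimated global timestamp of ev, in cycles."""
    return ev.tsc + clocks[ev.cpu].offset


def merge(per_cpu_logs, clocks):
    """Merge per-CPU logs (each in local TSC order) by estimated global time.

    Returns (merged, uncertain), where uncertain lists the pairs (a, b) of
    events from different CPUs whose error intervals overlap, so their true
    order cannot be decided from the timestamps alone.
    """
    # A constant per-CPU offset keeps each log sorted, as heapq.merge requires.
    merged = list(heapq.merge(*per_cpu_logs,
                              key=lambda ev: global_time(ev, clocks)))
    max_err = max(c.error for c in clocks.values())
    uncertain = []
    for i, a in enumerate(merged):
        a_high = global_time(a, clocks) + clocks[a.cpu].error
        for b in merged[i + 1:]:
            b_est = global_time(b, clocks)
            # Every later event's interval starts at >= b_est - max_err,
            # so once that passes a_high nothing further can overlap a.
            if b_est - max_err > a_high:
                break
            # b_est >= a's estimate, so overlap reduces to b_low <= a_high.
            # Same-CPU pairs are ordered by their local TSC and never ambiguous.
            if b.cpu != a.cpu and b_est - clocks[b.cpu].error <= a_high:
                uncertain.append((a, b))
    return merged, uncertain


clocks = {0: CpuClock(offset=0, error=5), 1: CpuClock(offset=-1000, error=5)}
cpu0 = [Event(0, 100, "a"), Event(0, 200, "c")]
cpu1 = [Event(1, 1150, "b"), Event(1, 1202, "d")]
merged, uncertain = merge([cpu0, cpu1], clocks)
print([e.name for e in merged])                  # ['a', 'b', 'c', 'd']
print([(a.name, b.name) for a, b in uncertain])  # [('c', 'd')]
```

With an error bound of 5 cycles per CPU, "c" and "d" (estimated 2 cycles apart on different CPUs) come back as an uncertain pair, while events 50 cycles apart are ordered confidently. Tighter calibration shrinks the uncertain set directly.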
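One way to estimate the delta between the TSCs on two CPUs, with an explicit error bound, is a ping-pong exchange in the style of Cristian's clock-synchronization algorithm. This is a swapped-in technique, not one the thread specifies. The minimal Python sketch below assumes the raw readings have already been collected (doing that needs kernel or rdtsc-level access, not shown) and that the delta stays constant over the sampling window. CPU A reads its TSC (t0) and pings CPU B, B reads its TSC (tb) and replies, and A reads its TSC again (t1). B's read happened at some A-time in [t0, t1], so each sample bounds the delta to [tb - t1, tb - t0], and intersecting the intervals from all samples tightens the bound. The function name estimate_tsc_delta is hypothetical.

```python
def estimate_tsc_delta(samples):
    """Estimate delta = TSC_B - TSC_A from ping-pong samples (t0, tb, t1).

    t0: CPU A's TSC just before it pings CPU B
    tb: CPU B's TSC when it sees the ping and replies
    t1: CPU A's TSC when the reply arrives
    Assumes delta is constant over all samples (no drift between the TSCs).
    Returns (estimate, error_bound) in cycles.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    if any(t1 < t0 for t0, _, t1 in samples):
        raise ValueError("CPU A's TSC went backwards within a sample")
    # B read its TSC at some A-time in [t0, t1], so delta is in [tb - t1, tb - t0].
    # Intersecting these intervals over all samples gives the tightest bound.
    lo = max(tb - t1 for _, tb, t1 in samples)
    hi = min(tb - t0 for t0, tb, _ in samples)
    if lo > hi:
        raise ValueError("samples disagree; is the TSC delta drifting?")
    return (lo + hi) / 2, (hi - lo) / 2


# Hypothetical readings with a true delta of 1000 cycles.
print(estimate_tsc_delta([(0, 1005, 20), (100, 1112, 115)]))  # (1001.0, 4.0)
```

Samples with short round trips contribute the narrowest intervals, so taking several and letting the intersection discard the slow ones gives a tighter bound than any single exchange. An empty intersection is a useful signal that the constant-delta assumption does not hold.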