Hi, This is a combined release announcement for the 2.11.0 - "Lafontaine" release of the LTTng Tools, LTTng UST, and LTTng modules projects. This release is named after a modern Saison beer from Montréal's Oshlag microbrewery. It is a refreshing, zesty, rice beer with hints of fruit and spices. Some even say it makes for a great Somaek when mixed with Chamisul Soju, not that we've tried! Lafontaine is also a tongue-in-cheek reference to a water leak that affected EfficiOS's offices during the development of this release. The most notable features of this new release are: - Session rotation, - Dynamic tracing of user-space (LTTng-modules), - Support for arrays and bitwise binary operators in filters, - User and kernel space call stack capture from the LTTng-modules kernel tracer (LTTng-modules), - Improved networking performance of the relay daemon, - Take NUMA configuration into account for UST buffer allocation (LTTng-UST), - Support unloading of probe providers (dlclose) (LTTng-UST). Important point: since we needed to change the layout of the LTTng-UST shared buffers in this release, we had to bump the ABI major version between LTTng-UST and lttng-sessiond from 7 to 8. Therefore, if incompatible lttng-ust and lttng-sessiond attempt to interact, lttng-ust will not perform any tracing. Make sure you upgrade both lttng-tools and lttng-ust. This release has been brewing for two years. Indeed, LTTng 2.10.0 was released in August 2017, a little more than two years ago. This means we even had the chance to publish talks and material presenting the features introduced in this release already. [1, 2, 3, 4, 5] Read on for a short description of each of these features. -- Session rotation -- The biggest feature of this release is the long-awaited session rotation support. Session rotations now allow you to rotate an ongoing tracing session much in the same way as you would rotate logs. The 'lttng rotate' command rotates the current trace chunk of the current tracing session. Once a rotation is completed, LTTng does not manage the trace chunk archive anymore: you can read it, modify it, move it, or remove it. Because a rotation causes the tracing session’s current sub-buffers to be flushed, trace chunk archives are never redundant, that is, they do not overlap over time, unlike snapshots. Once a rotation is complete, offline analyses can be performed on the resulting trace, much like in 'normal' mode. However, the big advantage is that this can be done without interrupting tracing, and without being limited to tools which implement the "live" protocol. Moreover, session rotations can be scheduled on a time or size basis. The following commands give an idea of how this functionality can be used: $ lttng enable-rotation --timer 30s A rotation schedule is set so that an automatic rotation occurs at least every 30 seconds. $ lttng enable-rotation --size 100M A the rotation schedule is set so that an automatic rotation occurs every time the total size of the flushed part of the current trace chunk is at least 100 MiB. Conversely, session rotation schedules can be deactivated using the 'lttng disable-rotation' command. Finally, it is also possible to use the Notification API to notify external tools when a rotation has been completed. This allows arbitrary actions to be taken on a trace chunk archive when a scheduled rotation is completed. Such actions can include compressing, analyzing, or transferring a trace chunk, for example. -- Dynamic tracing of user-space -- This release also marks the introduction of another feature we have been eager to introduce for a long time: dynamic tracing. As of this release, it is possible to dynamically instrument function entries of applications and shared libraries, as well as DTrace-style SDT probes. The feature relies on the Linux kernel's uprobe facilities. As such, the kernel tracer is needed to benefit from this functionality. Here is an example instrumenting the libc's 'malloc' function which will produce an event named 'libc_malloc' on every invocation of malloc() by any application linked on libc: $ lttng enable-event --kernel \ --userspace-probe /usr/lib/libc-2.28.so:malloc \ libc_malloc SDT probes are tracepoints distributed in multiple libraries and applications. This implementation supports tracing of SDT probes that are NOT guarded by a semaphore. The syntax to enable an SDT tracepoint userspace probe follows: $ lttng enable-event --kernel \ --userspace-probe=sdt:/path/to/executable:provider:probe Note that, for the time being, no payload is extracted (function arguments or SDT probe parameters). -- User and kernel space call stack capture from the LTTng-modules kernel tracer -- Two new contexts have been introduced for kernel channels: callstack-kernel and callstack-user. This allows the sampling of both the kernel and user space call stacks when a kernel event occurs. For example, it is now possible to capture both user and kernel call stacks whenever a process issues a 'read()' system call using the following commands: $ lttng create my_session $ lttng enable-event --kernel --syscall read $ lttng add-context --kernel --type callstack-kernel \ --type callstack-user $ lttng start Note that it is not always possible for the tracer to reliably sample the user space call stack. For instance, it is not possible to sample the call stack of user space code compiled with -fomit-frame-pointer. In such a case, the tracing is not affected, but the sampled user space call stack may only contain the top-most address of the user space call stack. -- Support for arrays and bitwise binary operators in filters -- The filtering expression language and interpreter have been extended to support new features: - Bracket and dot notations to get nested fields - New bitwise operators (>>, <<, &, ^, and |) These improvements make it possible to filter on the content of structures, arrays and sequences. The objective behind this work was to make it easier to filter events which contain binary payloads such as network headers. As such, it is now possible to use filter expressions of the form: "payload[2] & 0xff7 == 0x240" Also, the documentation of filter expressions has been expanded in the lttng-enable-event(1) man page to include more examples, a table of operator precedence and type promotion rules. -- Improved networking performance of the relay daemon -- While not a feature per se, a substantial amount of work went into mitigating performance degradations which were reported when a large number of targets streaming to a given relay daemon were suddenly experiencing connectivity issues. In those scenarios, the relay daemon could timeout on a number of network operations, making it unresponsive and unable to process incoming traffic for a long moment. In order to make it more resilient in those situations, the relay's communication with consumer daemons (i.e. traced targets) has been made entirely asynchronous, thus mitigating issues caused by misbehaving consumers. This is the first of many improvements to the relay daemon's networking code. Links --- Project website: https://lttng.org Documentation: https://lttng.org/docs Download link: https://lttng.org/download [1] FOSDEM 2019 - Fine-grained Distributed Application Monitoring Using LTTng https://www.efficios.com/blog/2019/03/01/fosdem19-distributed-monitoring/ [2] The new dynamic user space tracing feature in LTTng https://lttng.org/blog/2019/10/15/new-dynamic-user-space-tracing-in-lttng/ [3] LTTng session rotation: the practical way to do continuous tracing https://lttng.org/blog/2019/10/15/lttng-session-rotation/ [4] Open Source Summit Europe 2018 - Fine-grained Distributed Application Monitoring Using LTTng https://osseu18.sched.com/event/FxXw/fine-grained-distributed-application-monitoring-using-lttng-jeremie-galarneau-efficios [5] FOSDEM 2018 - The LTTng approaches to solving complex problems https://www.efficios.com/blog/2018/02/03/fosdem18-lttng-approaches/ On behalf of the EfficiOS team, Mathieu and Jérémie -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com