Telemetry crashes integration with Redmine

Hi everyone,

tl;dr: We want to open or update an issue in tracker.ceph.com for each crash signature received via telemetry. There are ~2,500 signatures, and we want to do this in a way that makes sense to developers. Please share your suggestions.

Users who have opted in to telemetry, and specifically to its ‘crash’ channel, send daily anonymized reports about the crashes that occurred in their clusters. Each report includes the name of the crashed daemon, its version, the backtrace, the crash’s signature (a fingerprint that groups similar crash events), the assert function and condition (if applicable), etc.

Our goal is to make these telemetry crash reports available and actionable to developers, and to be able to track their statuses. For this we need to have an associated Redmine issue for each crash signature.

Currently there are ~2,500 signatures that should be tracked. An integration bot [1] can open or update the corresponding Redmine issues [2, 3], but we do not want to overwhelm developers with a massive number of new issues all at once.

In the CLT meeting Ilya suggested having a crash count threshold, so we only open issues for signatures with at least 2 crash events; or even to combine this with the number of clusters affected by the crash signature. Neha suggested that we include signatures of recent releases, regardless of the number of clusters affected by them.
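The two suggestions above can be combined into a single filter. The following is only a sketch of that rule, not the bot's actual logic: the `Signature` record, its field names, the default thresholds, and the choice of 15.2.13 and 16.2.5 as the "recent" releases are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical summary of one crash signature (field names are assumed)."""
    sig: str            # the crash fingerprint
    version: str        # Ceph version that reported it
    crash_count: int    # total crash events seen for this signature
    cluster_count: int  # distinct clusters that reported it


# Assumption: which releases count as "recent" is a policy decision.
RECENT_VERSIONS = frozenset({"15.2.13", "16.2.5"})


def should_open_issue(s, min_crashes=2, min_clusters=1, recent=RECENT_VERSIONS):
    """Return True if a Redmine issue should be opened for this signature."""
    # Signatures from recent releases are always included,
    # regardless of how many crashes or clusters are involved.
    if s.version in recent:
        return True
    # Otherwise require both thresholds to be met.
    return s.crash_count >= min_crashes and s.cluster_count >= min_clusters
```

Raising `min_clusters` above 1 gives the stricter variant in which a signature must also affect several clusters before an issue is opened.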

About 1,400 of these signatures have only one crash event so far. See [4] for a breakdown by version.
Excluding them leaves ~1,100 signatures with at least two crash events (plus the single-event signatures of the most recent releases, 61 for version 15.2.13 and 6 for 16.2.5, plus future signatures). Should we handle a fixed number of them every week? For instance, open 100 new issues per week? Should we prioritize them by version and by number of clusters affected? What cadence would make the most sense, bug-scrub-wise?
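One possible shape for such a cadence is sketched below: sort the candidate signatures so that newer versions and more widely seen crashes come first, then cut the list into fixed-size weekly batches. The `Candidate` record, the sort order, and the default batch size are assumptions for illustration, not a decided policy.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical record for a signature awaiting a Redmine issue."""
    sig: str
    version: str        # e.g. "15.2.13"
    cluster_count: int  # distinct clusters affected


def version_key(version: str) -> Tuple[int, ...]:
    """Parse "15.2.13" into (15, 2, 13) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def weekly_batches(candidates: Iterable[Candidate],
                   per_week: int = 100) -> Iterator[List[Candidate]]:
    """Yield batches of at most `per_week` signatures, highest priority first."""
    # Newest version first; within a version, most clusters affected first.
    ordered = sorted(
        candidates,
        key=lambda c: (version_key(c.version), c.cluster_count),
        reverse=True,
    )
    for start in range(0, len(ordered), per_week):
        yield ordered[start:start + per_week]
```

At 100 issues per week, a backlog of roughly 1,100 signatures would take 11 weeks to work through, not counting signatures that arrive in the meantime.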

We will discuss this topic at our next CDM; please join.

Thanks!
Yaarit


[1] https://pad.ceph.com/p/telemetry-redmine-bot
[2] https://tracker.ceph.com/issues/51756
[3] https://tracker.ceph.com/issues/49666
[4] Count of signatures with a single crash event, by version:
{15.2.8} 319
{15.2.5} 159
{16.2.4} 148
{15.2.7} 146
{15.2.9} 122
{15.2.4} 115
{15.2.10} 79
{15.2.13} 61
{15.2.11} 39
{16.2.0} 36
{15.2.6} 30
{16.2.1} 28
{15.2.3} 26
{15.2.1} 26
{16.2.3} 12
{15.2.12} 9
{15.2.0} 9
{15.1.0} 6
{16.2.5} 6
{15.2.2} 5
{15.0.0} 4
{16.1.0} 2
{16.0.0} 2
{16.2.2} 1

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
