On 2022/12/20 02:56, Theodore Ts'o wrote:
On Mon, Dec 19, 2022 at 09:39:33AM -0800, Darrick J. Wong wrote:
On Fri, Dec 16, 2022 at 02:51:21PM +0800, Qu Wenruo wrote:
When KEEP_DMESG is set to "yes", we always save the dmesg of every
test case (regardless of whether it passed) into "$seqnum.dmesg".
But this KEEP_DMESG behavior doesn't affect the xunit report.
This patch makes the xunit report follow the KEEP_DMESG setting.
This may be dangerous; if the XML file is too large, the XML parser
may end up rejecting the whole XML file, because otherwise a too-large
XML file could be used for a denial-of-service attack[1]. (This is why I
implemented "xunit-quiet".)
I guess your concern is valid.
In my auto run, the individual dmesg files are not that large; even
the three largest are well below the MiB level:
108K btrfs/187.dmesg
80K btrfs/072.dmesg
64K generic/311.dmesg
But the total dmesg size already reaches the MiB range: 2.6 MiB.
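For reference, one way to keep a per-test dmesg in the xunit report while bounding the file size would be to cap each embedded log. The following is a hypothetical Python sketch, not how fstests actually generates its report (that is done in shell): the MAX_DMESG_BYTES cap, the add_testcase helper, and the JUnit-style <system-err> element name are all assumptions. It also strips the C0 control characters that XML 1.0 does not allow, since raw kernel logs can contain them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical per-test cap; fstests has no such setting.
MAX_DMESG_BYTES = 64 * 1024

# C0 control characters (other than tab, LF, CR) are not allowed in XML 1.0.
_INVALID_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def add_testcase(suite, name, dmesg, max_bytes=MAX_DMESG_BYTES):
    """Append a <testcase> to suite, embedding at most max_bytes of dmesg."""
    case = ET.SubElement(suite, "testcase", name=name)
    data = dmesg.encode("utf-8")
    if len(data) > max_bytes:
        # Keep only the last max_bytes; which end to keep is a policy choice.
        # errors="ignore" drops a multi-byte character cut by the slice.
        tail = data[-max_bytes:].decode("utf-8", errors="ignore")
        text = "[dmesg truncated to last %d bytes]\n%s" % (max_bytes, tail)
    else:
        text = dmesg
    # "system-err" is an assumed JUnit-style element name.
    ET.SubElement(case, "system-err").text = _INVALID_XML.sub("", text)
    return case


if __name__ == "__main__":
    suite = ET.Element("testsuite", name="example")
    add_testcase(suite, "btrfs/187", "x" * 200000)
    print(len(ET.tostring(suite)))
```

A cap keeps the worst case linear in the number of tests, but with "-g auto" and many configs the total can still grow large, so it only softens the concern above rather than removing it.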
[1] https://gitlab.com/gitlab-org/gitlab/-/issues/25357
So if you are running a large number of tests (e.g., "-g auto"),
adding dmesg for all tests might very well end up bloating the XML
file to the point where it becomes unmanageable. For example, this is
the size of my syslog file after running "-g auto" on the "xfs/quota"
config:
-rw-r----- 1 tytso primarygroup 10316684 Aug 25 10:35 ae/syslog
The syslog files for all of the xfs configs are 9-10 megabytes each.
If I combined the 12 xfs configs that we run into a single xunit XML
file with the dmesg output, this would be *guaranteed* to blow out
most XML parsers.
Personally, I find that a better solution is to use the syslog daemon
to save the dmesg output for all of the tests into a single file. I
prefer this for three reasons:
* The single file is more compressible than having it broken out
  into separate $seqnum.dmesg files.
* By keeping dmesg and other test artifacts separate from the XML
  file, I can archive the XML file for much longer (perhaps
  indefinitely), while the much more voluminous test artifacts are
  archived for a shorter time (say, 3-6 months).
* When there are test isolation issues (it's not uncommon for a
  previous test to fail with some kind of global or cgroup-specific
  OOM kill), or when I'm testing on bare metal with real hardware,
  where hardware failures are a Thing, being able to look for unusual
  kernel messages from before the start of a particular test can
  often be quite revealing.
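The compressibility point can be illustrated with a small Python experiment. This is a sketch with synthetic log text (the helpers compressed_sizes and synthetic_logs are hypothetical, and the numbers it prints say nothing about real fstests logs): compressing each per-test log on its own pays zlib's header and checksum overhead once per file and cannot reuse text repeated across tests, whereas a single concatenated stream can back-reference earlier tests' messages within deflate's 32 KiB window.

```python
import zlib


def compressed_sizes(logs):
    """Return (total size when each log is compressed on its own,
    size when all logs are concatenated and compressed once)."""
    separate = sum(len(zlib.compress(log.encode("utf-8"), 9)) for log in logs)
    combined = len(zlib.compress("".join(logs).encode("utf-8"), 9))
    return separate, combined


def synthetic_logs(n):
    """Build n small, similar per-test logs (synthetic text, not real dmesg)."""
    return [
        (
            "run fstests generic/%03d at 2022-12-19 10:00:00\n"
            "synthetic: mounting scratch filesystem\n"
            "synthetic: unmounting scratch filesystem\n" % i
        )
        for i in range(1, n + 1)
    ]


if __name__ == "__main__":
    separate, combined = compressed_sizes(synthetic_logs(50))
    print("separate: %d bytes, combined: %d bytes" % (separate, combined))
```

Real logs are less uniform than this synthetic input, so the actual gain will vary; per-file filesystem overhead (block rounding, inodes) is not modeled here at all.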
On the other hand, separating dmesg from its test cases means extra
parsing is needed to bind each log section to a specific test run.
In fact, I'm not even sure that is always possible, e.g. on the same
host with different kernel/fstests configs.
One reason I want to keep dmesg bound to each test case is that in
the past we have had cases where some btrfs-specific error messages
were not (and should not be) caught by fstests, but could still be
clues to bugs.
Thus, if each dmesg is properly paired with its kernel/fstests
configs and test case, we can determine how widespread the bug is.
Any better ideas on how to relate them? Or is this really too niche?
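One possible way to relate a combined syslog back to individual tests is to split it at per-test marker lines. The Python sketch below assumes the marker format "run fstests <seqnum> at <YYYY-MM-DD HH:MM:SS>", modeled on the line that fstests' check script writes to /dev/kmsg before each test; treat the exact format as an assumption and verify it against your fstests version. split_by_test is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed per-test marker format (verify against the fstests version in use).
MARKER = re.compile(
    r"run fstests (\S+) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def split_by_test(lines):
    """Group log lines by the most recent test marker seen.

    Returns a dict keyed by (seqnum, start time). The marker line itself
    starts its section; lines before the first marker go under key None.
    """
    sections = defaultdict(list)
    key = None
    for line in lines:
        m = MARKER.search(line)
        if m:
            key = (m.group(1), m.group(2))
        sections[key].append(line)
    return dict(sections)


if __name__ == "__main__":
    import sys

    for key, section in split_by_test(sys.stdin.read().splitlines()).items():
        print(key, len(section))
```

Keying each section by (seqnum, start time) keeps repeated runs of the same test apart, and lines just before a section remain available for the "what happened before this test" kind of inspection. Mapping a section to its kernel/fstests config, however, would still need an external record of which config was running at that time.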
Thanks,
Qu
Cheers,
- Ted