Re: Antw: [EXT] Re: Failing UnitTest for Journald

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Folks,

 

First of all, thank you, Lennart, for your explanation. If I understand it correctly, it is a timing issue that occurs because the initiating process has already terminated and is unable to provide the information needed by the logger. If this is the reason for the error, then delaying its termination should eliminate it. To test it, I have added a usleep (10'000'000)command as the last instruction in test 4, but without success - the test still loses a message.

 

So I have followed the advice of Ulrich to deliver my test sources, so you can experiment for yourself. My current test is stripped down to the relevant source code and attached as file CompactJournalIntegrationTest.tar. The test contains file CMakeLists.txt, which can be used to build up the entire test via cmake. It is based on Google’s testing framework, which is fetched automatically from the Git repository.

 

Essentially there are 4 tests with increasing complexity:

  1. The first one just sends 5 small messages (~20 bytes) with priority Debug, Info, Warning, Error and Critical (the customer only needs these priorities). After that, the logger’s storage is synchronized and analyzed if all messages have been arrived correctly (via APIs sd_journal_...).
  2. The second test sends 10,000 small messages (~20 bytes) to the logger with priority Debug. After that, the logger’s storage is synchronized and analyzed if all messages have been arrived correctly (via APIs sd_journal_...).
  3. The third one sends 10,000 big messages (~10,000 bytes) to the logger with priority Debug, which are good to compress. After that, the logger’s storage is synchronized and analyzed if all messages have been arrived correctly (via APIs sd_journal_...).
  4. The last one sends 10,000 big messages (~10,000 bytes) to the logger with priority Debug, which are hard to compress. After that, the logger’s storage is synchronized and analyzed if all messages have been arrived correctly (via APIs sd_journal_...).

The tests are named JournalIntegrationTest.T1 JournalIntegrationTest.T4 and must be started with command sudo ./JournalIntegrationTest. Add –help to get further information.

 

The following observations can be made now:

  • If all tests are started separately, all is fine.
  • If test 3 + 4 are started together, all is fine.
  • If test 2 + 3 + 4 are started together, all is fine.
  • If test 1 + 2 + 3 + 4 are started together, test 4 has lost one message, which is always the last one.

 

An incorrect test run looks like the following.

 

developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4:*.T3:*.T2:*.T1"

Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc

Note: Google Test filter = *.T4:*.T3:*.T2:*.T1

[==========] Running 4 tests from 1 test suite.

[----------] Global test environment set-up.

[----------] 4 tests from JournalIntegrationTest

[ RUN      ] JournalIntegrationTest.T1

[       OK ] JournalIntegrationTest.T1 (87 ms)

[ RUN      ] JournalIntegrationTest.T2

[       OK ] JournalIntegrationTest.T2 (1850 ms)

[ RUN      ] JournalIntegrationTest.T3

[       OK ] JournalIntegrationTest.T3 (8090 ms)

[ RUN      ] JournalIntegrationTest.T4

/home/developer/scb/CompactJournaldIntegrationTest/Test.cpp:104: Failure

Expected equality of these values:

  numberOfMessages

    Which is: 10000

  foundCounter

    Which is: 9999

[  FAILED  ] JournalIntegrationTest.T4 (16758 ms)

[----------] 4 tests from JournalIntegrationTest (26785 ms total)

 

[----------] Global test environment tear-down

[==========] 4 tests from 1 test suite ran. (26785 ms total)

[  PASSED  ] 3 tests.

[  FAILED  ] 1 test, listed below:

[  FAILED  ] JournalIntegrationTest.T4

 

1 FAILED TEST

 

Via parameter –gtest_filter it is possible to select the tests that should run. Running the suite without test 1 will succeed.

 

developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4:*.T3:*.T2"

Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc

Note: Google Test filter = *.T4:*.T3:*.T2

[==========] Running 3 tests from 1 test suite.

[----------] Global test environment set-up.

[----------] 3 tests from JournalIntegrationTest

[ RUN      ] JournalIntegrationTest.T2

[       OK ] JournalIntegrationTest.T2 (1371 ms)

[ RUN      ] JournalIntegrationTest.T3

[       OK ] JournalIntegrationTest.T3 (8171 ms)

[ RUN      ] JournalIntegrationTest.T4

[       OK ] JournalIntegrationTest.T4 (17595 ms)

[----------] 3 tests from JournalIntegrationTest (27137 ms total)

 

[----------] Global test environment tear-down

[==========] 3 tests from 1 test suite ran. (27137 ms total)

[  PASSED  ] 3 tests.

 

Also a single run of T4 works.

 

developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4" 

Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc

Note: Google Test filter = *.T4

[==========] Running 1 test from 1 test suite.

[----------] Global test environment set-up.

[----------] 1 test from JournalIntegrationTest

[ RUN      ] JournalIntegrationTest.T4

[       OK ] JournalIntegrationTest.T4 (17343 ms)

[----------] 1 test from JournalIntegrationTest (17344 ms total)

 

[----------] Global test environment tear-down

[==========] 1 test from 1 test suite ran. (17344 ms total)

[  PASSED  ] 1 test.

 

So, do you have an idea what’s going on here?

 

You will find the corresponding configuration attached below.

 

Greetings from Berlin,

Andreas

 

 

 

 

[Journal]

Storage=persistent

Compress=yes

Seal=yes

SplitMode=none

#SyncIntervalSec=5m

#RateLimitIntervalSec=30s

#RateLimitBurst=10000

SystemMaxUse=500M

#SystemKeepFree=

SystemMaxFileSize=50M

SystemMaxFiles=13

#RuntimeMaxUse=

#RuntimeKeepFree=

#RuntimeMaxFileSize=

#RuntimeMaxFiles=100

MaxRetentionSec=12month

#MaxFileSec=1month

#ForwardToSyslog=yes

#ForwardToKMsg=no

#ForwardToConsole=no

#ForwardToWall=yes

#TTYPath=/dev/console

#MaxLevelStore=debug

#MaxLevelSyslog=debug

#MaxLevelKMsg=notice

#MaxLevelConsole=info

#MaxLevelWall=emerg

#LineMax=48K

#ReadKMsg=yes

 

 

 

-----Ursprüngliche Nachricht-----
Von: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Gesendet: Dienstag, 6. Juli 2021 14:16
An: Andreas Krueger <Andreas.Krueger@xxxxxxxxxx>; Lennart Poettering <lennart@xxxxxxxxxxxxxx>
Cc: Georg Brand <Georg.Brand@xxxxxxxxxx>; Thomas Zimmermann <thomas.zimmermann@xxxxxxxxxx>; systemd-devel@xxxxxxxxxxxxxxxxxxxxx
Betreff: Antw: [EXT] Re: [systemd-devel] Failing UnitTest for Journald

 

*** This message is from an EXTERNAL SENDER - be CAUTIOUS with links and when opening attachments ***

 

 

>>> Lennart Poettering <lennart@xxxxxxxxxxxxxx> schrieb am 06.07.2021 um

>>> 11:15

in

Nachricht <YOQfRHFU4pNzn3qJ@gardel-login>:

> On Mo, 05.07.21 18:46, Andreas Krueger (Andreas.Krueger@fmcag.com) wrote:

> 

>> Hi Folks,

>> 

>> for a customer I have to verify the behavior of the logger in its

>> system (Linux debianVM 4.19.06amd64 #1 SMP Debian

>> 4.19.672+deb10u1 (20190920) x86_64 GNU/Linux), which is journald

>> (systemd 241 (241)).  For this, I have written some unit tests that

>> work all well, when executed separately. But running together they

>> lead to some erroneous behavior that I cannot explain maybe you

>> have an idea what's going wrong...

> 

> I am not sure I follow entirely what you are doing. But please be

> aware of the following race we cannot fix without kernel changes:

> 

> Whenever journald receieves a log message from a client it uses the

> SCM_CREDENTIALS metadata of the incoming packet to retrieve further

> medadata from /proc/$PID/ about the client. If the sender's process

> exited by the time the message is processed this data is not available

> anymore, and thus will not be stored along with the message, and can

> thus not be used to search/filter for the message.

 

I don't know how such a client would be named then, but have you considered naming it "stale-PID-<PID>"?

At least then the users could notice that something might not work as expected.

 

While trying to lear I found that in my manual page for SYSTEMD-JOURNALD.SERVICE a page sd_journal_print(4) is referred to, but it dpoes not exist in my system (systemd-228-157.27.1.x86_64 of SLES12)...

 

> 

> Thus: whenever you have a process that logs and immediately exits

> there's a chance that once journald processes that log message, it is

> seen and written to disk but without much of the metadata. If you then

> use "journalctl u" or a similar command (e.g. the log output of

> systemctl status) to look for the logs of the unit it will likely not

> be found since the unit name is one of the metadata fields not

> available in this case (since it is extracted from the

> /proc/$PID/cgroup file by journald).

> 

> You should see the log message if you do not use filtering though,

> i.e. "journalctl e" or so should show it.

> 

> We'd really like to fix this one day, but unfortunately kernel

> developers so far shot down any attempt to optionally attach more

> metadata to AF_UNIX datagrams (if we had just the cgroup this would

> already make things *so* much better for us).

 

Regards,

Ulrich

 

> 

> Lennart

> 

> ‑‑

> Lennart Poettering, Berlin

> _______________________________________________

> systemddevel mailing list

> systemddevel@xxxxxxxxxxxxxxxxxxxxx

> https://lists.freedesktop.org/mailman/listinfo/systemddevel

 

 

 

Attachment: CompactJournaldIntegrationTest.tar
Description: CompactJournaldIntegrationTest.tar

_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]     [Photo]

  Powered by Linux