Hi Folks, First of all, thank you, Lennart, for your explanation. If I understand it correctly, it is a timing issue that occurs because the initiating process has already terminated and is unable to provide the information
needed by the logger. If this is the reason for the error, then delaying its termination should eliminate it. To test it, I have added a
usleep (10'000'000)command as the last instruction in test 4, but without success - the test still loses a message. So I have followed the advice of Ulrich to deliver my test sources, so you can experiment for yourself. My current test is stripped down to the relevant source code and attached as file
CompactJournalIntegrationTest.tar. The test contains file
CMakeLists.txt, which can be used to build up the entire test via
cmake. It is based on Google’s testing framework, which is fetched automatically from the Git repository.
Essentially there are 4 tests with increasing complexity:
The tests are named JournalIntegrationTest.T1
… JournalIntegrationTest.T4 and must be started with command
sudo ./JournalIntegrationTest. Add
–help to get further information. The following observations can be made now:
An incorrect test run looks like the following. developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4:*.T3:*.T2:*.T1" Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc Note: Google Test filter = *.T4:*.T3:*.T2:*.T1 [==========] Running 4 tests from 1 test suite. [----------] Global test environment set-up. [----------] 4 tests from JournalIntegrationTest [ RUN ] JournalIntegrationTest.T1 [ OK ] JournalIntegrationTest.T1 (87 ms) [ RUN ] JournalIntegrationTest.T2 [ OK ] JournalIntegrationTest.T2 (1850 ms) [ RUN ] JournalIntegrationTest.T3 [ OK ] JournalIntegrationTest.T3 (8090 ms) [ RUN ] JournalIntegrationTest.T4 /home/developer/scb/CompactJournaldIntegrationTest/Test.cpp:104: Failure Expected equality of these values: numberOfMessages Which is: 10000 foundCounter Which is: 9999 [ FAILED ] JournalIntegrationTest.T4 (16758 ms) [----------] 4 tests from JournalIntegrationTest (26785 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test suite ran. (26785 ms total) [ PASSED ] 3 tests. [ FAILED ] 1 test, listed below: [ FAILED ] JournalIntegrationTest.T4 1 FAILED TEST Via parameter –gtest_filter it is possible to select the tests that should run. Running the suite without test 1
will succeed. developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4:*.T3:*.T2" Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc Note: Google Test filter = *.T4:*.T3:*.T2 [==========] Running 3 tests from 1 test suite. [----------] Global test environment set-up. [----------] 3 tests from JournalIntegrationTest [ RUN ] JournalIntegrationTest.T2 [ OK ] JournalIntegrationTest.T2 (1371 ms) [ RUN ] JournalIntegrationTest.T3 [ OK ] JournalIntegrationTest.T3 (8171 ms) [ RUN ] JournalIntegrationTest.T4 [ OK ] JournalIntegrationTest.T4 (17595 ms) [----------] 3 tests from JournalIntegrationTest (27137 ms total) [----------] Global test environment tear-down [==========] 3 tests from 1 test suite ran. (27137 ms total) [ PASSED ] 3 tests. Also a single run of T4 works. developer@debianVM ..b/CompactJournaldIntegrationTest/build % sudo ./JournalIntegrationTest --gtest_filter="*.T4"
Running main() from /home/developer/scb/CompactJournaldIntegrationTest/build/_deps/googletest-src/googletest/src/gtest_main.cc Note: Google Test filter = *.T4 [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from JournalIntegrationTest [ RUN ] JournalIntegrationTest.T4 [ OK ] JournalIntegrationTest.T4 (17343 ms) [----------] 1 test from JournalIntegrationTest (17344 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (17344 ms total) [ PASSED ] 1 test. So, do you have an idea what’s going on here? You will find the corresponding configuration attached below. Greetings from Berlin, Andreas [Journal] Storage=persistent Compress=yes Seal=yes SplitMode=none #SyncIntervalSec=5m #RateLimitIntervalSec=30s #RateLimitBurst=10000 SystemMaxUse=500M #SystemKeepFree= SystemMaxFileSize=50M SystemMaxFiles=13 #RuntimeMaxUse= #RuntimeKeepFree= #RuntimeMaxFileSize= #RuntimeMaxFiles=100 MaxRetentionSec=12month #MaxFileSec=1month #ForwardToSyslog=yes #ForwardToKMsg=no #ForwardToConsole=no #ForwardToWall=yes #TTYPath=/dev/console #MaxLevelStore=debug #MaxLevelSyslog=debug #MaxLevelKMsg=notice #MaxLevelConsole=info #MaxLevelWall=emerg #LineMax=48K #ReadKMsg=yes -----Ursprüngliche Nachricht----- *** This message is from an EXTERNAL SENDER - be CAUTIOUS with links and when opening attachments *** >>> Lennart Poettering <lennart@xxxxxxxxxxxxxx> schrieb am 06.07.2021 um
>>> 11:15 in Nachricht <YOQfRHFU4pNzn3qJ@gardel-login>: > On Mo, 05.07.21 18:46, Andreas Krueger (Andreas.Krueger@fmc‑ag.com) wrote: > >> Hi Folks, >> >> for a customer I have to verify the behavior of the logger in its
>> system (Linux debianVM 4.19.0‑6‑amd64 #1 SMP Debian >> 4.19.67‑2+deb10u1 (2019‑09‑20) x86_64 GNU/Linux), which is journald
>> (systemd 241 (241)). For this, I have written some unit tests that
>> work all well, when executed separately. But running together they
>> lead to some erroneous behavior that I cannot explain
‑ maybe you >> have an idea what's going wrong... > > I am not sure I follow entirely what you are doing. But please be
> aware of the following race we cannot fix without kernel changes: > > Whenever journald receieves a log message from a client it uses the
> SCM_CREDENTIALS metadata of the incoming packet to retrieve further
> medadata from /proc/$PID/ about the client. If the sender's process
> exited by the time the message is processed this data is not available
> anymore, and thus will not be stored along with the message, and can
> thus not be used to search/filter for the message. I don't know how such a client would be named then, but have you considered naming it "stale-PID-<PID>"? At least then the users could notice that something might not work as expected. While trying to lear I found that in my manual page for SYSTEMD-JOURNALD.SERVICE a page sd_journal_print(4) is referred to, but it dpoes not exist in my system (systemd-228-157.27.1.x86_64 of SLES12)... > > Thus: whenever you have a process that logs and immediately exits
> there's a chance that once journald processes that log message, it is
> seen and written to disk but without much of the metadata. If you then
> use "journalctl
‑u" or a similar command (e.g. the log output of > systemctl status) to look for the logs of the unit it will likely not
> be found since the unit name is one of the metadata fields not
> available in this case (since it is extracted from the > /proc/$PID/cgroup file by journald). > > You should see the log message if you do not use filtering though,
> i.e. "journalctl
‑e" or so should show it. > > We'd really like to fix this one day, but unfortunately kernel
> developers so far shot down any attempt to optionally attach more
> metadata to AF_UNIX datagrams (if we had just the cgroup this would
> already make things *so* much better for us). Regards, Ulrich > > Lennart > > ‑‑ > Lennart Poettering, Berlin > _______________________________________________ > systemd‑devel mailing list > systemd‑devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/systemd‑devel |
Attachment:
CompactJournaldIntegrationTest.tar
Description: CompactJournaldIntegrationTest.tar
_______________________________________________ systemd-devel mailing list systemd-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/systemd-devel