On Thu, 4 Apr 2024 at 20:17, Jaroslav Pulchart <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> On Thu, 4 Apr 2024 at 15:37, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Thu, 4 Apr 2024 07:42:45 +0200 Jaroslav Pulchart wrote:
> > > We do not have much progress
> >
> > Random thought - do you have KFENCE enabled?
> > It's sufficiently low overhead to run in production and maybe it could
> > help catch the bug? You also hit some inexplicable bug in the Intel
> > driver, IIRC, there may be something odd going on.. (it's not all
> > happening on a single machine, right?)
>
> We have KFENCE enabled.
>
> The issue was observed on multiple servers. It is easy to reproduce
> everywhere we deploy the Loki service. The trigger is: I click the
> "run query" (LogQL) button in the Grafana UI once or twice. Loki
> starts to load data from the minio cluster at a speed of ~2GB/s and
> almost immediately it crashes.
>
> I suspect the Intel ICE driver as well; it would not be the first
> time we have hit bugs there. I will try a test server with a NIC
> from a different vendor later.

I ran the setup on a server with a network card other than the E810:
a BCM57414 NetXtreme-E with the bnxt_en driver. The issue is not
reproducible there. So it looks to be connected with Intel's ice driver
for the E810 network card, and to have been introduced in 6.3.