On Thu, 4 Apr 2024 at 15:37, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Thu, 4 Apr 2024 07:42:45 +0200 Jaroslav Pulchart wrote:
> > We do not have much progress
>
> Random thought - do you have KFENCE enabled?
> It's sufficiently low overhead to run in production and maybe it could
> help catch the bug? You also hit some inexplicable bug in the Intel
> driver, IIRC, there may be something odd going on.. (it's not all
> happening on a single machine, right?)

We have KFENCE enabled. The issue was observed on multiple servers. It is
easy to reproduce anywhere we deploy the Loki service.

The trigger is: I click the "run query" (LogQL) button in the Grafana UI
once or twice. Loki starts loading data from the minio cluster at ~2GB/s
and almost immediately it crashes.

I suspect the Intel ICE driver as well; it would not be the first time we
have hit bugs there. Later I will try a test server that has a NIC from a
different vendor.