Hi,

On 2024/07/12 20:11, Konstantin Kharlamov wrote:
Good news: your diff seems to have fixed the problem! I would have to test more extensively in another environment to be completely sure, but by following the minimal steps-to-reproduce I can no longer reproduce the problem.
That's good. :)
Bad news: there's a new lockup now 😄 This one seems to happen after the disk is returned; it is still possible that the stack traces only coincide with returning the disk by accident, even though I re-tested multiple times, because the traces (below) do not always appear. However, even when the traces do not appear, the IO load of the fio that's running in the background drops to zero, so something is definitely wrong.
Ok, I need to investigate this further. The call stack is not very helpful.

First, can the problem be reproduced with raid1/raid10? If not, this is probably a raid5 bug.

It would be best if I could reproduce this problem myself. However, I don't understand step 4: turning off the JBOD slot's power. Is this only possible on a real machine, or can I do it in my VM?

Thanks,
Kuai