> On May 17, 2024, at 08:30, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 15 May 2024, Coly Li wrote:
>> On Mon, May 13, 2024 at 10:15:00PM -0700, Robert Pang wrote:
>>> Dear Coly,
>>>
>>
>> Hi Robert,
>>
>> Thanks for the email. Let me explain inline.
>>
>>> Thank you for your dedication in reviewing this patch. I understand my
>>> previous message may have come across as urgent, but I want to
>>> emphasize the significance of this bcache operational issue as it has
>>> been reported by multiple users.
>>>
>>
>> What I was concerned about was still the testing itself. First of all,
>> from the following information, I see quite a lot of testing has been
>> done. I do appreciate the effort, which makes me confident in the
>> quality of this patch.
>>
>>> We understand the importance of thoroughness. To that end, we have
>>> conducted extensive, repeated testing on this patch across a range of
>>> cache sizes (375G/750G/1.5T/3T/6T/9TB) and CPU cores
>>> (2/4/8/16/32/48/64/80/96/128) for an hour-long run. We tested various
>>> workloads (read-only, read-write, and write-only) with 8kB I/O size.
>>> In addition, we did a series of 16-hour runs with 750GB cache and 16
>>> CPU cores. Our tests, primarily in writethrough mode, haven't revealed
>>> any issues or deadlocks.
>>>
>>
>> An hour-long run is not enough for bcache. Normally, for stability
>> purposes, at least 12-36 hours of continuous I/O pressure is necessary.
>> Before Linux v5.3, bcache would run out of memory after 10-12 hours of
>> heavy random write workload on the server hardware Lenovo sponsored
>> for me.
>
> FYI:
>
> We have been running the v2 patch in production on 5 different servers
> containing a total of 8 bcache volumes since April 7th this year, applied
> to 6.6.25 and later kernels. Some servers run 4k sector sizes, and others
> run 512-byte sectors for the data volume. For the cache volumes, all of
> the cache devices use 512-byte sectors.
>
> The backing storage on these servers ranges from 40-350 terabytes, and
> the cache sizes are in the 1-2 TB range. We log kernel messages with
> netconsole into a centralized log server and have not had any bcache
> issues.

Thanks for the information. The issue I mentioned doesn't generate any
kernel message. It just causes all I/Os to bypass the almost fully
occupied cache, even when it holds only clean data. Anyway, this is not
directly caused by this patch; the patch just made it easier to reach
that situation before I found and fixed it.

And to all contributors (including Dongsheng, Mingzhe, Robert, Eric and
others),

At this moment I see it works fine on my server. I am about to submit it
to Jens next week, if no other issue pops up.

Thanks.

Coly Li