On Sat, 18 May 2024, Coly Li wrote:
>
>
> > On May 17, 2024, at 08:30, Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 15 May 2024, Coly Li wrote:
> >> On Mon, May 13, 2024 at 10:15:00PM -0700, Robert Pang wrote:
> >>> Dear Coly,
> >>>
> >>
> >> Hi Robert,
> >>
> >> Thanks for the email. Let me explain inline.
> >>
> >>> Thank you for your dedication in reviewing this patch. I understand my
> >>> previous message may have come across as urgent, but I want to
> >>> emphasize the significance of this bcache operational issue as it has
> >>> been reported by multiple users.
> >>>
> >>
> >> What concerned me was still the testing itself. First of all, from the
> >> following information, I see that quite a lot of testing has been done.
> >> I do appreciate the effort, which makes me confident in the quality of
> >> this patch.
> >>
> >>> We understand the importance of thoroughness. To that end, we have
> >>> conducted extensive, repeated testing on this patch across a range of
> >>> cache sizes (375G/750G/1.5T/3T/6T/9TB) and CPU cores
> >>> (2/4/8/16/32/48/64/80/96/128) for an hour-long run. We tested various
> >>> workloads (read-only, read-write, and write-only) with 8kB I/O size.
> >>> In addition, we did a series of 16-hour runs with 750GB cache and 16
> >>> CPU cores. Our tests, primarily in writethrough mode, haven't revealed
> >>> any issues or deadlocks.
> >>>
> >>
> >> An hour-long run is not enough for bcache. Normally, for stability
> >> purposes, at least 12-36 hours of continuous I/O pressure is necessary.
> >> Before Linux v5.3, bcache would run out of memory after 10 to 12 hours
> >> of heavy random-write workload on the server hardware Lenovo sponsored
> >> for me.
> >
> > FYI:
> >
> > We have been running the v2 patch in production on 5 different servers
> > containing a total of 8 bcache volumes since April 7th this year, applied
> > to 6.6.25 and later kernels. Some servers run 4k sector sizes, and others
> > run 512-byte sectors for the data volume.
> > For the cache volumes, all of their cache devices use 512-byte sectors.
> >
> > The backing storage on these servers ranges from 40-350 terabytes, and
> > the cache sizes are in the 1-2 TB range. We log kernel messages with
> > netconsole into a centralized log server and have not had any bcache
> > issues.
> >
> Thanks for the information. The issue I mentioned doesn't generate any
> kernel message. It just causes all I/Os to bypass the almost fully
> occupied cache, even if it is all clean data. Anyway, this is not directly
> caused by this patch; the patch just made it easier to reach that
> situation before I found and fixed it.

I am glad that you were able to fix it.

Did you already post the patch with that fix, or can you point me at a
commit hash? I am eager to try your fix.

--
Eric Wheeler

>
>
> And to all contributors (including Dongsheng, Mingzhe, Robert, Eric and others),
>
> At this moment I see it works fine on my server. I am about to submit it
> to Jens next week, if no other issue pops up.
>
> Thanks.
>
> Coly Li