Correcting...

On Sun, Feb 17, 2019 at 2:12 PM Kai Krakow <kai@xxxxxxxxxxx> wrote:
>
> Hello!
>
> On Sun, Feb 17, 2019 at 6:41 AM Coly Li <colyli@xxxxxxx> wrote:
> >
> > On 2019/2/16 9:28 PM, Andreas wrote:
> > > Thank you, I understand the situation a little better now.
> > >
> > > Saving cache space makes sense for cache drives that, as you said, are
> > > small. But for users like me, who go the extra mile and install a
> > > generously large cache drive, the behaviour is punishing.
> > > After upgrading my kernel and swapping out the cache drive, I had
> > > trouble getting my new 128GB cache filled from three 8TB hard drives,
> > > which set me on a journey to figure out why, and I ended up writing my
> > > patch. I also know of people using SSDs as large as 512GB exclusively
> > > for bcache.
> > >
> > > The symptom that made me curious about an odd change in bcache
> > > behaviour was my MP3 music library, where my file browser reads the
> > > ID3-tag information from these files. No matter how often I scrolled
> > > through my library, most of the traffic kept going to the hard drive,
> > > and bcache wasn't adding any new data to the cache drive despite there
> > > being upwards of 100GB of unused cache space.
> > > As it turned out, my file explorer first issues a small read to each
> > > file to determine the size and position of the ID3-tag section. The
> > > readahead operation attached to this small read then fetches the
> > > actual ID3 tag, and the subsequent read for the tag data does not
> > > issue a separate operation to be considered by bcache. This is done
> > > for several files simultaneously - a workload an SSD can happily deal
> > > with but an HDD gets overwhelmed by.
> > > Bcache only cached that first small read for each file and ignored the
> > > actual ID3-tag data, as it was fetched by a readahead. This behaviour
> > > was consistent: even in subsequent iterations of the scenario, only
> > > that first small read was served from the cache, and the HDD then had
> > > to slowly seek to the actual ID3-tag data without bcache ever picking
> > > up on it, because it was still being fetched by a readahead.
> > > So while in theory it might sound fine to leave readaheads to the HDD,
> > > in practice it is noticeably faster to have everything coming from the
> > > SSD cache.
> > >
> >
> > Hi Andreas,
> >
> > Thanks for your patience and explanation. I now understand your use
> > case; it is reasonable to have such readahead data on the cache device.
> >
> > > I believe that one of the core problems with this behaviour is that
> > > bcache simply doesn't know whether data fetched in a readahead is
> > > actually being used or not. Caching readaheads leads to false
> > > positives (data cached that isn't being used) and bypassing readaheads
> > > leads to false negatives (data not cached that is being used) - in my
> > > eyes it should be up to the user to decide which way works better for
> > > them, if they want to.
> > >
> > > To me, bypassing readahead and background IO only seems like a good
> > > idea for relatively small caches (I'd say <= 16GB). But users with
> > > bigger caches are punished by this behaviour, as they could get better
> > > performance out of it (and did until late 2017).
> > >
> > > Besides this anecdotal evidence and reasoning, I cannot provide any
> > > hard numbers on the issue.
> > >
> >
> > Let me explain why a performance number is desired.
> > Normally most readahead pages are only accessed once, so it is
> > sufficient to keep them in memory just that once. It is only worth
> > keeping readahead data on the cache device when the data will be
> > accessed multiple times (hot). Otherwise bcache just introduces more
> > I/O on the SSD and does not help much with performance.
>
> Here's a suggested solution that could work and improve the hit rate,
> especially for devices which are too small for the applied workload:
>
> In the LRU algorithm, never insert new cache entries at the tip of the
> list but only at a random location in the LRU list. Only move an entry
> to the tip of the LRU list when it is accessed. This way, data accessed
> only once has a better chance of being flushed from the cache early, on
> average. It shouldn't impact big caches. So a good performance test
> would be to create a workload which exceeds the size of a small cache,
> and then run reproducer tests to gather some performance numbers (hit
> rate, throughput, latency, time to run, etc.).
>
> The obvious downside is that we may push some IO out of the cache too
> early when it happens to be accessed a second time just a few requests
> after this event. But the random factor should filter this problem out,
> so we may hit a false negative only once or twice before it stays in
> the cache.
>
> I was already thinking about creating such a patch, but I really do not
> understand the LRU functions very well... It seems they do not easily
> support an "insert_at" operation, only a "random discard" - but the latter
> is not the same as random discard. In fact, I think random discard is

...is not the same as random INSERT...

> quite useless, especially once random insert is in the code. Random
> discard doesn't know the history of the entry it discards, while random
> insert does.
>
> What do you think?
>
>
> Thanks,
> Kai
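
To make the random-insert idea above more concrete, here is a minimal
userspace sketch in C. It is not bcache code and not a proposed patch:
the struct names, the fixed-capacity list and the access trace are
invented purely for illustration, and an actual implementation would
have to hook into bcache's own LRU handling instead.

/*
 * Minimal userspace sketch of the "random insert" LRU idea above.
 * NOT bcache code: all names here are invented for the example.  New
 * entries land at a random list position instead of the MRU end; only
 * a real hit promotes an entry to the MRU end.
 */
#include <stdio.h>
#include <stdlib.h>

struct entry { int key; struct entry *prev, *next; };

struct lru {
	struct entry *mru, *lru;	/* MRU end / LRU (eviction) end */
	int count, capacity;
};

static void unlink_entry(struct lru *l, struct entry *e)
{
	if (e->prev) e->prev->next = e->next; else l->mru = e->next;
	if (e->next) e->next->prev = e->prev; else l->lru = e->prev;
	e->prev = e->next = NULL;
	l->count--;
}

static void insert_at_mru(struct lru *l, struct entry *e)
{
	e->prev = NULL;
	e->next = l->mru;
	if (l->mru) l->mru->prev = e; else l->lru = e;
	l->mru = e;
	l->count++;
}

/* The core of the idea: insert at a uniformly random list position. */
static void insert_at_random(struct lru *l, struct entry *e)
{
	int pos = l->count ? rand() % (l->count + 1) : 0;
	struct entry *after = l->mru;

	if (pos == 0) {			/* position 0 is the MRU end */
		insert_at_mru(l, e);
		return;
	}
	while (--pos > 0)		/* walk to the entry we insert after */
		after = after->next;
	e->prev = after;
	e->next = after->next;
	if (after->next) after->next->prev = e; else l->lru = e;
	after->next = e;
	l->count++;
}

static struct entry *lookup(struct lru *l, int key)
{
	for (struct entry *e = l->mru; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* A hit promotes; a miss evicts from the LRU end and inserts at random. */
static void access_key(struct lru *l, int key)
{
	struct entry *e = lookup(l, key);

	if (e) {
		unlink_entry(l, e);
		insert_at_mru(l, e);
		return;
	}
	if (l->count == l->capacity) {
		struct entry *victim = l->lru;
		unlink_entry(l, victim);
		free(victim);
	}
	e = malloc(sizeof(*e));
	e->key = key;
	insert_at_random(l, e);
}

int main(void)
{
	struct lru cache = { .capacity = 4 };
	int trace[] = { 1, 2, 3, 1, 4, 5, 1, 6 };	/* key 1 is "hot" */

	for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++)
		access_key(&cache, trace[i]);

	printf("cache, MRU to LRU:");
	for (struct entry *e = cache.mru; e; e = e->next)
		printf(" %d", e->key);
	printf("\n");
	return 0;
}

The intended effect is that an entry only reaches the MRU end after a
genuine second access, so data touched exactly once tends to drift
toward the eviction end sooner, on average, than data that is re-read.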