On Fri, Apr 14, 2023 at 10:59:13PM +0200, Ralf Mardorf wrote:

Hi Ralf,

> Hi,
>
> my google search was "does linux diff compare data using a cache".
>
> I'm trying to figure out what's going on. The first diff of 10 GiB of
> data copied from a SATA3 SSD to an USB 2 stick connected to an USB 3
> port took around a minute, right after the copy finished. A second diff
> needed 3 seconds. Both returned exit status 0.
>
> It's impossible to read 10 GiB of data in 3 seconds from an USB 2 stick.
> Does diff use cached data instead of comparing the "real" files line by
> line?
> Google returned "diff isn't doing any caching. The OS is. If you are
> using Linux, you can flush the disk buffers and cache".
>
> I expected that diff ensures to compare the "real" files line by line,
> but seemingly diff isn't aimed to check integrity of data.

Short answer: yes. diff can't choose. When it opens and reads a file,
it is the Linux kernel that returns the data, and the kernel serves it
from the page cache if it is available there.

> Does a command exist that compares "real" files, not just cached files
> by default?

I'm not aware of any such software. You will have to clear the cache
manually if you want this behavior. You can do that (as root) with:
"echo 3 > /proc/sys/vm/drop_caches".

What makes you think that the cache doesn't match the file on disk?
I'd argue that you can be confident that the read cache is true to the
file on disk and would be correctly invalidated should the file ever
change on disk.

> I experience weird things with Raptor Lake hardware, especially if USB
> is involved and I want to check the integrity of USB transferred, saved
> files by using a tool, without manually clearing cached data manually.

As an alternative, I'd suggest checking file integrity with a hash
tool, e.g. md5sum or similar. That is the more common approach than
diff, since diff could produce a lot of output if two 10 GB files
differ a lot.

> Regards,
> Ralf

Br,
Linus
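
P.S. A minimal sketch of the hash comparison suggested above, assuming
GNU coreutils' sha256sum is installed. The helper name same_hash and the
paths in the usage comment are hypothetical, made up for illustration.
Note that a hash tool reads through the page cache just like diff does,
so to force a re-read from the stick you would still drop the cache
first (as root, and after a sync, since drop_caches only frees clean
pages).

```shell
#!/bin/sh
# same_hash SRC DST (hypothetical helper, not an existing tool)
# Exit status: 0 if both files have the same SHA-256 digest,
#              1 if the digests differ,
#              2 if either file could not be read.
same_hash() {
    # Reading via stdin keeps the file name out of sha256sum's output,
    # so the two result strings can be compared directly.
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage sketch (paths are placeholders):
#   sync                                  # flush pending writes first
#   echo 3 > /proc/sys/vm/drop_caches     # as root: drop clean caches
#   same_hash /path/on/ssd/file /path/on/stick/file && echo "match"
```

Unlike diff, this prints nothing about where the files differ; it only
tells you whether they are identical, which is usually all you need for
an integrity check.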