Re: Is a tool available to check the integrity of copied files?




On Fri, Apr 14, 2023 at 10:59:13PM +0200, Ralf Mardorf wrote:

Hi Ralf,

> Hi,
> 
> my google search was "does linux diff compare data using a cache".
> 
> I'm trying to figure out what's going on. The first diff of 10 GiB of
> data copied from a SATA3 SSD to a USB 2 stick connected to a USB 3
> port took around a minute, right after the copy finished. A second diff
> needed 3 seconds. Both returned exit status 0.
> 
> It's impossible to read 10 GiB of data in 3 seconds from a USB 2 stick.
> Does diff use cached data instead of comparing the "real" files line by
> line?
> Google returned "diff isn't doing any caching. The OS is. If you are
> using Linux, you can flush the disk buffers and cache".
> 
> I expected diff to compare the "real" files line by line, but it
> seems diff isn't meant for checking the integrity of data.

Short answer: yes. diff can't choose. When it reads a file, it's the
Linux kernel that returns the data, and the kernel serves it from the
page cache if it is available there.

> Does a command exist that compares "real" files, not just cached files
> by default?

I'm not aware of any such software. You will have to clear the cache
manually if you want this behavior. As root, run "sync" first so that
pending writes reach the device, then drop the cache with:
"echo 3 > /proc/sys/vm/drop_caches".

What makes you think that the cache isn't matching the file on disk?

I'd argue that you can be confident that the read cache is true to the
file on disk and would correctly be invalidated should the file ever
change on disk.

> I experience weird things with Raptor Lake hardware, especially if USB
> is involved, and I want to check the integrity of files transferred
> and saved over USB with a tool, without clearing cached data manually.

As an alternative approach I'd suggest checking file integrity with a
hash tool, e.g. md5sum or similar. It's the more common approach,
rather than diff, as diff could produce a lot of output if there are
many differences between two 10 GB files.
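
A minimal sketch of the checksum approach, assuming GNU coreutils is
installed. It uses sha256sum as the "or similar" tool; md5sum can be
substituted in the same way. The function name same_checksum is only an
illustrative helper, not an existing command. Note that a hash tool
reads through the same kernel cache as diff, so to be sure the data
really comes from the stick you would still drop the cache (or unplug
and re-mount the stick) before hashing the copy.

```shell
#!/bin/sh
# Hypothetical helper: compare two files by checksum instead of diffing them.
# Returns 0 if the checksums match, 1 if they differ, 2 if a file is unreadable.
same_checksum() {
    # Both files must exist and be readable.
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Feed the files on stdin so the output is "<hash>  -" for both,
    # which makes the two results directly comparable as strings.
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

Usage would look like: same_checksum /path/on/ssd/file /path/on/stick/file
&& echo identical. The two paths here are placeholders. Unlike diff, this
prints nothing about where the files differ; it only reports whether they
do.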

> Regards,
> Ralf

Br,
Linus


