Karen Hill wrote:
> Tom Lane wrote:
> > "Karen Hill" <karen_hill22@xxxxxxxxx> writes:
> > > Ralph Kimball states that this is a way to check for changes. You just
> > > have an extra column for the crc checksum. When you go to update data,
> > > generate a crc checksum and compare it to the one in the crc column.
> > > If they are same, your data has not changed.
> >
> > You sure that's actually what he said? A change in CRC proves the data
> > changed, but lack of a change does not prove it didn't.
>
> On page 100 in the book, "The Data Warehouse Toolkit" Second Edition,
> Ralph Kimball writes the following:
>
> "Rather than checking each field to see if something has changed, we
> instead compute a checksum for the entire row all at once. A cyclic
> redundancy checksum (CRC) algorithm helps us quickly recognize that a
> wide messy row has changed without looking at each of its constituent
> fields."
>
> On page 360 he writes:
>
> "To quickly determine if rows have changed, we rely on a cyclic
> redundancy checksum (CRC) algorithm. If the CRC is identical for the
> extracted record and the most recent row in the master table, then we
> ignore the extracted record. We don't need to check every column to be
> certain that the two rows match exactly."
>
> > People do sometimes use this logic in connection with much wider
> > "summary" functions, such as an MD5 hash. I wouldn't trust it at all
> > with a 32-bit CRC, and not much with a 64-bit CRC. Too much risk of
> > collision.

How do you calculate a checksum on a binary file stored in a database?
What part of the file are you going to use for computations?
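For what it's worth, here is a rough Python sketch of the two points in this thread. It is not from Kimball's book or from anyone's production code. The helper names (row_bytes, row_crc32, row_changed, file_md5) and the separator byte are made up for illustration; only zlib.crc32 and hashlib.md5 are real standard-library calls. It assumes a row arrives as a tuple of column values in a fixed order, and that a binary value can be read as a sequence of byte chunks. A differing checksum proves the row changed; a matching one is only a hint, so the sketch falls back to comparing the columns. For a binary file, one simple option is to feed every byte to the hash rather than picking a part of the file.

```python
import hashlib
import zlib

# Hypothetical separator between serialized columns (ASCII unit separator).
SEP = b"\x1f"


def row_bytes(row):
    """Serialize a tuple of column values to bytes, in a fixed column order."""
    return SEP.join(repr(value).encode("utf-8") for value in row)


def row_crc32(row):
    """32-bit CRC over the whole serialized row."""
    return zlib.crc32(row_bytes(row)) & 0xFFFFFFFF


def row_changed(stored_crc, stored_row, new_row):
    """Return True if new_row differs from stored_row.

    A different CRC proves the row changed. An equal CRC does not prove
    the rows are equal, so in that case the columns are compared directly.
    """
    if row_crc32(new_row) != stored_crc:
        return True
    return stored_row != new_row


def file_md5(chunks):
    """MD5 over an entire binary value, fed in as an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # every byte of the file goes into the hash
    return digest.hexdigest()


if __name__ == "__main__":
    old = ("Karen", "Hill", 42)
    new = ("Karen", "Hill", 43)
    print(row_changed(row_crc32(old), old, new))
    print(file_md5([b"first chunk", b"second chunk"]))
```

The column comparison on a CRC match costs extra work, which is the trade-off being argued about above: skipping it is faster but accepts a small chance of missing a real change.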