Re: Gulliver

Peter Kjellström <cap@xxxxxxxxxx> · Mon, 30 Oct 2017 18:14:00 +0100

On Mon, 30 Oct 2017 17:07:31 +0000 (UTC)
Chris Olson <chris_e_olson@xxxxxxxxx> wrote:

> We have been fortunate to hang onto one of our summer interns
> for part time work on weekends during the current school year.
> One of the intern's jobs is to load documents and data which
> are then processed.  The documents are .txt, .docx, and .pdf
> files. The data files are raw sensor outputs usually captured
> using ADCs mostly with eight bit precision.  All files are
> loaded or moved from one machine to another with sftp.
> 
> The intern noticed right away that the documents will transfer
> perfectly from our PPC and SPARC machines to our Intel/CentOS
> platforms.  The raw data files, not so much.  There is always
> an Endian (Thanks Gulliver) issue, which we assume is due to
> the bytes of data being formatted into 32 bit words somewhere
> in the Big Endian systems.  It is not totally clear why the
> document files do not have this issue.  If there is a known
> principle behind these observations, we would appreciate very
> much any information that can be shared.

Transferring a file will not change anything. It will be bit-wise
identical.

However, multi-byte values in the file may be stored in little- or
big-endian byte order. A file format may or may not have metadata
indicating this. That is, some files will read differently on
different architectures and some will be immune (due to more
sophisticated abstractions).
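
As a rough illustration of the point above, here is a minimal Python
sketch. The four sample bytes are made up; nothing here is taken from
your actual sensor files. It only shows that the same unchanged bytes
give different numbers when a reader groups them into a 32 bit word
with one byte order or the other, while reading them as single 8 bit
samples is unaffected.

```python
import struct

# Four bytes exactly as they might sit in a file (hypothetical sample values).
raw = bytes([0x01, 0x02, 0x03, 0x04])

# The same bytes interpreted as one unsigned 32-bit word in each byte order.
as_big = struct.unpack(">I", raw)[0]     # big-endian reader (e.g. PPC, SPARC)
as_little = struct.unpack("<I", raw)[0]  # little-endian reader (e.g. x86)

# Interpreted as individual 8-bit samples, byte order plays no role.
as_bytes = list(raw)

print(hex(as_big), hex(as_little), as_bytes)
```

If the reading program states the byte order explicitly, as the ">"
and "<" prefixes do here, the result no longer depends on the machine
it runs on.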

So it's not surprising that your raw files will have problems.

If you want to prove this to yourself simply md5sum/sha1sum/etc the
files on both sides.
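
If those tools are not handy on one of the machines, something along
these lines should give the same kind of check. This is only a sketch:
the function name digest_of is my own, and the path in the usage
comment is a placeholder.

```python
import hashlib

def digest_of(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file, read in binary mode in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # "rb": read the bytes exactly as stored
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (placeholder path): run on both machines and compare the output.
# print(digest_of("/path/to/raw_capture.bin"))
```

Matching digests on both sides mean the transfer left the bytes
untouched, so any difference comes from how the file is read.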

/Peter K
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos