On 30Mar2015 16:39, Ranjan Maitra <maitra.mbox.ignored@xxxxxxxxx> wrote:
Thanks, Cameron!
Any time.
If you are using a CIFS share, does that mean the far end is not a UNIX
filesystem? The -S (sparse) option is only useful if the backend can store
sparse files, otherwise the backend will just store lots of blocks of zeroes
(presuming you really have sparse files).
The filesystem is a "high-end" IFS filesystem according to our systems administrator. (I have no idea what an IFS filesystem is.)
It isn't, if The Google and Wikipedia are to be believed. IFS may mean
"Installable File System":
http://en.wikipedia.org/wiki/Installable_File_System
which is a Windows API for presenting filesystems to the OS, allowing various
drivers for backend filesystems to be plugged in. Like the VFS layer in a Linux
kernel I suppose.
At any rate, it says nothing of itself about what filesystem implements the
storage, but it may place constraints on what you can do with said filesystem
e.g. if the API does not support symlinks then you do not get symlinks and so
forth. Under UNIX there's no specific API for sparse files - you just avoid
writing the blocks full of NULs, instead seeking past them to the next non-NUL
data. The filesystem remembers the gap and fakes up empty (well, full of NULs)
blocks if you read from the gap instead of allocating (wasted) storage. The
point of this is that _if_ the backend supports sparse files they should work
if your file writing tool knows to behave that way (as rsync with -S does).
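To make the seek-past-the-gap idea concrete, here is a minimal sketch in Python. The function names, file name and gap size are mine, purely for illustration, and it assumes a UNIX-like system where os.stat reports st_blocks in 512-byte units. Whether the allocated figure actually comes out smaller than the apparent one depends entirely on the filesystem underneath.

```python
import os
import tempfile


def write_sparse(path, head=b"start", gap=1024 * 1024, tail=b"end"):
    """Write head, skip `gap` bytes without writing them, then write tail.

    Returns the apparent length of the resulting file in bytes.
    """
    with open(path, "wb") as f:
        f.write(head)
        f.seek(gap, os.SEEK_CUR)  # leave a hole instead of writing NULs
        f.write(tail)
    return len(head) + gap + len(tail)


def sizes(path):
    """Return (apparent bytes, allocated bytes) for a file."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        write_sparse(p)
        apparent, allocated = sizes(p)
        print("apparent bytes:", apparent)
        print("allocated bytes:", allocated)
```

Reading the file back returns NULs for the gap either way; only the allocated figure tells you whether the filesystem really kept the hole.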
So your sysadmin might not know. Or might not be telling you (weird, but some
people are like that). This page:
http://support.microsoft.com/en-us/kb/100108
suggests that M$ consider NTFS to be "high end"; at any rate that is the only
place on the above page with the term. So: no hard links I think, no sparse
files I think, support for symlinks in the filesystem itself but no Windows API
for making them (IIRC).
Is the source kmeans directory full of hard links (not symlinks)? If so, rsync
will not preserve hard links without the -H option (even with -a) and
regardless I do not know if CIFS supports making hard links or if your backend
supports hard links.
No, there are no hard links. Only a few files at the top level and about 4 directories with lots of files in them.
Ok.
Sparse files?
One way to check is to tally the byte sizes of the files, and compare it with
the result of "du", which counts allocated blocks.
Tallying the file byte lengths can be done like this:
find . -type f -ls | colsum 7
"colsum" is a script of my own which makes and runs an awk script to sum
particular columns:
https://bitbucket.org/cameron_simpson/css/src/tip/bin/colsum
Nothing very clever, but surprisingly useful to me.
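If you would rather not fetch colsum, the same comparison can be sketched in Python. This is not my script, just a rough equivalent; the function name is made up, and it assumes st_blocks is in 512-byte units as on Linux. Like the find command it counts every regular file path, so hard-linked files get counted once per name, whereas du counts them once.

```python
import os
import stat
import sys


def tally(root):
    """Return (apparent bytes, allocated bytes) summed over regular files."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and so on
            apparent += st.st_size
            # Assumption: st_blocks counts 512-byte units.
            allocated += st.st_blocks * 512
    return apparent, allocated


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    apparent, allocated = tally(root)
    print("apparent bytes: ", apparent)
    print("allocated bytes:", allocated)
```

If the apparent total is much larger than the allocated total, you very likely have sparse files in there.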
If the far end is Windows, certainly sparse files will no longer be sparse at
the far end. This kind of thing is one reason we get so picky when getting
employers to order NASes; we asked for a QNAP (cheap, useful, Linux backend)
and they wanted to order Microsoft's storage thingummy, which would have broken
all sorts of stuff just like what you're encountering.
I suspect that that is what has happened here: the stuff is too high-end to be of any use.
Ah. Like "enterprise":-)
Does anything above assist?
Yes, it does, at least providing a possible explanation.
I don't think it would make any difference if we tar'ed the file over to the
cifs share and then untarred, would it?
If all the access is done via CIFS from your client system, I would not expect
this to be any better.
Besides, I don't think tar does sparse files. (Or does it? GNU tar might, I
haven't looked.)
If GNU tar will store sparse files efficiently, and you have sparse files, then
you could profitably store the tar file on the CIFS share. I would expect
things to go sour as soon as you untarred it, which makes accessing the
contents tedious.
Do you have sparse files?
As an anecdote, the other day I was working on a set of web pages with mugshots
and full images, etc. Wanting to show it to a remote colleague I thought "I'll
use a zip file, he'll understand it". 133MB of files, 4GB of zip file. Ugh.
Why? I was still waiting on the real image data so we'd taken a few photos
(~20) and allocated them to the 500 odd images needed, using hard links. Zip
does not understand hard links.
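For the curious, the blowup can be estimated before archiving. Here is a small sketch (the function name is hypothetical) which sums file sizes once per path, which is roughly what a tool ignorant of hard links will store, and once per unique (device, inode) pair, which is what is really on disc. It assumes a filesystem that reports meaningful inode numbers.

```python
import os
import stat
import sys


def link_tally(root):
    """Return (bytes counted per path, bytes counted per unique inode)."""
    per_path = 0
    by_inode = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files can be hard linked here
            per_path += st.st_size
            # Hard links to the same file share device and inode numbers.
            by_inode[(st.st_dev, st.st_ino)] = st.st_size
    return per_path, sum(by_inode.values())


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    per_path, unique = link_tally(root)
    print("bytes counted per path: ", per_path)
    print("bytes counted per inode:", unique)
```

A large ratio between the two numbers means lots of hard links, and an archive format which does not record them will grow accordingly.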
Cheers,
Cameron Simpson <cs@xxxxxxxxxx>
(Bashir tells the story of the boy who cried "Wolf")
Bashir: If you lie all the time, no one is going to believe you, even
when you're telling the truth.
Garak: Are you sure that's the point, Doctor?
Bashir: Of course. What else would it be?
Garak: That you should never tell the same lie twice.