Large/unreliable file uploading over HTTP

Let's face it - HTTP is not very good for file uploads. Its stateless
nature, slow connections, the inability to resume a transfer
(technically), etc, etc.

What I've been thinking about is a way to skip all the normal
annoyances of file uploading - multipart form encodings, upload tools
with specific needs, PUT vs POST, connection resets... the list goes
on and on.

Usenet and BitTorrent and other protocols have the right idea - split
the workload into smaller sets of data that are easier to operate on.
Usenet has NZB files; BitTorrent splits everything into chunks. Not
only would chunking make working with the data more portable (no need
to raise your PHP memory limit or POST limits to insane amounts to
support large files), it could also allow multiple segments of the
file to be transferred at once...
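To make the chunking idea concrete, the client-side split could look
something like this rough sketch in PHP (file name and chunk size are
just example values, and I'm using sha1 here but md5 would do too):

<?php
// Rough sketch: split a file into fixed-size chunks and checksum each
// one, building a list of segment checksums to ship alongside the data.
$file      = 'bigfile.iso';      // hypothetical input file
$chunkSize = 128 * 1024;         // example: 128k chunks
$fileSum   = sha1_file($file);   // whole-file checksum
$segments  = array();

$fh = fopen($file, 'rb');
$id = 0;
while (!feof($fh)) {
    $data = fread($fh, $chunkSize);
    if ($data === false || $data === '') {
        break;
    }
    $segments[] = array(
        'id'       => $id++,
        'bytes'    => strlen($data),
        'checksum' => sha1($data),
    );
}
fclose($fh);
?>

Nothing there ever holds more than one chunk in memory at a time,
which is the whole point - memory_limit stays at its default.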

Here's somewhat of a process braindump of what I'm thinking. It still
requires a 'smart' applet (Flash, Java, anything that can split a file
up and send data over HTTP/HTTPS) - but no special socket needs, no PUT
support needed, not even multipart POST encoding (as far as I know) -
just send the data in chunks over the wire and have a PHP script on
the other side collect the data and reassemble it.
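As a sketch of what "send the data in chunks over the wire" might look
like from a PHP test client (the URL is invented for illustration, and
the field names are the ones from the braindump below):

<?php
// Sketch: POST one chunk to the collector script as ordinary POST
// fields - no multipart encoding, no PUT. URL and field names are
// hypothetical placeholders.
function send_segment($url, $transactionId, $segmentId, $data)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'action'      => 'process',
        'transaction' => $transactionId,
        'segment'     => $segmentId,
        'data'        => base64_encode($data),
    )));
    $response = curl_exec($ch);
    curl_close($ch);
    return $response; // server's status reply for this segment
}
?>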

Does this sound insane? I think it's a pretty good approach - no PUT
needed, no large-POST configuration required, and anything could
upload to it as long as it sends the information properly. I'm
thinking HTTP POST for the header info, with the chunk data sent as
another POST field, maybe base64 encoded or something else that will
stay safe during transit...
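On the receiving side, "safe during transit" boils down to decoding
and verifying. A sketch of that check (variable names invented;
$segmentList stands in for wherever the server kept the checksums from
the initial request - session, database, flat file...):

<?php
// Sketch: decode a chunk from $_POST and verify it against the
// checksum the client sent up front. $segmentList is a hypothetical
// store of expected checksums, keyed by segment id.
$data     = base64_decode($_POST['data'], true); // strict decoding
$expected = $segmentList[$_POST['segment']]['checksum'];

if ($data !== false && sha1($data) === $expected) {
    // cast the id so it can't smuggle path characters into the name
    file_put_contents('segments/' . (int)$_POST['segment'] . '.part', $data);
    $status = 'okay';
} else {
    $status = 'fail'; // tell the client to retry this segment
}
?>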



- take the input file and checksum it (md5 / sha1)
- calculate the number of chunks based on a configured $chunk size
(for example, 128k chunks)
- split the file into chunks of $chunk size and create checksums for
all of them (could use SFV?)
- send a request to the server with the info - use JSON?
	action=begin
	filename=$filename
	filesize=$filesize
	checksum=$checksum
	segments=list of segments (unique segment id, checksum, bytes)
	- server sends back a "server ready" and a unique $transaction_id
- start sending chunks to the server, each request carrying the
transaction key and a unique chunk identifier
	action=process
	transaction=$transaction_id
	segment=$unique_segment_id
	data=base64_encode($segment_data)
	- when done, the server sends back $transaction_id, $segment_id,
$status (okay, fail)
- client compares the checksum for that identifier; if it matches,
move on to the next chunk
	- if it does not match, retry uploading that chunk
- when all chunks are done, send a request with the transaction key:
	action=finish
	transaction=$transaction_id
	- when the server receives this, it assembles the file from the
segments, does a final checksum, and reports the checksum back to the
client as $transaction_id, $checksum (warning: on a large file this
could take a while) - a rough sketch of this step follows below
- client does one last check against the file's original checksum; if
it matches, report success, otherwise report failure (would need to
determine why, though - if all segments match this should not be able
to happen...)
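Here's the promised sketch of the "finish" step on the server - just
gluing the .part files back together in order and checksumming the
result. $segmentIds, $filename and $transactionId are placeholders for
whatever the earlier steps stored, and in real code $filename would
need sanitizing before it goes anywhere near the filesystem:

<?php
// Sketch of action=finish: reassemble the segments in order, checksum
// the result, and report back. Assumes the begin/process steps stored
// the ordered segment ids and the target file name somewhere.
$out = fopen('uploads/' . $filename, 'wb');
foreach ($segmentIds as $id) {
    $chunk = file_get_contents('segments/' . $id . '.part');
    fwrite($out, $chunk);
}
fclose($out);

$finalSum = sha1_file('uploads/' . $filename); // slow on huge files
echo json_encode(array(
    'transaction' => $transactionId,
    'checksum'    => $finalSum,
));
?>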

I'd appreciate everyone's thoughts. This would also give you file
upload progress, more or less, since the server and client are
constantly communicating as chunks complete and start (but again, that
has to be done with an applet)
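The progress calculation itself is trivial bookkeeping - something
like this, with $completedSegments and $segments being whatever the
client tracks:

<?php
// Sketch: progress is just acknowledged segments vs. total segments.
$percent = count($completedSegments) / count($segments) * 100;
printf("Upload %.1f%% complete\n", $percent);
?>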

I can't think of any method to do it in-browser, but doing it this way
could open the gates for things like Google Gears to possibly work
too...


