Re: broken raid level 5 array caused by user error

Hi Mathias,

On 01/19/2016 09:35 AM, Mathias Mueller wrote:
> Hi Phil,
> 
> I forgot to add some information: when I created the bytestrings from
> my jpg file, I did not start at 0k but at 100k into the jpg file (to
> skip the jpg header).

Ok. But I'm still not confident about the chunk boundaries.

>> Very interesting.  You could go one step further and compare the jpeg
>> file contents in the first 1M against the locations found to determine
>> where the chunks actually start and end on each device.  The final
>> offset will be a chunk multiple before these boundaries.  Or do md5 sums
>> of 4k blocks to reduce the amount to inspect.
> 
> How exactly can I do this? Should I create more bytestrings and do more
> bgrep searches with them on my physical devices? I already have results
> from searching for bytestrings taken at 64k intervals (from 100k to
> 612k of my jpeg file, so 9 bytestrings in total). Should I provide a
> table of the results?

Sigh.  I couldn't help myself.  New utility attached.  Curse you Mathias
for an interesting problem! ;-)

Call it with your jpeg and the devices to search, like so:

findHash.py /path/to/picture.jpeg /dev/sd[bcde]

It'll make a map of hashes of each 4k block in the jpeg and then search
the listed devices for those hashes, building a map of the file
fragments.  This will clearly show chunk boundaries.
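
Once the fragment map is there, the chunk size should fall out of the
gaps between the reported boundaries, and the data offset is then some
whole number of chunks before the earliest boundary on each device.  If
you want a quick sanity check, something like the sketch below would do;
'boundaries' is just a placeholder you'd fill in by hand from the output:

#!/usr/bin/python2
# Sketch only: guess the chunk size from the device offsets where the
# reported fragments start and end.  'boundaries' is a placeholder to
# fill in by hand from the findHash.py output.
boundaries = []   # sorted device offsets of fragment edges

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

gaps = [j - i for i, j in zip(boundaries, boundaries[1:])]
if gaps:
    print "chunk size is likely %dk (or a divisor of it)" % (reduce(gcd, gaps) / 1024)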

Please show the output.

Phil

#! /usr/bin/python2
#
# Locate 4k fragments of a subject file in one or more other files or
# devices.  Only reports two or more consecutive matches.
#
# Usage:
#   findHash.py /path/to/subject/file /dev/sdx|/path/to/image/file [/dev/sdy ...]

import hashlib, sys, datetime

# Read the known file 4k at a time, building a dictionary of
# md5 hashes vs. offset.  Use a large buffer for speed.
# Drops any partial block at the end of the file.
d = {}
pos = long(0)
f = open(sys.argv[1], 'rb', 1<<20)
b = f.read(4096)
while len(b)==4096:
	md5 = hashlib.md5()
	md5.update(b)
	h = md5.digest()
	hlist = d.get(h)
	if not hlist:
		hlist = []
		d[h] = hlist
#		print "New hash %s at %8.8x" % (h.encode('hex'), pos)
	hlist.append(pos)
	pos += 4096
	b = f.read(4096)
f.close()

print "%d Unique hashes in %s" % (len(d), sys.argv[1])

def checkAndPrint(match):
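	# Only report runs of two or more consecutive 4k blocks; 'fname' is
	# the global set in the search loop below.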
	if match[2]>4096:
		print "%20s @ %12.12x:%12.12x ~= %8.8x:%8.8x" % (fname, match[1], match[1]+match[2]-1, match[0], match[0]+match[2]-1)

# Read the candidate files/devices, looking for possible matches.  Match
# entries are vectors of known file offset, candidate file offset, and
# length.
for fname in sys.argv[2:]:
	print "\nSearching for pieces of %s in %s:..." % (sys.argv[1], fname)
	pos = long(0)
	f = open(fname, 'rb', 1<<24)
	matches = []
	b = f.read(4096)
	lastts = None
	while len(b)==4096:
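		# Progress/throughput report every 128MiB (0x8000000 bytes) read.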
		if not (pos & 0x7ffffff):
			ts = datetime.datetime.now()
			if lastts:
				print "@ %12.12x %.1fMB/s   \r" % (pos, 128.0/((ts-lastts).total_seconds())),
			else:
				print "@ %12.12x...\r" % pos,
			sys.stdout.flush()
			lastts = ts
		md5 = hashlib.md5()
		md5.update(b)
		h = md5.digest()
		if h in d:
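			# This 4k block occurs somewhere in the subject file.  Try to
			# extend each in-progress match: it continues only if the subject
			# offset right after it is one of the offsets recorded for this hash.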
			i = 0
			while i<len(matches):
				match = matches[i]
				target = match[0]+match[2]
				continuations = [x for x in d[h] if x==target]
				if continuations:
					match[2] += 4096
					i += 1
				else:
					del matches[i]
					checkAndPrint(match)
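			# No in-progress match survived (or none existed): start a fresh
			# candidate match at every subject offset that has this hash.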
			if not matches:
				matches = [[x, pos, 4096] for x in d[h]]
		else:
			for match in matches:
				checkAndPrint(match)
			matches = []
		pos += 4096
		b = f.read(4096)
	print "End of %s at %12.12x" % (fname, pos)
	# show matches that continue to the end of the candidate file/device.
	for match in matches:
		checkAndPrint(match)
