RE: random writes with different patterns

"Foley, Robert" <robert.foley@xxxxxxx> · Wed, 13 Apr 2016 20:08:22 +0000

> On Wednesday, April 13, 2016 1:39 AM, Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx] wrote:
> Some sort of "generation" id in the header would allow better verification
> when using something like loops (assuming it was being incremented on a per
> iteration basis). Perhaps what is needed is some sort of
> verify_header_extra_pattern that allows a few extra bytes to be set in the
> header and verified later...

This is a good point. The verify_header_extra_pattern parameter alone would solve one of our use cases, where we want to write a sequence of random blocks with a pattern and then overwrite the same sequence with patterns that differ only in that extra pattern. By using randseed, we can write random data to the same set of blocks on each run, and between runs it would be enough for only this extra pattern in the header to vary. The name you suggested also seems just right. We can start putting this together soon and will contribute it when it is ready.
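
To make the idea concrete, here is a rough sketch in Python of how an extra pattern in the header could tell two otherwise identical runs apart. The header layout, the magic value, and the function names are all made up for illustration. This is not fio's actual verify header, only an assumption about how such a field might behave.

```python
import struct
import zlib

# Hypothetical header layout (NOT fio's real verify header):
# magic (u16), extra pattern (u32), block offset (u64), CRC32 of payload (u32)
HEADER = struct.Struct("<HIQI")
MAGIC = 0x1234  # placeholder value chosen for this sketch


def make_block(offset, payload, extra_pattern):
    """Prefix the payload with a header that carries the extra pattern."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(MAGIC, extra_pattern, offset, crc) + payload


def verify_block(block, offset, extra_pattern):
    """Return True only if the header matches the expected offset and extra
    pattern and the payload checksum is intact."""
    magic, extra, hdr_offset, crc = HEADER.unpack_from(block)
    payload = block[HEADER.size:]
    return (
        magic == MAGIC
        and extra == extra_pattern
        and hdr_offset == offset
        and crc == (zlib.crc32(payload) & 0xFFFFFFFF)
    )
```

With this layout, a block written in run 1 (extra pattern 1) fails verification when we expect run 2 (extra pattern 2), even though the payload bytes are the same in both runs.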

> However this doesn't seem to solve your entire problem as I understood
> it: given the same randseed you want the ability for the same blocks to
> be written in the same order but with different pseudorandom data
> contents? Further this data must be verifiable in a separate job?
> Would changing the buffer_compress_percentage option do?

You bring up a good point: we do have a use case where we want the data in each block to vary between runs that write the same sequence of blocks. In this use case we write a set of random blocks with random data, and later we want to be able to verify that the data is unchanged. But we also want to overwrite that same sequence of blocks with a different data pattern. We looked at the buffer_compress_percentage option and believe it does not help us, since we want to write blocks that are completely different from the blocks of the prior run. We may be testing a case where we do not want the data to be compressible or dedupable, so it would be better for the entire block to vary.

It seems that when we use the randseed option, this single seed effectively seeds everything, including the offset generation (I/O pattern) and the data pattern generation. A new parameter (rand_verify_seed) that lets us provide the random seed for the data pattern alone would be quite useful in general and would solve this case. We could specify the same randseed so that the I/O pattern is the same, but use a different verify seed so that the data pattern is different.
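
As a rough illustration of the behavior we have in mind, here is a small Python sketch that keeps two independent generators, one for offsets and one for data. The function name and its arguments are hypothetical, and it says nothing about how fio seeds its generators internally. Unlike fio's default random map, it may also pick the same offset more than once.

```python
import random


def generate_writes(randseed, rand_verify_seed, nr_blocks, block_size, count):
    """Return a list of (offset, data) pairs.

    Offsets depend only on randseed; data depends only on rand_verify_seed.
    """
    offset_rng = random.Random(randseed)        # drives the I/O pattern
    data_rng = random.Random(rand_verify_seed)  # drives the block contents
    writes = []
    for _ in range(count):
        offset = offset_rng.randrange(nr_blocks) * block_size
        data = data_rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
        writes.append((offset, data))
    return writes
```

Calling this twice with the same randseed but different rand_verify_seed values gives the same sequence of offsets with different block contents, and repeating a call with the same pair of seeds reproduces both, which is what a later verify job would rely on.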

The rand_verify_seed parameter seems like a good option here, but as always we would like to hear other thoughts and ideas. We would be willing to contribute this if it seems useful.

Thanks!
-Rob