It may be a problem with using a smaller IO size. If we increase the
size of the IO, I think it gets close to what is expected.

# cat ddp_file.fio
[dedupe]
filename=test.tmp.comp
bs=32k
rw=write
size=1g
dedupe_percentage=80
write_iolog=test.tmp.log.comp

# fio/t/fio-dedupe -b 32768 test.tmp.comp
Will check <test.tmp.comp>, size <1073741824>, using 8 threads
Threads(8): 32768 items processed
Extents=32768, Unique extents=6623
De-dupe ratio: 1:3.95
Fio setting: dedupe_percentage=80
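
A quick sanity check on those numbers (this is just arithmetic on the
job settings, not anything fio prints): with dedupe_percentage=P,
roughly (100 - P)% of the written blocks should end up unique, and the
large run lands close to that:

p=80
for blocks in 40 32768; do
    echo "blocks=$blocks: expected unique extents ~ $(( blocks * (100 - p) / 100 ))"
done
# -> 8 for the 40-block (10m/256k) run below and 6553 for the
#    32768-block (1g/32k) run above; observed: 15-16 and 6623.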

Srinivasa R Chamarthy


On Tue, Apr 28, 2015 at 3:03 PM, Srinivasa Chamarthy
<chamarthy.raju@xxxxxxxxx> wrote:
> There seems to be one more issue with dedupe for data that is not 100%
> dedupable. I tried with 50% and 80%, and it gives only 35 for 50 and 60
> for 80.
>
> # cat ddp_file.fio
> [dedupe]
> filename=test.tmp.comp
> bs=256k
> rw=write
> size=10m
> dedupe_percentage=80
> write_iolog=test.tmp.log.comp
>
> # fio ddp_file.fio
> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
> fio-2.2.7-26-g9451b
> Starting 1 process
> dedupe: Laying out IO file(s) (1 file(s) / 10MB)
>
> dedupe: (groupid=0, jobs=1): err= 0: pid=13376: Tue Apr 28 02:54:02 2015
>   write: io=10240KB, bw=731429KB/s, iops=2857, runt= 14msec
>     clat (usec): min=170, max=374, avg=235.80, stdev=41.11
>      lat (usec): min=173, max=378, avg=239.10, stdev=41.75
>     clat percentiles (usec):
>      |  1.00th=[  171],  5.00th=[  175], 10.00th=[  197], 20.00th=[  213],
>      | 30.00th=[  217], 40.00th=[  221], 50.00th=[  231], 60.00th=[  235],
>      | 70.00th=[  239], 80.00th=[  253], 90.00th=[  262], 95.00th=[  318],
>      | 99.00th=[  374], 99.50th=[  374], 99.90th=[  374], 99.95th=[  374],
>      | 99.99th=[  374]
>     lat (usec) : 250=77.50%, 500=22.50%
>   cpu          : usr=57.14%, sys=28.57%, ctx=1, majf=0, minf=27
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=40/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   WRITE: io=10240KB, aggrb=731428KB/s, minb=731428KB/s, maxb=731428KB/s,
>   mint=14msec, maxt=14msec
>
> Disk stats (read/write):
>   sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> # fio/t/fio-dedupe -b 262144 test.tmp.comp
> Will check <test.tmp.comp>, size <10485760>, using 8 threads
> Threads(8): 40 items processed
> Extents=40, Unique extents=15
> De-dupe ratio: 1:1.67
> Fio setting: dedupe_percentage=63
>
> I also confirmed the same by taking the checksum of the data file in
> individual blocks of size bs.
>
> # for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | wc -l
> 40    <<< 40 blocks, as expected
>
> # for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | sort | uniq | wc -l
> 16    <<< returns 16 unique blocks
>
> In an 80% dedupable file, I would expect around 8 unique blocks. Is that true?
> Also, the fio/t/fio-dedupe output shows only 15 unique extents, while
> checking manually returns 16.
>
> Thanks,
> Srinivasa Chamarthy
> Srinivasa R Chamarthy
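
For reference: working backwards from the two fio-dedupe outputs in this
thread, the "De-dupe ratio" and "Fio setting" summary lines appear to
follow directly from the extent counts (this is inferred from the printed
numbers, not taken from the fio source):

awk -v extents=40 -v unique=15 'BEGIN {
    printf "De-dupe ratio: 1:%.2f\n", (extents - unique) / unique               # 1:1.67
    printf "Fio setting: dedupe_percentage=%d\n", 100 * (extents - unique) / extents + 0.5   # 63
}'

With only 40 extents, each block is worth 2.5 percentage points of the
reported figure, so a small run is inherently noisy; the 32768-extent run
above reports 80% as configured.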

> On Tue, Apr 28, 2015 at 1:15 PM, Srinivasa Chamarthy
> <chamarthy.raju@xxxxxxxxx> wrote:
>> Seems to be working now. Thanks for the great support.
>>
>> # for each in {0..7}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>>
>> Srinivasa R Chamarthy
>>
>> On Mon, Apr 27, 2015 at 10:39 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> On 04/27/2015 07:18 AM, Srinivasa Chamarthy wrote:
>>>> I was just verifying whether I could generate 100% dedupable data
>>>> with fio. I configured a small workload with a bs of 256k, writing a
>>>> 2MB file. I then took the checksum of each 256k block of the file,
>>>> and the checksums do not match. If I am not wrong, when I specify the
>>>> data as 100% dedupable, the checksums should all match, shouldn't they?
>>>>
>>>> # cat ddp_file.fio
>>>> [dedupe]
>>>> filename=test.tmp
>>>> bs=256k
>>>> rw=write
>>>> size=2m
>>>> dedupe_percentage=100
>>>> write_iolog=test.tmp.log
>>>>
>>>> # fio ddp_file.fio
>>>> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
>>>> fio-2.2.7-24-g7c30
>>>> Starting 1 process
>>>> dedupe: Laying out IO file(s) (1 file(s) / 2MB)
>>>>
>>>> dedupe: (groupid=0, jobs=1): err= 0: pid=31497: Mon Apr 27 09:13:35 2015
>>>>   write: io=2048.0KB, bw=2000.0MB/s, iops=8000, runt= 1msec
>>>>     clat (usec): min=123, max=183, avg=150.50, stdev=22.35
>>>>      lat (usec): min=125, max=184, avg=152.38, stdev=22.08
>>>>     clat percentiles (usec):
>>>>      |  1.00th=[  123],  5.00th=[  123], 10.00th=[  123], 20.00th=[  124],
>>>>      | 30.00th=[  139], 40.00th=[  145], 50.00th=[  145], 60.00th=[  155],
>>>>      | 70.00th=[  159], 80.00th=[  177], 90.00th=[  183], 95.00th=[  183],
>>>>      | 99.00th=[  183], 99.50th=[  183], 99.90th=[  183], 99.95th=[  183],
>>>>      | 99.99th=[  183]
>>>>     lat (usec) : 250=100.00%
>>>>   cpu          : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=28
>>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      issued    : total=r=0/w=8/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>>      latency   : target=0, window=0, percentile=100.00%, depth=1
>>>>
>>>> Run status group 0 (all jobs):
>>>>   WRITE: io=2048KB, aggrb=2000.0MB/s, minb=2000.0MB/s, maxb=2000.0MB/s,
>>>>   mint=1msec, maxt=1msec
>>>>
>>>> Disk stats (read/write):
>>>>   sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>>>
>>>> # ls -lh test.tmp
>>>> -rw-r--r-- 1 root root 2.0M Apr 27 09:13 test.tmp
>>>>
>>>> # cat test.tmp.log
>>>> fio version 2 iolog
>>>> test.tmp add
>>>> test.tmp open
>>>> test.tmp write 0 262144
>>>> test.tmp write 262144 262144
>>>> test.tmp write 524288 262144
>>>> test.tmp write 786432 262144
>>>> test.tmp write 1048576 262144
>>>> test.tmp write 1310720 262144
>>>> test.tmp write 1572864 262144
>>>> test.tmp write 1835008 262144
>>>> test.tmp close
>>>>
>>>> # for each in {0..7}; do dd if=test.tmp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
>>>> 71a1660503bcff7c4e20a763d569d069 -
>>>> 9c9bb7ec1020b4d4249028aecc896e6b -
>>>> 68b9685812d47c822532854201c9b352 -
>>>> e5c8ef471a27ba92b86893ee5ded654b -
>>>> 14e0e798a8af3f4e6abdaf022ddf91c3 -
>>>> 85528ae970bd25dde8c39ecaaffa4cf3 -
>>>> 60b8ccf0e0793094b9356544fb541f3a -
>>>> ef736cc9cbf7588cb7b84467cb37c44e -
>>>>
>>>> # fio -v
>>>> fio-2.2.7-24-g7c30
>>>
>>> Can you try with current -git? The corner cases of being 100% dedupable
>>> were broken.
>>>
>>> --
>>> Jens Axboe
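
For anyone repeating these checks, the per-block checksum loop used
throughout this thread can be wrapped up as a small helper (a sketch; the
function name is made up here, and the hexdump -C stage is dropped since
hashing the raw block groups the blocks the same way):

count_unique_blocks() {
    local file=$1 bs=$2
    local size blocks i
    size=$(stat -c %s "$file")
    blocks=$(( size / bs ))
    for (( i = 0; i < blocks; i++ )); do
        dd if="$file" bs="$bs" count=1 skip="$i" 2>/dev/null | md5sum
    done | sort -u | wc -l
}

# e.g. count_unique_blocks test.tmp.comp 262144
# gives the same counts as the manual loops above: 16 for the 80%/10m
# run, and 1 for the 100% case once it generates identical blocks.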