Re: FIO - dedup checksums or specified blocksize does not match.

There seems to be one more issue with dedup for data that is not 100%
dedupable. I tried dedupe_percentage=50 and 80, and the resulting data
is only about 35% dedupable for 50 and about 60% for 80.

# cat ddp_file.fio
[dedupe]
filename=test.tmp.comp
bs=256k
rw=write
size=10m
dedupe_percentage=80
write_iolog=test.tmp.log.comp

# fio ddp_file.fio
dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
fio-2.2.7-26-g9451b
Starting 1 process
dedupe: Laying out IO file(s) (1 file(s) / 10MB)

dedupe: (groupid=0, jobs=1): err= 0: pid=13376: Tue Apr 28 02:54:02 2015
  write: io=10240KB, bw=731429KB/s, iops=2857, runt=    14msec
    clat (usec): min=170, max=374, avg=235.80, stdev=41.11
     lat (usec): min=173, max=378, avg=239.10, stdev=41.75
    clat percentiles (usec):
     |  1.00th=[  171],  5.00th=[  175], 10.00th=[  197], 20.00th=[  213],
     | 30.00th=[  217], 40.00th=[  221], 50.00th=[  231], 60.00th=[  235],
     | 70.00th=[  239], 80.00th=[  253], 90.00th=[  262], 95.00th=[  318],
     | 99.00th=[  374], 99.50th=[  374], 99.90th=[  374], 99.95th=[  374],
     | 99.99th=[  374]
    lat (usec) : 250=77.50%, 500=22.50%
  cpu          : usr=57.14%, sys=28.57%, ctx=1, majf=0, minf=27
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=40/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=10240KB, aggrb=731428KB/s, minb=731428KB/s, maxb=731428KB/s, mint=14msec, maxt=14msec

Disk stats (read/write):
  sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

# fio/t/fio-dedupe -b 262144 test.tmp.comp
Will check <test.tmp.comp>, size <10485760>, using 8 threads
Threads(8): 40 items processed
Extents=40, Unique extents=15
De-dupe ratio: 1:1.67
Fio setting: dedupe_percentage=63
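
For reference, the last two lines of this output are consistent with the
arithmetic sketched below. This is a minimal sketch inferred from the
printed numbers only, not copied from the fio source; the function name
dedupe_stats is hypothetical.

```python
def dedupe_stats(extents: int, unique_extents: int) -> tuple[float, int]:
    """Return (ratio, percentage) for a given extent count.

    Assumption: ratio is duplicates per unique extent, and percentage is
    the share of extents that are duplicates, rounded half up.
    """
    if extents <= 0 or unique_extents <= 0:
        raise ValueError("extent counts must be positive")
    ratio = extents / unique_extents - 1.0
    percentage = int((1.0 - unique_extents / extents) * 100.0 + 0.5)
    return ratio, percentage


ratio, pct = dedupe_stats(40, 15)
print(f"De-dupe ratio: 1:{ratio:.2f}")
print(f"Fio setting: dedupe_percentage={pct}")
```

With 40 extents and 15 unique extents this gives 1:1.67 and 63, matching
the output above. With 16 unique extents the same formula gives 60.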

I also confirmed this by checksumming the data file in individual
blocks of size bs.

# for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | wc -l
40  <<< have 40 blocks as expected.

# for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | sort | uniq | wc -l
16 <<< returns 16 unique blocks

With 80% dedupable data, I would expect around 8 unique blocks out of
40. Is that right? Also, fio/t/fio-dedupe reports only 15 unique
extents, while checking manually returns 16.
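
The manual check can also be done without the dd/hexdump pipeline. Below
is a minimal sketch that hashes the raw bytes of each fixed-size block.
The function names are hypothetical, and expected_unique assumes that
dedupe_percentage means the share of blocks that duplicate an earlier
block, which is my reading of the option rather than something confirmed
in this thread.

```python
import hashlib


def count_blocks(path: str, block_size: int = 262144) -> tuple[int, int]:
    """Return (total_blocks, unique_blocks) for fixed-size blocks of a file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # Hash the raw block contents; identical blocks give identical digests.
            digests.append(hashlib.md5(block).digest())
    return len(digests), len(set(digests))


def expected_unique(total_blocks: int, dedupe_percentage: int) -> float:
    """Unique blocks expected if dedupe_percentage of all blocks are duplicates."""
    return total_blocks * (100 - dedupe_percentage) / 100
```

Calling count_blocks("test.tmp.comp") returns the total and unique block
counts for the 256k block size used here, and expected_unique(40, 80)
gives the 8 unique blocks I was expecting.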

Thanks,
Srinivasa Chamarthy
Srinivasa R Chamarthy


On Tue, Apr 28, 2015 at 1:15 PM, Srinivasa Chamarthy
<chamarthy.raju@xxxxxxxxx> wrote:
> Seems working now. Thanks for the great support.
>
> for each in {0..7}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> e1d3c034e3fc15481e5c8610333ad9cd  -
> Srinivasa R Chamarthy
>
>
> On Mon, Apr 27, 2015 at 10:39 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 04/27/2015 07:18 AM, Srinivasa Chamarthy wrote:
>>>
>>> I was just verifying whether I could generate 100% dedupable data
>>> with FIO. I configured a small workload with a bs of 256k, writing a
>>> 2MB file. I took the checksum of each 256k block of data in the file,
>>> and the checksums do not match. If I am not wrong, when I specify the
>>> data as 100% dedupable, my checksums should match, shouldn't they?
>>>
>>> # cat ddp_file.fio
>>> [dedupe]
>>> filename=test.tmp
>>> bs=256k
>>> rw=write
>>> size=2m
>>> dedupe_percentage=100
>>> write_iolog=test.tmp.log
>>>
>>> # fio ddp_file.fio
>>> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
>>> fio-2.2.7-24-g7c30
>>> Starting 1 process
>>> dedupe: Laying out IO file(s) (1 file(s) / 2MB)
>>>
>>> dedupe: (groupid=0, jobs=1): err= 0: pid=31497: Mon Apr 27 09:13:35 2015
>>>    write: io=2048.0KB, bw=2000.0MB/s, iops=8000, runt=     1msec
>>>      clat (usec): min=123, max=183, avg=150.50, stdev=22.35
>>>       lat (usec): min=125, max=184, avg=152.38, stdev=22.08
>>>      clat percentiles (usec):
>>>       |  1.00th=[  123],  5.00th=[  123], 10.00th=[  123], 20.00th=[  124],
>>>       | 30.00th=[  139], 40.00th=[  145], 50.00th=[  145], 60.00th=[  155],
>>>       | 70.00th=[  159], 80.00th=[  177], 90.00th=[  183], 95.00th=[  183],
>>>       | 99.00th=[  183], 99.50th=[  183], 99.90th=[  183], 99.95th=[  183],
>>>       | 99.99th=[  183]
>>>      lat (usec) : 250=100.00%
>>>    cpu          : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=28
>>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>       issued    : total=r=0/w=8/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>>>
>>> Run status group 0 (all jobs):
>>>    WRITE: io=2048KB, aggrb=2000.0MB/s, minb=2000.0MB/s, maxb=2000.0MB/s, mint=1msec, maxt=1msec
>>>
>>> Disk stats (read/write):
>>>    sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>>
>>> # ls -lh test.tmp
>>> -rw-r--r-- 1 root root 2.0M Apr 27 09:13 test.tmp
>>>
>>> # cat test.tmp.log
>>> fio version 2 iolog
>>> test.tmp add
>>> test.tmp open
>>> test.tmp write 0 262144
>>> test.tmp write 262144 262144
>>> test.tmp write 524288 262144
>>> test.tmp write 786432 262144
>>> test.tmp write 1048576 262144
>>> test.tmp write 1310720 262144
>>> test.tmp write 1572864 262144
>>> test.tmp write 1835008 262144
>>> test.tmp close
>>>
>>> # for each in {0..7}; do dd if=test.tmp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
>>> 71a1660503bcff7c4e20a763d569d069  -
>>> 9c9bb7ec1020b4d4249028aecc896e6b  -
>>> 68b9685812d47c822532854201c9b352  -
>>> e5c8ef471a27ba92b86893ee5ded654b  -
>>> 14e0e798a8af3f4e6abdaf022ddf91c3  -
>>> 85528ae970bd25dde8c39ecaaffa4cf3  -
>>> 60b8ccf0e0793094b9356544fb541f3a  -
>>> ef736cc9cbf7588cb7b84467cb37c44e  -
>>>
>>> # fio -v
>>> fio-2.2.7-24-g7c30
>>
>>
>> Can you try with current -git? The corner case of being 100% dedupable
>> was broken.
>>
>> --
>> Jens Axboe
>>