It may be a problem with using a smaller IO size. If we increase the
size of the IO, I think it gets close to what is expected.

# cat ddp_file.fio
[dedupe]
filename=test.tmp.comp
bs=32k
rw=write
size=1g
dedupe_percentage=80
write_iolog=test.tmp.log.comp

# fio/t/fio-dedupe -b 32768 test.tmp.comp
Will check <test.tmp.comp>, size <1073741824>, using 8 threads
Threads(8): 32768 items processed
Extents=32768, Unique extents=6623
De-dupe ratio: 1:3.95
Fio setting: dedupe_percentage=80
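
A quick sanity check on those numbers (this is just arithmetic on the
job settings, not anything fio prints): with dedupe_percentage=P,
roughly (100 - P)% of the written blocks should end up unique, and the
large run lands close to that:

p=80
for blocks in 40 32768; do
    echo "blocks=$blocks: expected unique extents ~ $(( blocks * (100 - p) / 100 ))"
done
# -> 8 for the 40-block (10m/256k) run below and 6553 for the
#    32768-block (1g/32k) run above; observed: 15-16 and 6623.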

Srinivasa R Chamarthy


On Tue, Apr 28, 2015 at 3:03 PM, Srinivasa Chamarthy
<chamarthy.raju@xxxxxxxxx> wrote:
> There seems to be one more issue with dedupe for data that is not 100%
> dedupable. I tried with 50% and 80%, and it gives only 35 for 50 and 60
> for 80.
>
> # cat ddp_file.fio
> [dedupe]
> filename=test.tmp.comp
> bs=256k
> rw=write
> size=10m
> dedupe_percentage=80
> write_iolog=test.tmp.log.comp
>
> # fio ddp_file.fio
> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
> fio-2.2.7-26-g9451b
> Starting 1 process
> dedupe: Laying out IO file(s) (1 file(s) / 10MB)
>
> dedupe: (groupid=0, jobs=1): err= 0: pid=13376: Tue Apr 28 02:54:02 2015
>   write: io=10240KB, bw=731429KB/s, iops=2857, runt= 14msec
>     clat (usec): min=170, max=374, avg=235.80, stdev=41.11
>      lat (usec): min=173, max=378, avg=239.10, stdev=41.75
>     clat percentiles (usec):
>      |  1.00th=[  171],  5.00th=[  175], 10.00th=[  197], 20.00th=[  213],
>      | 30.00th=[  217], 40.00th=[  221], 50.00th=[  231], 60.00th=[  235],
>      | 70.00th=[  239], 80.00th=[  253], 90.00th=[  262], 95.00th=[  318],
>      | 99.00th=[  374], 99.50th=[  374], 99.90th=[  374], 99.95th=[  374],
>      | 99.99th=[  374]
>     lat (usec) : 250=77.50%, 500=22.50%
>   cpu          : usr=57.14%, sys=28.57%, ctx=1, majf=0, minf=27
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=40/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   WRITE: io=10240KB, aggrb=731428KB/s, minb=731428KB/s, maxb=731428KB/s,
>   mint=14msec, maxt=14msec
>
> Disk stats (read/write):
>   sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> # fio/t/fio-dedupe -b 262144 test.tmp.comp
> Will check <test.tmp.comp>, size <10485760>, using 8 threads
> Threads(8): 40 items processed
> Extents=40, Unique extents=15
> De-dupe ratio: 1:1.67
> Fio setting: dedupe_percentage=63
>
> I also confirmed the same by taking the checksum of the data file in
> individual blocks of size bs.
>
> # for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | wc -l
> 40    <<< 40 blocks, as expected
>
> # for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done | sort | uniq | wc -l
> 16    <<< returns 16 unique blocks
>
> In an 80% dedupable file, I would expect around 8 unique blocks. Is that true?
> Also, the fio/t/fio-dedupe output shows only 15 unique extents, while
> checking manually returns 16.
>
> Thanks,
> Srinivasa Chamarthy
> Srinivasa R Chamarthy
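
For reference: working backwards from the two fio-dedupe outputs in this
thread, the "De-dupe ratio" and "Fio setting" summary lines appear to
follow directly from the extent counts (this is inferred from the printed
numbers, not taken from the fio source):

awk -v extents=40 -v unique=15 'BEGIN {
    printf "De-dupe ratio: 1:%.2f\n", (extents - unique) / unique               # 1:1.67
    printf "Fio setting: dedupe_percentage=%d\n", 100 * (extents - unique) / extents + 0.5   # 63
}'

With only 40 extents, each block is worth 2.5 percentage points of the
reported figure, so a small run is inherently noisy; the 32768-extent run
above reports 80% as configured.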

> On Tue, Apr 28, 2015 at 1:15 PM, Srinivasa Chamarthy
> <chamarthy.raju@xxxxxxxxx> wrote:
>> Seems to be working now. Thanks for the great support.
>>
>> # for each in {0..7}; do dd if=test.tmp.comp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>> e1d3c034e3fc15481e5c8610333ad9cd -
>>
>> Srinivasa R Chamarthy
>>
>> On Mon, Apr 27, 2015 at 10:39 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> On 04/27/2015 07:18 AM, Srinivasa Chamarthy wrote:
>>>> I was just verifying whether I could generate 100% dedupable data
>>>> with fio. I configured a small workload with a bs of 256k, writing a
>>>> 2MB file. I then took the checksum of each 256k block of the file,
>>>> and the checksums do not match. If I am not wrong, when I specify the
>>>> data as 100% dedupable, the checksums should all match, shouldn't they?
>>>>
>>>> # cat ddp_file.fio
>>>> [dedupe]
>>>> filename=test.tmp
>>>> bs=256k
>>>> rw=write
>>>> size=2m
>>>> dedupe_percentage=100
>>>> write_iolog=test.tmp.log
>>>>
>>>> # fio ddp_file.fio
>>>> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
>>>> fio-2.2.7-24-g7c30
>>>> Starting 1 process
>>>> dedupe: Laying out IO file(s) (1 file(s) / 2MB)
>>>>
>>>> dedupe: (groupid=0, jobs=1): err= 0: pid=31497: Mon Apr 27 09:13:35 2015
>>>>   write: io=2048.0KB, bw=2000.0MB/s, iops=8000, runt= 1msec
>>>>     clat (usec): min=123, max=183, avg=150.50, stdev=22.35
>>>>      lat (usec): min=125, max=184, avg=152.38, stdev=22.08
>>>>     clat percentiles (usec):
>>>>      |  1.00th=[  123],  5.00th=[  123], 10.00th=[  123], 20.00th=[  124],
>>>>      | 30.00th=[  139], 40.00th=[  145], 50.00th=[  145], 60.00th=[  155],
>>>>      | 70.00th=[  159], 80.00th=[  177], 90.00th=[  183], 95.00th=[  183],
>>>>      | 99.00th=[  183], 99.50th=[  183], 99.90th=[  183], 99.95th=[  183],
>>>>      | 99.99th=[  183]
>>>>     lat (usec) : 250=100.00%
>>>>   cpu          : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=28
>>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      issued    : total=r=0/w=8/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>>>      latency   : target=0, window=0, percentile=100.00%, depth=1
>>>>
>>>> Run status group 0 (all jobs):
>>>>   WRITE: io=2048KB, aggrb=2000.0MB/s, minb=2000.0MB/s, maxb=2000.0MB/s,
>>>>   mint=1msec, maxt=1msec
>>>>
>>>> Disk stats (read/write):
>>>>   sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>>>
>>>> # ls -lh test.tmp
>>>> -rw-r--r-- 1 root root 2.0M Apr 27 09:13 test.tmp
>>>>
>>>> # cat test.tmp.log
>>>> fio version 2 iolog
>>>> test.tmp add
>>>> test.tmp open
>>>> test.tmp write 0 262144
>>>> test.tmp write 262144 262144
>>>> test.tmp write 524288 262144
>>>> test.tmp write 786432 262144
>>>> test.tmp write 1048576 262144
>>>> test.tmp write 1310720 262144
>>>> test.tmp write 1572864 262144
>>>> test.tmp write 1835008 262144
>>>> test.tmp close
>>>>
>>>> # for each in {0..7}; do dd if=test.tmp bs=262144 count=1 skip=$each 2>/dev/null | hexdump -C | md5sum; done
>>>> 71a1660503bcff7c4e20a763d569d069 -
>>>> 9c9bb7ec1020b4d4249028aecc896e6b -
>>>> 68b9685812d47c822532854201c9b352 -
>>>> e5c8ef471a27ba92b86893ee5ded654b -
>>>> 14e0e798a8af3f4e6abdaf022ddf91c3 -
>>>> 85528ae970bd25dde8c39ecaaffa4cf3 -
>>>> 60b8ccf0e0793094b9356544fb541f3a -
>>>> ef736cc9cbf7588cb7b84467cb37c44e -
>>>>
>>>> # fio -v
>>>> fio-2.2.7-24-g7c30
>>>
>>> Can you try with current -git? The corner cases of being 100% dedupable
>>> were broken.
>>>
>>> --
>>> Jens Axboe
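
For anyone repeating these checks, the per-block checksum loop used
throughout this thread can be wrapped up as a small helper (a sketch; the
function name is made up here, and the hexdump -C stage is dropped since
hashing the raw block groups the blocks the same way):

count_unique_blocks() {
    local file=$1 bs=$2
    local size blocks i
    size=$(stat -c %s "$file")
    blocks=$(( size / bs ))
    for (( i = 0; i < blocks; i++ )); do
        dd if="$file" bs="$bs" count=1 skip="$i" 2>/dev/null | md5sum
    done | sort -u | wc -l
}

# e.g. count_unique_blocks test.tmp.comp 262144
# gives the same counts as the manual loops above: 16 for the 80%/10m
# run, and 1 for the 100% case once it generates identical blocks.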