Re: Query regarding dictionary logic

Mohit Agrawal <moagrawa@xxxxxxxxxx> · Thu, 2 May 2019 12:15:09 +0530

Hi Vijay,

I ran the smallfile tool on a 12x3 volume and did not find any significant performance improvement
for small-file operations. I configured 4 clients and 8 threads to run the operations.

I generated a statedump and found the following dictionary statistics for each gluster process:

brick
max-pairs-per-dict=50
total-pairs-used=192212171
total-dicts-used=24794349
average-pairs-per-dict=7

glusterd
max-pairs-per-dict=301
total-pairs-used=156677
total-dicts-used=30719
average-pairs-per-dict=5

fuse process
[dict]
max-pairs-per-dict=50
total-pairs-used=88669561
total-dicts-used=12360543
average-pairs-per-dict=7
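As a quick sanity check on these numbers, average-pairs-per-dict appears to be total-pairs-used divided by total-dicts-used, truncated to an integer. The sketch below assumes that integer-division relationship, which is inferred from the figures above rather than taken from the statedump code; the STATEDUMP dictionary simply copies the counters reported above.

```python
# Counters copied from the statedump output above.
STATEDUMP = {
    "brick":    {"total-pairs-used": 192212171, "total-dicts-used": 24794349},
    "glusterd": {"total-pairs-used": 156677,    "total-dicts-used": 30719},
    "fuse":     {"total-pairs-used": 88669561,  "total-dicts-used": 12360543},
}


def average_pairs_per_dict(stats: dict) -> int:
    """Truncated mean pairs per dict (assumption: integer division)."""
    dicts = stats["total-dicts-used"]
    return stats["total-pairs-used"] // dicts if dicts else 0


averages = {proc: average_pairs_per_dict(s) for proc, s in STATEDUMP.items()}
for proc, avg in averages.items():
    print(f"{proc}: average-pairs-per-dict={avg}")
```

The computed values (7, 5 and 7) match the statedump, which supports reading the averages as truncated quotients.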

It seems that glusterd has the highest max-pairs-per-dict (301), and this number can grow when the number of volumes is high.
I don't think there is any performance regression for the brick and fuse processes. I used a hash_size of 20 for the dictionary.
Let me know if you can suggest any other test to validate this.

Thanks,
Mohit Agrawal

On Tue, Apr 30, 2019 at 2:29 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Thanks, Amar, for sharing the patch. I will test it and share the results.

On Tue, Apr 30, 2019 at 2:23 PM Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx> wrote:
Shreyas/Kevin tried to address this some time back in https://bugzilla.redhat.com/show_bug.cgi?id=1428049 (https://review.gluster.org/16830).

I vaguely remember that the hash size was kept at 1 back when the dictionary itself was sent as the on-wire protocol, and in most other places the number of entries in a dictionary was, on average, 3. So we felt that saving a bit of memory was the better optimization at that time.
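To make this trade-off concrete, here is a back-of-the-envelope sketch for a 3-pair dictionary, comparing hash_size 1 with the hash_size 20 used in the test above. It assumes each bucket costs one pointer (8 bytes on a 64-bit build) and that keys hash uniformly; neither assumption is taken from dict.c, and the helper names are hypothetical. The lookup cost uses the textbook expected number of comparisons for a successful search with chaining, 1 + (n - 1) / (2m).

```python
POINTER_SIZE = 8  # assumption: 64-bit build, one pointer-sized slot per bucket


def bucket_array_bytes(hash_size: int) -> int:
    """Memory for the bucket array alone, excluding the pairs themselves."""
    return hash_size * POINTER_SIZE


def expected_compares(pairs: int, hash_size: int) -> float:
    """Expected key comparisons for a successful lookup under simple
    uniform hashing with chaining: 1 + (n - 1) / (2 * m)."""
    return 1 + (pairs - 1) / (2 * hash_size)


for hash_size in (1, 20):
    print(f"hash_size={hash_size:2}: "
          f"{bucket_array_bytes(hash_size):3} bytes of buckets, "
          f"{expected_compares(3, hash_size):.2f} compares per lookup (3 pairs)")
```

Under these assumptions, with only 3 pairs a larger table saves roughly one comparison per lookup (2.00 vs 1.05) at the cost of 152 extra bytes per dictionary.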

-Amar

On Tue, Apr 30, 2019 at 12:02 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Sure, Vijay, I will try it and update you.
Regards,
Mohit Agrawal

On Tue, Apr 30, 2019 at 11:44 AM Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
Hi Mohit,

On Mon, Apr 29, 2019 at 7:15 AM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Hi All,

  I was looking at the dict code and have a query about the current dictionary logic.
  I don't understand why we use a hash_size of 1 for a dictionary. IMO, with a
  hash_size of 1 the dictionary always works like a list, not a hash table, so every
  lookup in the dictionary is O(n).
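To illustrate the point, here is a toy separate-chaining hash table. It is a hypothetical model, not the layout of gluster's dict.c: the ChainedDict class, the use of zlib.crc32 as the hash function, and the 50-key workload are all assumptions chosen for illustration. With a single bucket every key lands in the same chain, so a lookup degenerates into a linear scan.

```python
import zlib


class ChainedDict:
    """Toy separate-chaining hash table (hypothetical; not dict.c's layout)."""

    def __init__(self, hash_size: int):
        self.buckets = [[] for _ in range(hash_size)]
        self.compares = 0  # key comparisons performed by get()

    def _bucket(self, key: str) -> list:
        # crc32 keeps bucket placement stable across runs (str hash() is salted).
        return self.buckets[zlib.crc32(key.encode()) % len(self.buckets)]

    def set(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:  # overwrite an existing key
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key: str):
        for k, v in self._bucket(key):  # walk the chain for this bucket
            self.compares += 1
            if k == key:
                return v
        raise KeyError(key)


def total_lookup_compares(hash_size: int, n_keys: int = 50) -> int:
    """Insert n_keys keys, look each one up once, return total comparisons."""
    d = ChainedDict(hash_size)
    keys = [f"key-{i}" for i in range(n_keys)]
    for i, key in enumerate(keys):
        d.set(key, i)
    for key in keys:
        d.get(key)
    return d.compares


linear = total_lookup_compares(hash_size=1)   # one chain: behaves like a list
hashed = total_lookup_compares(hash_size=20)
print(f"hash_size=1: {linear} compares, hash_size=20: {hashed} compares")
```

With hash_size=1, looking up all 50 keys costs 1 + 2 + ... + 50 = 1275 comparisons, i.e. O(n) per lookup; with 20 buckets the chains are short and the total is much smaller.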

  Before optimizing the code, I would like to know the exact reason for defining
  hash_size as 1.

This is a good question. I looked up the source in gluster's historic repo [1] and hash_size is 1 even there. So, this could have been the case since the first version of the dictionary code.

Would you be able to run some tests with a larger hash_size and share your observations?

Thanks,
Vijay

[1] https://github.com/gluster/historic/blob/master/libglusterfs/src/dict.c

  Please share your views on this.

Thanks,
Mohit Agrawal  
_______________________________________________

Gluster-devel mailing list

Gluster-devel@xxxxxxxxxxx

https://lists.gluster.org/mailman/listinfo/gluster-devel

-- 
Amar Tumballi (amarts)