Hi Mohit,
Thank you for the update. More inline.
On Wed, May 1, 2019 at 11:45 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Hi Vijay,

I have tried to run the smallfile tool on a volume (12x3), and I have not found any significant performance improvement for smallfile operations. I configured 4 clients and 8 threads to run the operations.
For measuring performance, did you measure both time taken and CPU consumed? O(n) computations are typically CPU expensive, and we might see better results with a hash table when a large number of objects (a few thousand) are present in a single dictionary. If you haven't gathered CPU statistics, please gather those as well for comparison.
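Something along these lines is what I have in mind for the measurement itself (a rough, untested sketch; run_workload() is just a hypothetical placeholder for the smallfile run, everything else is standard POSIX):

/*
 * Minimal sketch: capture both wall-clock time and CPU time around
 * a workload, so list-vs-hash comparisons show CPU cost too.
 */
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

static double tv2s(struct timeval tv)
{
        return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
        struct timespec t0, t1;
        struct rusage ru;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* run_workload();  <- hypothetical placeholder for the benchmark */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        getrusage(RUSAGE_SELF, &ru);
        printf("wall=%.3fs user=%.3fs sys=%.3fs\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9,
               tv2s(ru.ru_utime), tv2s(ru.ru_stime));
        return 0;
}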
I have generated statedumps and found the below data for dictionaries specific to the gluster processes.

brick
max-pairs-per-dict=50
total-pairs-used=192212171
total-dicts-used=24794349
average-pairs-per-dict=7

glusterd
max-pairs-per-dict=301
total-pairs-used=156677
total-dicts-used=30719
average-pairs-per-dict=5

fuse process
[dict]
max-pairs-per-dict=50
total-pairs-used=88669561
total-dicts-used=12360543
average-pairs-per-dict=7

It seems the dictionary has the most pairs in the glusterd case, and when the number of volumes is high that number can grow further. I think there is no performance regression for the brick and fuse processes. I used a hash_size of 20 for the dictionary. Let me know if you can provide some other test to validate the same.
A few more items to try out:
1. Vary the number of buckets and test (see the sketch after this list).
2. Create about 10000 volumes and measure the performance of a volume info <volname> operation on some random volume.
3. Check the related patch from Facebook and see if we can incorporate any ideas from it.
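For item 1, a throwaway harness along these lines may be enough to show the trend (a sketch only: it uses a stand-in structure and the djb2 string hash, not gluster's dict_t or its hash function, so only the relative numbers across bucket counts are meaningful):

/*
 * Micro-benchmark sketch: time lookups in a chained hash table for
 * different bucket counts.  With nbuckets == 1 every key chains into
 * the same list, which is exactly what hash_size = 1 gives us.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct pair {
        char *key;
        struct pair *next;
};

/* djb2 string hash; a stand-in for the sketch, not gluster's hash */
static unsigned long hash_str(const char *s)
{
        unsigned long h = 5381;
        while (*s)
                h = h * 33 + (unsigned char)*s++;
        return h;
}

static double bench(int npairs, int nbuckets)
{
        /* leaks on purpose; this is a throwaway harness */
        struct pair **buckets = calloc(nbuckets, sizeof(*buckets));
        struct timespec t0, t1;
        char key[32];
        int i;

        /* insert npairs unique keys, chaining within each bucket */
        for (i = 0; i < npairs; i++) {
                snprintf(key, sizeof(key), "key-%d", i);
                struct pair *p = malloc(sizeof(*p));
                p->key = strdup(key);
                unsigned long b = hash_str(key) % nbuckets;
                p->next = buckets[b];
                buckets[b] = p;
        }

        /* look every key up once and time it */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < npairs; i++) {
                snprintf(key, sizeof(key), "key-%d", i);
                struct pair *p = buckets[hash_str(key) % nbuckets];
                while (p && strcmp(p->key, key))
                        p = p->next;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
        int sizes[] = { 1, 20, 128, 1024 };
        int i;

        for (i = 0; i < 4; i++)
                printf("buckets=%4d lookups took %.4fs\n",
                       sizes[i], bench(100000, sizes[i]));
        return 0;
}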
Thanks,
Vijay
Thanks,
Mohit Agrawal

On Tue, Apr 30, 2019 at 2:29 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:

Thanks, Amar, for sharing the patch. I will test and share the result.

On Tue, Apr 30, 2019 at 2:23 PM Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx> wrote:

Shreyas/Kevin tried to address it some time back via https://bugzilla.redhat.com/show_bug.cgi?id=1428049 (https://review.gluster.org/16830).

I vaguely remember that keeping the hash value at 1 dates from the time when the dictionary itself was sent as the on-wire protocol, and in most other places the number of entries in a dictionary was on average 3. So we felt that saving a bit of memory was the better optimization at that time.

-Amar

On Tue, Apr 30, 2019 at 12:02 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:

Sure, Vijay, I will try and update.

Regards,
Mohit Agrawal

On Tue, Apr 30, 2019 at 11:44 AM Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

Hi Mohit,

On Mon, Apr 29, 2019 at 7:15 AM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:

Hi All,

I was just looking at the dict code, and I have one query about the current dictionary logic. I am not able to understand why we use a hash_size of 1 for a dictionary. IMO, with a hash_size of 1 the dictionary always works like a list, not a hash, so every lookup in the dictionary has O(n) complexity. Before optimizing the code, I just want to know what the exact reason was for defining hash_size as 1.

This is a good question. I looked up the source in gluster's historic repo [1] and hash_size is 1 even there. So, this could have been the case since the first version of the dictionary code.

Would you be able to run some tests with a larger hash_size and share your observations?

Thanks,
Vijay
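P.S. On the mechanics of the experiment: if I remember correctly, libglusterfs already exposes get_new_dict_full(size_hint), with dict_new() simply passing 1, so trying a larger table may not need much new plumbing. Please verify against the current tree; the call below is a hypothetical experiment, not a tested change:

/*
 * Hypothetical experiment, assuming the historic API is unchanged:
 * allocate a wider hash table only where dictionaries grow large
 * (e.g. in glusterd), instead of dict_new()'s default of 1 bucket.
 * Check the ref semantics too: dict_new() refs the dict, and I do
 * not believe get_new_dict_full() historically did.
 */
dict_t *dict = get_new_dict_full (128);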