Re: [PATCH bpf] bpf: improve htab_map_get_next_key behaviour during races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/18/19 9:03 AM, Alexei Starovoitov wrote:
> On Fri, Oct 18, 2019 at 02:43:11PM +0100, Lorenz Bauer wrote:
>> To iterate a BPF map, userspace must use MAP_GET_NEXT_KEY and provide
>> the last retrieved key. The code then scans the hash table bucket
>> for the key and returns the key of the next item.
>>
>> This presents a problem if the last retrieved key isn't present in the
>> hash table anymore, e.g. due to concurrent deletion. It's not possible
>> to ascertain the location of a key in a given bucket, so there isn't
>> really a correct answer. The implementation currently returns the
>> first key in the first bucket. This guarantees that we never skip an
>> existing key. However, it means that a user space program iterating
>> a heavily modified map may never reach the end of the hash table,
>> forever restarting at the beginning.
>>
>> Fixing this outright is rather involved. However, we can improve slightly
>> by never revisiting earlier buckets. Instead of the first key in the
>> first bucket we return the first key in the "current" bucket. This
>> doesn't eliminate the problem, but makes it less likely to occur.
>>
>> Prior to commit 8fe45924387b ("bpf: map_get_next_key to return first key on NULL")
>> passing a non-existent key to MAP_GET_NEXT_KEY was the only way to
>> find the first key. Hence there is a small chance that there is code that
>> will be broken by this change.
> 
> It is 100% chance that it will break older bcc tools that were written
> before NULL was possible argument for get_next_key.

The referenced bcc code is in below:
https://github.com/iovisor/bcc/blob/master/src/cc/libbpf.c#L301-L330

> Please see Yonghong's patches for batched map lookup.

This is my RFC patch.
https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@xxxxxx/T/#t

I have not got time to finish it as a proper patch set yet.
But hopefully soon can find some time to work on this.


> That's the proper way to solve your problem.
> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux