On 11/10/20 12:31 PM, Frantisek Zatloukal wrote:
Nevermind, I wouldn't have been able to sleep before diagnosing it :).

However, since we're running on a pretty exhausted and overloaded temporary server, the OOM killer sometimes kills some of our workers or even the redis/database. In these cases, when we lose our task queue but the counters stay incremented, we would end up never syncing these items (their counters are incremented, but they are not in the queue).

To work around this and further increase reliability, we have implemented a watchdog. It simply goes through the cache looking for any items which weren't synced when they should've been and resets the counter for them. It runs once a day.
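For readers following along, the watchdog described above could look roughly like the sketch below. This is only an illustration and not the actual implementation: the `CacheEntry` fields, the `reset_stale_counters` name, and the `grace` parameter are all hypothetical, and it assumes the cache can be iterated like a dictionary.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    # All three fields are assumptions made for this sketch.
    counter: int          # incremented when a sync gets scheduled
    last_synced: float    # Unix timestamp of the last successful sync
    sync_interval: float  # how often the item is supposed to sync, in seconds


def reset_stale_counters(
    cache: Dict[str, CacheEntry],
    now: Optional[float] = None,
    grace: float = 3600.0,
) -> List[str]:
    """Reset the counter of every item that is overdue for a sync.

    An item counts as overdue when its counter is still incremented but
    more than sync_interval + grace seconds have passed since it last synced.
    Returns the keys whose counters were reset.
    """
    now = time.time() if now is None else now
    reset = []
    for key, entry in cache.items():
        overdue = now - entry.last_synced > entry.sync_interval + grace
        if entry.counter > 0 and overdue:
            entry.counter = 0  # lets the item be scheduled again
            reset.append(key)
    return reset
```

Running something like this once a day (from cron or a periodic task) would match the schedule mentioned above. The `grace` margin is there so that an item which is merely slow does not get reset.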
Thanks for the detailed response and quick fix! If it helps any: you can use Redis keyspace notifications to track keys being evicted from memory. You might be able to combine that with a maxmemory limit to avoid more OOM kills.
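In case it is useful, here is a minimal sketch of listening for eviction events with redis-py. It assumes a Redis server on localhost, that `CONFIG SET` is allowed there, and that `maxmemory` is set together with an eviction policy other than `noeviction` (otherwise nothing is ever evicted). The function names are made up for this example.

```python
from typing import Optional

# Key-event channel pattern for evictions, matching any database number.
EVICTED_PATTERN = "__keyevent@*__:evicted"


def evicted_key(message: dict) -> Optional[str]:
    """Return the evicted key name from a pub/sub message, or None.

    Accepts the dict shape redis-py yields from PubSub.listen(), with
    either bytes or str values.
    """
    if message.get("type") not in ("message", "pmessage"):
        return None  # e.g. the subscribe confirmation
    channel = message.get("channel")
    data = message.get("data")
    if isinstance(channel, bytes):
        channel = channel.decode()
    if isinstance(data, bytes):
        data = data.decode()
    if not isinstance(channel, str) or not channel.endswith(":evicted"):
        return None
    return data


def watch_evictions(host: str = "localhost", port: int = 6379) -> None:
    """Print every key Redis evicts. Blocks forever."""
    import redis  # third-party package: redis-py

    r = redis.Redis(host=host, port=port)
    # "E" enables key-event channels, "e" enables eviction events.
    r.config_set("notify-keyspace-events", "Ee")
    pubsub = r.pubsub()
    pubsub.psubscribe(EVICTED_PATTERN)
    for message in pubsub.listen():
        key = evicted_key(message)
        if key is not None:
            print(f"evicted: {key}")
```

One caveat: pub/sub delivery is fire-and-forget, so events emitted while the listener is disconnected are lost. It would complement the daily watchdog rather than replace it.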