Distribute rados locks across several RGWs




Our team has been working on: https://github.com/ceph/ceph/pull/45958
The main idea behind this solution is to let several RGWs distribute the multisite sync locks fairly among themselves.
The way each RGW will do it:
  1. Every RGW creates a vector whose size is the number of locks used by the RGW, stores in every cell the index of that cell, and then calls std::shuffle on the vector.
  2. Each time an RGW tries to take a lock, it sends with the lock call a bid number, which is vector[shard_id], and an expiration time for the bid.
  3. When CLS_LOCK is called, the function should know which RGW holds the lowest non-expired bid for that specific lock. If the calling RGW does not have the lowest bid, the call fails (unless it already holds the lock, in which case it renews it); otherwise it acquires the lock.
  4. With this method the RGWs share the locks almost evenly between each other, so the work can be done by several RGWs.
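
The steps above can be sketched roughly as follows. This is only an illustration, not the code from the PR: make_bids, Bid, LockBids and try_lock are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and breaking ties on equal bids by client name is an assumption, since the description above does not say how ties are resolved. Renewal by a current lock holder is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Step 1: one bid per lock (shard), a random permutation of 0..num_locks-1.
std::vector<uint32_t> make_bids(std::size_t num_locks, uint64_t seed) {
  std::vector<uint32_t> bids(num_locks);
  std::iota(bids.begin(), bids.end(), 0u);  // cell i holds the value i
  std::mt19937_64 rng(seed);
  std::shuffle(bids.begin(), bids.end(), rng);
  return bids;
}

// A bid as the lock side would remember it (hypothetical layout).
struct Bid {
  uint32_t value;
  Clock::time_point expires;
};

// Hypothetical per-lock bid table: client id -> latest bid.
struct LockBids {
  std::unordered_map<std::string, Bid> bids;

  // Steps 2-3: record the caller's bid, drop expired bids, and report
  // whether the caller currently holds the lowest live bid.
  bool try_lock(const std::string& client, uint32_t value,
                Clock::duration ttl, Clock::time_point now) {
    assert(ttl > Clock::duration::zero());
    bids[client] = Bid{value, now + ttl};

    for (auto it = bids.begin(); it != bids.end();) {
      if (it->second.expires <= now) {
        it = bids.erase(it);  // expired bids no longer compete
      } else {
        ++it;
      }
    }

    for (const auto& entry : bids) {
      if (entry.first == client) continue;
      const uint32_t other = entry.second.value;
      // Assumption: equal bids are broken by client name.
      if (other < value || (other == value && entry.first < client)) {
        return false;  // someone else has a better live bid
      }
    }
    return true;
  }
};
```

An RGW would then call something like lock_bids.try_lock(my_id, bids[shard_id], ttl, Clock::now()) for each shard. Because every RGW shuffles independently, each one tends to hold the lowest bid on a different subset of shards.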
We need to maintain the bid mapping for all bidded locks, that is, for every lock we need to know which clients tried to lock it and what their non-expired bids are. In the current implementation we keep this in a static std::unordered_map guarded by a static mutex inside the lock_obj function. The reason is that the map generally does not need to persist across restarts or across a change of the primary or the OSD, but it does need to persist between calls.
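
To make the current approach concrete, here is a minimal sketch of function-local static state guarded by a static mutex. It is not the actual lock_obj code: record_bid and BidTable are hypothetical names, and the real map would also carry the expiration times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// lock name -> (client id -> bid); hypothetical layout.
using BidTable =
    std::unordered_map<std::string, std::unordered_map<std::string, uint32_t>>;

// Records a bid and returns how many clients have bid on this lock.
// The statics live for the lifetime of the process: they survive between
// calls, but are lost on restart or when another OSD becomes primary.
std::size_t record_bid(const std::string& lock_name,
                       const std::string& client, uint32_t bid) {
  static std::mutex mtx;   // serialises all access to the table
  static BidTable table;   // shared by every call in this process

  assert(!client.empty());
  std::lock_guard<std::mutex> guard(mtx);
  auto& per_lock = table[lock_name];
  per_lock[client] = bid;  // a repeated bid from the same client overwrites
  return per_lock.size();
}
```

Note that a single static mutex serialises bids for all locks on all objects handled by the process, which is one of the trade-offs of this approach.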

We thought about other ways to maintain that information:
  1. For each lock_info_t, maintain a smaller map that includes only the clients and the clients' bids.
    This has two drawbacks. First, we would need to write the xattr (write_lock()) whenever the bids change, which happens on every call. Second, for every non-zero return value the write_lock() would not happen, so the map would not be updated at all.
  2. We could maybe use ObjectContext and store the bid info for each lock there, but ObjectContexts do not stay in memory for long.
What do you think would be the best way to go?
