Re: monitor exclusive lock when rbd client died abruptly

On Wed, Jul 22, 2020 at 12:34 PM Liu, Changcheng
<changcheng.liu@xxxxxxxxx> wrote:
>
> On 11:10 Wed 22 Jul, Ilya Dryomov wrote:
> > On Wed, Jul 22, 2020 at 7:46 AM Liu, Changcheng
> > <changcheng.liu@xxxxxxxxx> wrote:
> > >
> > > Hi all,
> > >    I've checked below document:
> > >    https://docs.ceph.com/docs/master/rbd/rbd-exclusive-locks/
> > >
> > >    The content contradicts the result of the experiment below:
> > >    the experiment shows that another process can still write data to an
> > >    rbd volume while a first process is continuously writing to the same
> > >    volume.
> >
> > This is the expected behaviour.  Exclusive lock is a cooperative
> > mechanism that ensures that only a single client is able to write
> > to the image and update its metadata (such as the object map)
> > at any given moment, not until the client exits.  It is acquired
> > automatically and the ownership is transparently transitioned between
> > clients.  In your example, "second" wakes up and requests the lock
> > from "first", "first" releases it, "second" performs its write,
> > "first" reacquires the lock and goes on.
> >
> > If you want to disable transparent lock transitions, you need to
> > acquire the lock manually with RBD_LOCK_MODE_EXCLUSIVE:
> @Ilya:
>     Thanks for your info. The transparent lock transition can be
>     disabled by acquiring the lock with RBD_LOCK_MODE_EXCLUSIVE.
>
>     After the "first" process acquires the lock with
>     "RBD_LOCK_MODE_EXCLUSIVE", is it possible for another process to be
>     notified that the lock is released, whether the "first" process
>     exits gracefully or is killed?

No.  Theoretically, "second" could block, waiting for the lock
to be released by "first" (whether gracefully or not), but I don't
think librbd does that.  (And if it did, it would have been based
on periodic retries, not notifies, because if the process is killed
there is nowhere for that notify to come from.)

>
>     I wrote the program below to run as "another process". If I manually
>     remove the lock, "another process" is notified. However, if the
>     "first" process exits gracefully or is killed, "another process"'s
>     handle_notify is never called.

If you are going to use the exclusive lock API, you shouldn't be
poking at the underlying watches and notifies.
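For reference, a minimal sketch of taking the lock manually via the exclusive lock API (`lock_acquire`/`lock_release` from librbd.hpp); the `acquire_exclusive` helper name is hypothetical, and the code assumes an already-opened `librbd::Image` on a reachable cluster, so it is not runnable standalone:

```cpp
#include <rbd/librbd.hpp>
#include <iostream>

// Hedged sketch: acquire the exclusive lock manually so that librbd
// will not transparently hand it over to another client.  Assumes
// `image` was opened as in the test program below.
int acquire_exclusive(librbd::Image &image) {
    int r = image.lock_acquire(RBD_LOCK_MODE_EXCLUSIVE);
    if (r < 0) {
        std::cerr << "lock_acquire failed: " << r << std::endl;
        return r;
    }
    // ... exclusive I/O: other clients cannot steal the lock,
    // so their requests fail or block instead of triggering a handover ...
    return image.lock_release();
}
```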

> #include <rbd/librbd.hpp>
> #include <rados/librados.hpp>
>
> #include <chrono>
> #include <cstring>
> #include <iostream>
> #include <string>
> #include <thread>
>
> class TestWatcher {
> public:
>     librados::Rados rados;
>     librbd::RBD rbd;
>     librbd::Image image;
>     librados::IoCtx io_ctx;
>
>     std::string pool_name;
>     std::string image_name;
>
>     TestWatcher(std::string pool_name = "rbd",
>                 std::string image_name = "fio_test")
>         : pool_name(pool_name), image_name(image_name) {
>         int ret = rados.init("admin");
>         if (ret < 0) {
>             std::cout << "failed to initialize rados" << std::endl;
>             exit(1);
>         }
>
>         ret = rados.conf_read_file("ceph.conf");
>         if (ret < 0) {
>             std::cout << "failed to parse ceph.conf" << std::endl;
>             exit(1);
>         }
>
>         ret = rados.connect();
>         if (ret < 0) {
>             std::cout << "failed to connect to rados cluster" << std::endl;
>             exit(1);
>         }
>
>         ret = rados.ioctx_create(pool_name.c_str(), io_ctx);
>         if (ret < 0) {
>             rados.shutdown();
>             std::cout << "failed to create ioctx" << std::endl;
>             exit(1);
>         }
>
>         ret = rbd.open(io_ctx, image, image_name.c_str());
>         if (ret < 0) {
>             io_ctx.close();
>             rados.shutdown();
>             std::cout << "failed to open rbd image" << std::endl;
>             exit(1);
>         } else {
>             std::cout << "open image succeeded" << std::endl;
>         }
>     }
>
>     ~TestWatcher() {
>         image.close();
>         io_ctx.close();
>         rados.shutdown();
>
>         delete watch_ctx;
>         watch_ctx = nullptr;
>     }
>
>     class WatchCtx : public librbd::UpdateWatchCtx {
>     private:
>         TestWatcher &_test_watcher;
>     public:
>         explicit WatchCtx(TestWatcher &test_watcher)
>             : _test_watcher(test_watcher) {
>         }
>
>         int list_watchers() {
>             std::list<librbd::image_watcher_t> watcher_list;
>             int r = _test_watcher.image.list_watchers(watcher_list);
>             if (r >= 0) {
>                 for (auto it = watcher_list.cbegin(); it != watcher_list.cend(); ++it) {
>                     std::cout << "addr: " << it->addr << ", "
>                               << "id: " << it->id << ", "
>                               << "cookie: " << it->cookie << std::endl;
>                 }
>             }
>             return r;
>         }
>
>         int list_lockers() {
>             std::list<librbd::locker_t> lockers;
>             std::string tag;
>             bool exclusive;
>             int r = _test_watcher.image.list_lockers(&lockers, &exclusive, &tag);
>             if (r >= 0) {
>                 for (auto it = lockers.cbegin(); it != lockers.cend(); ++it) {
>                     std::cout << "client: " << it->client << ", "
>                               << "cookie: " << it->cookie << ", "
>                               << "address: " << it->address << std::endl;
>                 }
>             }
>             return r;
>         }
>
>         // Called on image update notifications (e.g. a lock being
>         // removed by hand), not when a dead client disappears.
>         void handle_notify() override {
>             std::cout << "event coming" << std::endl;
>         }
>     };
>
>     int list_watchers() {
>         WatchCtx ctx(*this);
>         return ctx.list_watchers();
>     }
>
>     int list_lockers() {
>         WatchCtx ctx(*this);
>         return ctx.list_lockers();
>     }
>
>     // Register watch_ctx for update notifications on the image.
>     int update_watch(uint64_t *phandle) {
>         watch_ctx = new WatchCtx(*this);
>         return image.update_watch(watch_ctx, phandle);
>     }
>
>     WatchCtx *watch_ctx = nullptr;
> };
>
> int main(void) {
>     TestWatcher testwatcher;
>     testwatcher.list_watchers();
>     testwatcher.list_lockers();
>     uint64_t handle = 0;
>     testwatcher.update_watch(&handle);
>     // Wait for notifications without spinning the CPU.
>     while (true) {
>         std::this_thread::sleep_for(std::chrono::seconds(1));
>     }
> }

Thanks,

                Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


