Re: [PATCH v3 00/15] rbd journaling feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 08/19/2019 03:28 AM, Ilya Dryomov wrote:
On Mon, Jul 29, 2019 at 11:43 AM Dongsheng Yang
<dongsheng.yang@xxxxxxxxxxxx>  wrote:
Hi Ilya, Jason and all:
         As new exclusive-lock is merged, I think we can start this work now.
This is V3, which is rebased against 5.3-rc1.

Testing:
         kernel branch:https://github.com/yangdongsheng/linux/tree/krbd_journaling_v3
         ceph qa branch:https://github.com/yangdongsheng/ceph/tree/krbd_mirror_qa

         (1). A new test added: workunits/rbd/kernel_journal.sh: to test the journal replaying in krbd.
         (2). A new test added: qa/suites/krbd/mirror/, this test krbd journaling with rbd-mirror daemon.

Performance:
         compared with librbd journaling, preformance of krbd journaling looks reasonable.
         -------------------------------------------------------------------------------------
         (1) rbd bench with journaling disabled:         |       IOPS: 114
         -------------------------------------------------------------------------------------
         (2) rbd bench with journaling enabled:          |       IOPS: 55
         -------------------------------------------------------------------------------------
         (3) fio krbd with journaling disabled:          |       IOPS: 118
         -------------------------------------------------------------------------------------
         (4) fio krbd with journaling enabled:           |       IOPS: 57
         -------------------------------------------------------------------------------------

TODO:
         (1). there are some TODOs in comments, such as supporting rbd_journal_max_concurrent_object_sets.
         (2). add debugfs for generic journaling.

         I would like to put this TODO work in next cycle, but focus on making  the current work ready to go.

Changelog:
         -V2
                 1. support large event (> 4M)
                 2. fix bug in replay in different clients appending
                 3. rebase against 5.3-rc1
                 4. refactor journaler appending into state machine
         -V1
                 1. add test case in qa
                 2. address all memleak found in kmemleak
                 3. several bug fixes
                 4. performance improvement.
         -RFC
                 1. error out if there is some unsupported event type in replaying
                 2. just one memory copy from bio to msg.
                 3. use async IO in journal appending.
                 4. no mutex around IO.

Any comments are welcome!!

Dongsheng Yang (15):
   libceph: introduce ceph_extract_encoded_string_kvmalloc
   libceph: introduce a new parameter of workqueue in ceph_osdc_watch()
   libceph: support op append
   libceph: add prefix and suffix in ceph_osd_req_op.extent
   libceph: introduce cls_journaler_client
   libceph: introduce generic journaling
   libceph: journaling: introduce api to replay uncommitted journal
     events
   libceph: journaling: introduce api for journal appending
   libceph: journaling: trim object set when we found there is no client
     refer it
   rbd: introduce completion for each img_request
   rbd: introduce IMG_REQ_NOLOCK flag for image request state
   rbd: introduce rbd_journal_allocate_tag to allocate journal tag for
     rbd client
   rbd: append journal event in image request state machine
   rbd: replay events in journal
   rbd: add support for feature of RBD_FEATURE_JOURNALING

  drivers/block/rbd.c                       |  600 +++++++-
  include/linux/ceph/cls_journaler_client.h |   94 ++
  include/linux/ceph/decode.h               |   21 +-
  include/linux/ceph/journaler.h            |  184 +++
  include/linux/ceph/osd_client.h           |   19 +
  net/ceph/Makefile                         |    3 +-
  net/ceph/cls_journaler_client.c           |  558 ++++++++
  net/ceph/journaler.c                      | 2231 +++++++++++++++++++++++++++++
  net/ceph/osd_client.c                     |   61 +-
  9 files changed, 3759 insertions(+), 12 deletions(-)
  create mode 100644 include/linux/ceph/cls_journaler_client.h
  create mode 100644 include/linux/ceph/journaler.h
  create mode 100644 net/ceph/cls_journaler_client.c
  create mode 100644 net/ceph/journaler.c
Hi Dongsheng,

Some general comments that apply to the whole series:

- comments should look like

   /* comment */

   /*
    * multi-line
    * comment
    */

- placement of braces: a) don't use braces around single statements
   (everywhere) and b) functions should have the opening brace on the
   next line (e.g. rbd_img_need_journal())

- 80 column limit: we aren't very strict about it, but overly long
   lines should be the exception, not the rule

- unnecessary forward declarations: just place the new function above
   the call-site

- sizeof(struct foo) should be sizeof(*foo_ptr)

- integer types: use u{8,16,32,64} instead of uint{8,16,32,64}_t

- static const variables should be defines (e.g. PREAMBLE)

- no need to initialize fields to 0, NULL, etc after kzalloc() or
   similar (e.g. ceph_journaler_open())

Many of these rules are in Documentation/process/coding-style.rst.

Hi Ilya,
Thanx for your suggestion about coding-style above, I will check my code again.
Lastly, I would drop replaying for now.  This is a large series and
replaying amounts to at least a quarter of it without actually solving
the problem in its entirety.  Let's try to get appending and trimming
in shape first.

I would like to keep replaying:
(1) The current replaying at least cover the all events generated from kernel rbd. So this feature looks self-consistent
I think.

That means, if you are only using krbd in your usecase, krbd journaling would works well. But if we drop replaying, I am afraid we can't use journaling at all even when we are going to use krbd only.

(2) Even if we don't do replaying, we still have to do journal fetching and preprocess, to
know is there any uncommitted journal event. That's still a not small work.

(3) About solving the problem entirely, I actually had a plan in my mind.
we can provide an map option to user,
maybe "rbd map IMAGE -o journal-replay-helper=/usr/bin/rbd-journal-replay-helper.sh"

If this option is specified, we will use call_usermodehelper() to call the helper specified to do replay in journal opening.

else, we can use default replaying which is provided in this patch set.

Thanx
Thanks,

                 Ilya






[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux