On Mon, Jul 29, 2019 at 11:43 AM Dongsheng Yang <dongsheng.yang@xxxxxxxxxxxx> wrote: > > Hi Ilya, Jason and all: > As new exclusive-lock is merged, I think we can start this work now. > This is V3, which is rebased against 5.3-rc1. > > Testing: > kernel branch: https://github.com/yangdongsheng/linux/tree/krbd_journaling_v3 > ceph qa branch: https://github.com/yangdongsheng/ceph/tree/krbd_mirror_qa > > (1). A new test added: workunits/rbd/kernel_journal.sh: to test the journal replaying in krbd. > (2). A new test added: qa/suites/krbd/mirror/, this test krbd journaling with rbd-mirror daemon. > > Performance: > compared with librbd journaling, preformance of krbd journaling looks reasonable. > ------------------------------------------------------------------------------------- > (1) rbd bench with journaling disabled: | IOPS: 114 > ------------------------------------------------------------------------------------- > (2) rbd bench with journaling enabled: | IOPS: 55 > ------------------------------------------------------------------------------------- > (3) fio krbd with journaling disabled: | IOPS: 118 > ------------------------------------------------------------------------------------- > (4) fio krbd with journaling enabled: | IOPS: 57 > ------------------------------------------------------------------------------------- > > TODO: > (1). there are some TODOs in comments, such as supporting rbd_journal_max_concurrent_object_sets. > (2). add debugfs for generic journaling. > > I would like to put this TODO work in next cycle, but focus on making the current work ready to go. > > Changelog: > -V2 > 1. support large event (> 4M) > 2. fix bug in replay in different clients appending > 3. rebase against 5.3-rc1 > 4. refactor journaler appending into state machine > -V1 > 1. add test case in qa > 2. address all memleak found in kmemleak > 3. several bug fixes > 4. performance improvement. > -RFC > 1. error out if there is some unsupported event type in replaying > 2. just one memory copy from bio to msg. > 3. use async IO in journal appending. > 4. no mutex around IO. > > Any comments are welcome!! > > Dongsheng Yang (15): > libceph: introduce ceph_extract_encoded_string_kvmalloc > libceph: introduce a new parameter of workqueue in ceph_osdc_watch() > libceph: support op append > libceph: add prefix and suffix in ceph_osd_req_op.extent > libceph: introduce cls_journaler_client > libceph: introduce generic journaling > libceph: journaling: introduce api to replay uncommitted journal > events > libceph: journaling: introduce api for journal appending > libceph: journaling: trim object set when we found there is no client > refer it > rbd: introduce completion for each img_request > rbd: introduce IMG_REQ_NOLOCK flag for image request state > rbd: introduce rbd_journal_allocate_tag to allocate journal tag for > rbd client > rbd: append journal event in image request state machine > rbd: replay events in journal > rbd: add support for feature of RBD_FEATURE_JOURNALING > > drivers/block/rbd.c | 600 +++++++- > include/linux/ceph/cls_journaler_client.h | 94 ++ > include/linux/ceph/decode.h | 21 +- > include/linux/ceph/journaler.h | 184 +++ > include/linux/ceph/osd_client.h | 19 + > net/ceph/Makefile | 3 +- > net/ceph/cls_journaler_client.c | 558 ++++++++ > net/ceph/journaler.c | 2231 +++++++++++++++++++++++++++++ > net/ceph/osd_client.c | 61 +- > 9 files changed, 3759 insertions(+), 12 deletions(-) > create mode 100644 include/linux/ceph/cls_journaler_client.h > create mode 100644 include/linux/ceph/journaler.h > create mode 100644 net/ceph/cls_journaler_client.c > create mode 100644 net/ceph/journaler.c Hi Dongsheng, Some general comments that apply to the whole series: - comments should look like /* comment */ /* * multi-line * comment */ - placement of braces: a) don't use braces around single statements (everywhere) and b) functions should have the opening brace on the next line (e.g. rbd_img_need_journal()) - 80 column limit: we aren't very strict about it, but overly long lines should be the exception, not the rule - unnecessary forward declarations: just place the new function above the call-site - sizeof(struct foo) should be sizeof(*foo_ptr) - integer types: use u{8,16,32,64} instead of uint{8,16,32,64}_t - static const variables should be defines (e.g. PREAMBLE) - no need to initialize fields to 0, NULL, etc after kzalloc() or similar (e.g. ceph_journaler_open()) Many of these rules are in Documentation/process/coding-style.rst. Lastly, I would drop replaying for now. This is a large series and replaying amounts to at least a quarter of it without actually solving the problem in its entirety. Let's try to get appending and trimming in shape first. Thanks, Ilya