Re: [PATCH v2] xfs: introduce object readahead to log recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 30, 2013 at 9:10 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> On 07/30/2013 05:59 AM, zwu.kernel@xxxxxxxxx wrote:
>> From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>
>>   It can take a long time to run log recovery operation because it is
>> single threaded and is bound by read latency. We can find that it took
>> most of the time to wait for the read IO to occur, so if one object
>> readahead is introduced to log recovery, it will obviously reduce the
>> log recovery time.
>>
>> Log recovery time stat:
>>
>>           w/o this patch        w/ this patch
>>
>> real:        0m15.023s             0m7.802s
>> user:        0m0.001s              0m0.001s
>> sys:         0m0.246s              0m0.107s
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>> ---
>
> Cool patch. I'm not terribly familiar with the log recovery code so take
> my comments with a grain of salt, but a couple things I noticed on a
> quick skim...
>
>>  fs/xfs/xfs_log_recover.c | 162 +++++++++++++++++++++++++++++++++++++++++++++--
>>  fs/xfs/xfs_log_recover.h |   2 +
>>  2 files changed, 159 insertions(+), 5 deletions(-)
>>
> ...
>>
>> +STATIC int
>> +xlog_recover_items_pass2(
>> +     struct xlog                     *log,
>> +     struct xlog_recover             *trans,
>> +     struct list_head                *buffer_list,
>> +     struct list_head                *ra_list)
>
> A nit, but technically this function doesn't have to be involved with
> readahead. Perhaps rename ra_list to item_list..?
ok, applied.
>
>> +{
>> +     int                     error = 0;
>> +     xlog_recover_item_t     *item;
>> +
>> +     list_for_each_entry(item, ra_list, ri_list) {
>> +             error = xlog_recover_commit_pass2(log, trans,
>> +                                       buffer_list, item);
>> +             if (error)
>> +                     return error;
>> +     }
>> +
>> +     return error;
>> +}
>> +
>>  /*
>>   * Perform the transaction.
>>   *
>> @@ -3189,9 +3314,11 @@ xlog_recover_commit_trans(
>>       struct xlog_recover     *trans,
>>       int                     pass)
>>  {
>> -     int                     error = 0, error2;
>> -     xlog_recover_item_t     *item;
>> +     int                     error = 0, error2, ra_qdepth = 0;
>> +     xlog_recover_item_t     *item, *next;
>>       LIST_HEAD               (buffer_list);
>> +     LIST_HEAD               (ra_list);
>> +     LIST_HEAD               (all_ra_list);
>>
>>       hlist_del(&trans->r_list);
>>
>> @@ -3199,14 +3326,21 @@ xlog_recover_commit_trans(
>>       if (error)
>>               return error;
>>
>> -     list_for_each_entry(item, &trans->r_itemq, ri_list) {
>> +     list_for_each_entry_safe(item, next, &trans->r_itemq, ri_list) {
>>               switch (pass) {
>>               case XLOG_RECOVER_PASS1:
>>                       error = xlog_recover_commit_pass1(log, trans, item);
>>                       break;
>>               case XLOG_RECOVER_PASS2:
>> -                     error = xlog_recover_commit_pass2(log, trans,
>> -                                                       &buffer_list, item);
>> +                     if (ra_qdepth++ >= XLOG_RECOVER_MAX_QDEPTH) {
>
> The counting mechanism looks strange and easy to break with future
> changes. Why not increment ra_qdepth in the else bracket where it is
> explicitly tied to the operation it tracks?
ok.
>
>> +                             error = xlog_recover_items_pass2(log, trans,
>> +                                             &buffer_list, &ra_list);
>> +                             list_splice_tail_init(&ra_list, &all_ra_list);
>> +                             ra_qdepth = 0;
>
> So now we've queued up a bunch of items we've issued readahead on in
> ra_list and we've executed the recovery on the list. What happens to the
> current item?
Good catch, the current item was missed. Done.
>
> Brian
>
>> +                     } else {
>> +                             xlog_recover_ra_pass2(log, item);
>> +                             list_move_tail(&item->ri_list, &ra_list);
>> +                     }
>>                       break;
>>               default:
>>                       ASSERT(0);
>> @@ -3216,9 +3350,27 @@ xlog_recover_commit_trans(
>>                       goto out;
>>       }
>>
>> +     if (!list_empty(&ra_list)) {
>> +             error = xlog_recover_items_pass2(log, trans,
>> +                             &buffer_list, &ra_list);
>> +             if (error)
>> +                     goto out;
>> +
>> +             list_splice_tail_init(&ra_list, &all_ra_list);
>> +     }
>> +
>> +     if (!list_empty(&all_ra_list))
>> +             list_splice_init(&all_ra_list, &trans->r_itemq);
>> +
>>       xlog_recover_free_trans(trans);
>>
>>  out:
>> +     if (!list_empty(&ra_list))
>> +             list_splice_tail_init(&ra_list, &all_ra_list);
>> +
>> +     if (!list_empty(&all_ra_list))
>> +             list_splice_init(&all_ra_list, &trans->r_itemq);
>> +
>>       error2 = xfs_buf_delwri_submit(&buffer_list);
>>       return error ? error : error2;
>>  }
>> diff --git a/fs/xfs/xfs_log_recover.h b/fs/xfs/xfs_log_recover.h
>> index 1c55ccb..16322f6 100644
>> --- a/fs/xfs/xfs_log_recover.h
>> +++ b/fs/xfs/xfs_log_recover.h
>> @@ -63,4 +63,6 @@ typedef struct xlog_recover {
>>  #define      XLOG_RECOVER_PASS1      1
>>  #define      XLOG_RECOVER_PASS2      2
>>
>> +#define XLOG_RECOVER_MAX_QDEPTH 100
>> +
>>  #endif       /* __XFS_LOG_RECOVER_H__ */
>>
>



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux