On Fri, Nov 18, 2016 at 11:01:07AM +1100, Neil Brown wrote: > On Fri, Nov 18 2016, Shaohua Li wrote: > > > On Thu, Nov 17, 2016 at 04:18:15PM +1100, Neil Brown wrote: > >> On Thu, Nov 17 2016, Shaohua Li wrote: > >> > >> > Currently raid5-cache update superblock in .quiesce. But since at > >> > shutdown/reboot, .quiesce is called with reconfig mutex locked, > >> > superblock isn't guaranteed to be called in reclaim thread (see > >> > 8e018c21da3). This will make assemble do unnecessary journal recovery. > >> > It doesn't corrupt data but is annoying. This adds an extra hook to > >> > guarantee journal is flushed to raid disks. And since this hook is > >> > called before superblock update, this will guarantee we have a uptodate > >> > superblock in shutdown/reboot > >> > >> Hi. > >> I don't quite follow some of the reasoning here. > >> In particular, the ->stop_writes() that you have implemented > >> does almost exactly the same thing as r5l_quiesce(1). > >> So why not simply call ->quiesce(mddev, 1) in __md_stop_writes()?? > >> You probably need to also call ->quiesce(mddev, 0) to keep things > >> balanced. > > > > reboot (md_notify_reboot) doesn't call .quiesce, maybe we should do though. And > > in stop, we hold reconfig_mutex before calling .quiesce. And with commit > > 8e018c21da3, r5l_write_super_and_discard_space tries to hold the reconfig_mutex > > before write super, which it can't hold, so superblock write is skipped. After > > .quiesce we don't write superblock. To fix the shutdown case, we can add a > > superblock write after .quiesce. But I think it's more generic to add a > > ->stop_writes since it will work for the reboot case. > > I hadn't quite processed that this was about md_notify_reboot(). > I would be very wary of optimizing this code. It should certainly avoid > data loss, but anything more doesn't belong here. > During a clean shutdown the array should be stopped properly. > md_notify_reboot() is only meant for minimizing damaged caused by a > hasty "reboot -f -n". yep, this isn't the priority. So do you still suggest we ignore the reboot case and add the journal flush after .quiesce() is called in stop? > A "clean" shutdown currently includes systemd/mdadm.shutdown (in the > mdadm package) running "mdadm --wait-clean --scan". > "mdadm --wait-clean" changes the "safe_mode_delay" so that the array > will become "clean" more quickly. > Possibly we should add something to that to trigger a flush of the > journal, and to wait for the flush to complete. I'm not sure how we can do this. But adding a different state in 'array_state' indicating journal isn't clean makes sense. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html