Hi,

> > > Yeah, but even real disk could return bogus data (that is, silent
> > > data corruption). So this issue (returning bogus data) is about
> > > the possibility.
> >
> > The modern disks and HBAs can detect bogus data in most cases, but
>
> You are talking about SCSI DIF or high-end storage systems that use
> checksumming internally?
>
> I'm not sure the modern SATA disk can detect such failure.

I think modern SATA disks have this feature, while IDE disks don't.

> > there are still possibilities.

Yes.

> > > In addition, as you know, recent file systems can handle such a
> > > failure.
> >
> > Yes, I know some filesystems have such a feature. But there is no
> > point in returning bogus data instead of an EIO error.
>
> Yeah, but returning EIO in such cases makes an implementation more
> complicated.

The goal of VastSky is to emulate a regular block device. If there's no
valid data, it just returns EIO, and how to handle the EIO is left to
the filesystem. But that's just a policy; I won't blame storage systems
that choose other policies.

> > > > VastSky updates all the mirrors synchronously. And only after
> > > > all the I/O requests are completed, it tells the owner that the
> > > > request is done.
> > >
> > > Understood. Sheepdog works in the same way.
> > >
> > > How does Vastsky detect old data?
> > >
> > > If one of the mirror nodes is down (e.g. the node is too busy),
> > > Vastsky assigns a new node?
> >
> > Right.
> > VastSky removes the node that seems to be down from the group and
> > assigns a new one. Then no one can access the old one after that.
>
> How does Vastsky store the information of the group?

Actually, VastSky has a metadata node. When the structure of a logical
volume changes, the metadata related to the volume is updated on the
metadata node. The metadata is cached on the node that uses the volume.

> For example, Vastsky assigns a new node, updates the data on all the
> replica nodes, and returns success to the client; right after that,
> all nodes are down due to a power failure. After all the nodes boot
> up again, can Vastsky still detect the old data?

I guess VastSky can re-synchronize them again, but I'm not really sure
how it works. I have to ask the implementor on our team.

> > > Then if a client issues READ and all the mirrors are down but the
> > > old mirror node is alive, how does Vastsky prevent returning the
> > > old data?
> >
> > About this issue, we have a plan:
> > When a node is down and it's not because of a hardware error, we
> > will make VastSky try to re-synchronize the node again.
>
> Yeah, that's necessary, especially when each node has huge data.
> Sheepdog can do that.
>
> > This will be done in a few minutes because VastSky traces all write
> > I/O requests to know which sectors of the node aren't synchronized.
>
> How does Vastsky store the trace log safely? (I guess that the trace
> log is saved on multiple hosts.)

Your assumption is correct.

> Vastsky updates the log per WRITE request?

VastSky updates the log only when a request is the first write to a
certain section that includes the sectors to be written. The log is
cleared periodically once a section is synchronized between the nodes.
Actually, I have to say VastSky uses the Linux md driver to achieve
this, which already has this feature.

> > And you should know VastSky won't easily give up a node which seems
> > to be down. VastSky tries to reconnect the session and even tries to
> > use another path to access the node.
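(To make the last point a bit more concrete, here is a rough sketch in
Python of the kind of retry-and-failover logic I mean. It is only an
illustration: the transport calls, the retry limit and the interval are
made up, not actual VastSky code.)

import time

RETRY_LIMIT = 3
RETRY_INTERVAL = 5.0      # seconds to wait between reconnect attempts


class NodeUnreachable(Exception):
    """Raised only after every retry on every path has failed."""


def submit_io(node, request, paths):
    """Send one I/O request, reconnecting and failing over between paths."""
    for path in paths:                         # primary path first, then alternates
        for _attempt in range(RETRY_LIMIT):
            try:
                session = node.connect(path)   # hypothetical transport call
                return session.send(request)   # success: hand the reply back
            except ConnectionError:
                time.sleep(RETRY_INTERVAL)     # the node may just be busy
    # Only at this point would the node be dropped from the group and a
    # replacement assigned, so a transient slowdown doesn't force a resync.
    raise NodeUnreachable("no working path to node %r" % (node,))

The idea behind this trade-off is that dropping a node is expensive
(it triggers a resync), so VastSky prefers to spend some time retrying
before it gives up.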
> Hmm, but it just means that a client I/O request takes long. Even if
> VastSky doesn't give up, a client (i.e. application) doesn't want to
> wait for long.

Some applications may not like this behavior, but it is VastSky's
policy. I think it is the same behavior you would get with FC-SAN
storage.

Thank you,
Hirokazu Takahashi.
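P.S. Since the per-section write log came up above, here is a rough
sketch in Python of the idea, which is the same approach as the md
write-intent bitmap. The names and the section size are made up for
illustration; this is not the actual md or VastSky implementation.

SECTION_SECTORS = 2048            # e.g. 1 MiB sections of 512-byte sectors


class WriteIntentLog:
    def __init__(self):
        self.dirty = set()        # sections that may differ between mirrors

    def _sections(self, sector, nr_sectors):
        first = sector // SECTION_SECTORS
        last = (sector + nr_sectors - 1) // SECTION_SECTORS
        return range(first, last + 1)

    def before_write(self, sector, nr_sectors):
        """Log the first write into a section before the data goes out."""
        new = [s for s in self._sections(sector, nr_sectors)
               if s not in self.dirty]
        if new:
            self.dirty.update(new)
            # a real system would persist this update (on multiple hosts)
            # before letting the write proceed

    def clear_synced(self, sections):
        """Periodically clear sections once the mirrors are synchronized."""
        self.dirty.difference_update(sections)

    def resync_plan(self):
        """Sections a rejoining node has to copy; the rest can be skipped."""
        return sorted(self.dirty)

With something like this, a node that was only temporarily unreachable
can be resynchronized quickly because only the dirty sections have to
be copied.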