I don't think it is clear that the working group being proposed is an appropriate one for the IETF:
(a) it seems to be primarily a Web service to find information about Web resources, rather than a more general Internet issue, and there's a debate to be had about whether the IETF is the right forum for this.
(b) the proposal seems to be about defining and retrieving metadata about Web resources, and as such seems to have significant overlap with existing and planned work in W3C [1]. Specifically, the W3C have recently advanced the revised metadata format standard (RDF) and the companion ontology language (OWL) to Proposed Recommendation [2] [3]. They are also in the process of forming a group to define a metadata access service/protocol [4], whose likely function would considerably overlap what I understand your proposal to be doing.
You mention use of Dublin Core as part of your metadata set: I note that there are already RDF specifications for the Dublin Core metadata set [5], and that much existing work with Dublin Core uses RDF.
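To illustrate what that looks like in practice, here is a minimal sketch in Python that writes a simple Dublin Core description as RDF/XML. It uses the standard RDF and Dublin Core element-set namespaces; the helper name `dc_record`, the resource URL, and the title and creator values are made up for the example and are not taken from your proposal.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(resource_url, **elements):
    """Return an RDF/XML string describing one resource (hypothetical helper)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_url},
    )
    # Each keyword argument becomes one Dublin Core element, e.g. dc:title.
    for name, value in elements.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
record_xml = dc_record(
    "http://example.org/page.html",
    title="Example page",
    creator="A. N. Author",
)
print(record_xml)
```

Metadata in this form can be read by any RDF tool, which is the main argument for following the RDF route.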
If you have strong reasons for not following the RDF route, there is also a body of XML Query work that you seem to have overlooked [6].
I also note that I have been unable to access the web page you mention (http://www.lib.hust.edu.cn/dl-lib/English/main.htm).
#g --
[1] http://www.w3.org/
[2] http://www.w3.org/RDF/
[3] http://www.w3.org/2001/sw/WebOnt/
[4] http://www.w3.org/2003/10/RDF-Query-Charter
[5] http://dublincore.org/documents/2002/07/31/dcmes-xml/ http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/
[6] http://www.w3.org/XML/Query
At 15:20 08/01/04 +0800, wang liang wrote:
After publishing the message "propose some information retrieval protocols for Internet", we received a lot of advice. We now want to form a new working group on this issue and are asking for further advice. Information retrieval may overtake e-mail and become the most important service on the Internet, so we cannot neglect it.
The case for a working group on public information retrieval protocols rests on the shortcomings of current commercial search engines and the improvements that a future public search system could bring.
The shortcomings of commercial search engines.
1 Technology. No search engine today covers even 60% of the pages on the Internet, and the average update interval of their web page databases is almost one month. This is mainly because none of them can keep up with the explosive growth of web pages. Moreover, web pages are only one kind of information resource. There are many others, such as video, specialized databases, BBS, etc. Can you imagine a single search engine company efficiently administering all of these information resources?
2 Business model. Many search engine companies are concerned with how to profit from corporate customers through advertising and paid ranking prominence, without considering how their real customers feel. Search engines were originally tools for the convenience of Internet users, but to stay in business the companies have to sell advertising or ranking prominence, both of which get in the way of information retrieval. In other words, search engines make money at the expense of most Internet users' convenience rather than through the quality of their search service.
3 Public protocols. Apart from search, all Internet services, such as e-mail, BBS, and FTP, are based on public protocols. There is no secret technology in these services. But information retrieval, perhaps the most important service on the Internet, is still dominated by a few search engine companies. Many experts know the basic "Pages Ranking" algorithm, but no one knows its details, which are a commercial secret. Without public oversight, there can be no truly candid ranking algorithm. But we all know another world-famous algorithm very well: "money can elevate ranking score". This may not comply with the basic principle of the Internet as a public and free world.
4 In any free market, the customers should always come first, not a few companies.
The improvements in the new public search system, DRIS (Domain Resource Integrated System)
1 Technology. DRIS will build the information retrieval infrastructure of the Internet. It applies a hierarchical distributed architecture, like that of DNS, to manage all the information on the Internet. Its main principle is (organization level - conventional database system) - (main sub country Internet level - metadata harvest system) - (country level - distributed search system). Put simply, taking web pages as an example, every DRIS server at the bottom level, for instance at a university, will download and index all the web pages in its local network and then send the metadata to the higher layer. All other resources are integrated in the same way. DRIS will therefore improve the performance of Internet search in recency, coverage, and so on.
2 Management. Who will control DRIS? It is administered by none of us and by every one of us. DRIS is managed by its users and coordinated by a public organization, just as DNS is. Every organization is both its customer and its builder. This is the true spirit of the Internet. DRIS is an open system that needs no profit from its users and therefore needs no advertising.
3 The basic idea of DRIS: "search should be the internal function of Internet and every one should have his own search engine". DRIS provides only raw search results (like the results of a current search engine). Many intelligent search systems can use DRIS as their data source and provide high-quality personal or commercial search services. So commercial search engines can still survive, in the way they should.
4 Although DRIS offers an excellent and promising design for a new public Internet search system, that alone cannot ensure that DRIS gets built. An important principle in technology is that the best technology is the one that meets an urgent demand in society. This is the secret of DRIS. In our testbed, at the organization level, only a few universities have a web search engine for their campus network, to say nothing of a union search system that can efficiently integrate all of their information resources, such as FTP, BBS, and the special databases in the library. That is the demand at the third layer. Sharing information resources between organizations is also attractive, and that is the demand for building the second layer of DRIS. At the top layer, integrating all the information resources on the Internet may be everyone's dream.
5 Practice is the only criterion for judging a theory. We have now built some experimental third-layer DRIS servers in HuBei Province. I can only say that this is how things should be.
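The three-layer principle in item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `OrgNode`, `RegionHarvester`, and `CountrySearch` are hypothetical and do not come from any DRIS specification, an in-memory word index stands in for the conventional database system, and the harvested metadata is simply the set of indexed terms.

```python
from collections import defaultdict


class OrgNode:
    """Organization level: indexes the pages of its own network."""

    def __init__(self, name, pages):
        self.name = name
        self.index = defaultdict(set)  # term -> URLs containing it
        for url, text in pages.items():
            for term in text.lower().split():
                self.index[term].add(url)

    def metadata(self):
        """Summary sent upward: here, just the set of indexed terms."""
        return set(self.index)

    def search(self, term):
        return sorted(self.index.get(term.lower(), ()))


class RegionHarvester:
    """Middle level: harvests metadata and routes queries to matching nodes."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.term_to_nodes = defaultdict(set)  # term -> node names

    def harvest(self):
        self.term_to_nodes.clear()
        for node in self.nodes.values():
            for term in node.metadata():
                self.term_to_nodes[term].add(node.name)

    def search(self, term):
        hits = []
        # Only nodes whose harvested metadata mentions the term are queried.
        for name in sorted(self.term_to_nodes.get(term.lower(), ())):
            hits.extend(self.nodes[name].search(term))
        return hits


class CountrySearch:
    """Top level: distributed search that fans a query out to every region."""

    def __init__(self, regions):
        self.regions = regions

    def search(self, term):
        hits = []
        for region in self.regions:
            hits.extend(region.search(term))
        return hits


# Illustrative data only.
univ_a = OrgNode("univ-a", {"http://a.example/lib": "digital library catalogue"})
univ_b = OrgNode("univ-b", {
    "http://b.example/cs": "library of papers",
    "http://b.example/news": "campus news",
})
region = RegionHarvester([univ_a, univ_b])
region.harvest()
country = CountrySearch([region])
print(country.search("library"))
```

In this sketch the page text never leaves the organization that holds it. Only the term summaries move up a layer, which is the property that makes the DNS comparison apt.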
Protocol Series of DRIS (for work group)
Description of Working Group:
With the rapid increase in web pages, the coverage of search engines will become poorer and their update intervals much longer. If the current search engine architecture remains in use, finding precise and comprehensive information will become impossible in the future. The problem will become more serious when IPv6 is widely deployed in communication networks. The problem of "Too much information means no information" may become a disaster as information explodes. To solve this problem, the Internet needs an efficient information management system.
In this group, the Domain Resource Integrated System (DRIS) will be proposed. DRIS is a distributed information retrieval system that will build the information retrieval infrastructure for the Internet and can also be regarded as a kind of Internet information management system.
DRIS is a hierarchical distributed search system comprising three kinds of information retrieval system: conventional database systems, distributed search systems, and metadata harvest systems. We will first define the basic search systems and then define DRIS as a whole.
Specific work items are:
1 Standard distributed search system. This defines a platform-independent search interface and a collection description standard for heterogeneous information resources. An I-D, "information retrieval protocol for digital resources", has been proposed. (http://www.ietf.org/internet-drafts/draft-liang-irpdl-03.txt)
2 Standard metadata harvest system. A protocol based on an available open standard such as OAI will be proposed. It will define a standard metadata format that is compatible with most database systems.
3 Standard public web page search system. There are many kinds of database system. As long as they provide the standard distributed search interface or comply with the metadata harvest format, they can be brought into DRIS at the appropriate layer. Web pages are special, however, because of their distributed nature and astronomical number. To integrate the web pages on the Internet efficiently, DRIS will build a public, open web page database that strictly follows the principle of (organization level - conventional database system) - (sub country Internet level - metadata harvest system) - (country level - distributed system). (More information: Make search become the internal function of Internet. http://arxiv.org/abs/cs.IR/0311015)
4 DRIS. This will define DRIS as a whole, including its overall architecture, the relations between different nodes, etc. (More information: Evolution:Google vs.GRIS. http://arxiv.org/abs/cs.DL/0312024)
5 DRIS and IPv6. Cooperation with the IPv6 WG will be proposed. IPv6 will be the most distinctive feature of the next-generation Internet. IPv6 is still being improved, and any technology that benefits the Internet can be added to it. Since search is the main service for most Internet users, and that service is not satisfactory on the current Internet, why not take this requirement into account when building the new Internet? For example, in IPv6 data flows can be assigned a priority, so the Internet could guarantee a high priority to DRIS data flows. The relation between DRIS and IPv6 therefore needs some consideration.
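Item 2 names OAI as a possible basis for the metadata harvest system. As a rough illustration only, the sketch below builds a harvest request of the kind used by OAI-PMH, where `verb=ListRecords` asks for records and `metadataPrefix=oai_dc` selects Dublin Core. The helper name `list_records_url` and the base URL are hypothetical, and whether DRIS would use OAI-PMH unchanged is an assumption, not something stated above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Made-up repository address; a higher-layer harvester would call this per node.
url = list_records_url("http://repository.example.org/oai", from_date="2004-01-01")
print(url)
```

The `from` argument is what would let a higher layer fetch only what has changed since its last harvest, which bears directly on the update-interval problem described earlier.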
Detailed information about DRIS can be found at http://www.lib.hust.edu.cn/dl-lib/English/main.htm
We would welcome further advice. Thanks.
------------ Graham Klyne For email: http://www.ninebynine.org/#Contact