Sem Spirit

The Web

The Web is a framework for the public exchange of information, built on the Internet, that appeared at the end of the 1980s. It organizes interactive resources into Web sites whose electronically encoded content is mainly intended for human use. The information circulating on the Web is partly structured by the standard hypertext format (HTML), which stems from the proposal made by Tim Berners-Lee in 1989. Among other things, this format makes it possible to establish relationships between digital documents and to present content for human reading. The power and freedom of expression that the Web confers on the individual is such that, in the two decades following its birth, the Web kept growing, and it currently exceeds 100 million sites.

This overabundance of information has created new needs: some quantitative, such as maintaining large amounts of data efficiently; others qualitative, such as creating, searching for, or exploiting the information or service most relevant to given requirements. In 2001, T. Berners-Lee, J. Hendler and O. Lassila proposed a new infrastructure, built on that of the Web, that in theory answers the need for a more relevant structuring of information: the Semantic Web.

Although the Semantic Web promises considerable progress in information sharing, it is defined as a continuation of the Web, which already offers comfortable expressivity. Among the tools that contributed to the rise of the Web are content management systems (CMS), which appeared in the mid-1990s, are based on the hypertext format, and offer relatively sophisticated services. They were designed to update large amounts of information dynamically while supporting collaborative work, content uploading, structuring, and a clear separation between content management and publication management. CMS offer a variety of methods for taking advantage of the content. These methods rely on the addition of references (hypertext links), on "full text" search, and on sorting procedures. They are sometimes supplemented by categorization mechanisms (taxonomies, for example) and by content indexing based on the terms of a thesaurus. Over time, some CMS have extended their content management services with a "semantic" level, blurring the line between pure content management and knowledge management. Since the 2000s, the Web has allowed the integration of autonomous agents that exploit the functions of the network. There is real enthusiasm, however, for the idea that the Semantic Web will provide an infrastructure expressive enough for a genuine "automated orchestration" of these agents. This is the field of semantic web services, which relies on service description languages (such as WSDL, the Web Services Description Language), complemented by logic-based formalisms, to describe the services offered by an agent.

To a certain extent, the Web has made it possible since its early days to organize the content of a resource in a form that a machine can exploit. The hypertext format allows a web resource to be described in a structured form, although that form is relatively poor given the complexity of the content to be described. These descriptions are associated with the resource and remain accessible via the Web. They are referred to as annotations or metadata, although at present the distinction between these two terms remains unclear. From a usage point of view, an annotation is a remark or an explanatory note, and a metadata item is a value encoded in a certain format. The former expresses a comment about a document, whereas the latter records a piece of data about it. These descriptions can be divided into two classes: those called objective, which concern the electronic support of the resource (for example, the creation or modification date, or the author of a file), and those called subjective, which relate to its content and depend on its interpretation (for example, the keywords that describe the content). From a fundamental point of view, there does not seem to be any difference between the notion of metadata and that of annotation. Some authors hold that the annotation of a document becomes metadata when it is detached from the resource, stored in a database, and exploited independently. Others hold that metadata accompanies the resource and follows a precise coding format, whereas an annotation is added to the resource's content during some process and may be written in natural language, which brings it closer to a subjective description since it depends on a human or software interpretation. In any case, the older formats used to define metadata or annotations are not sufficient to describe the content of a resource accurately. In the Semantic Web, annotations and metadata are referred to as "semantic" and are intended to significantly improve the high-level description of content and, consequently, its search and access.
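As an illustration of the objective and subjective classes described above, here is a minimal Python sketch that reads the descriptions embedded in a hypertext page and sorts them into the two classes. The sample page, the field names (author, date, keywords, description) and the way they are assigned to each class are assumptions made for this example; they are not part of any standard classification.

```python
from html.parser import HTMLParser

# Hypothetical page, invented for this example.
PAGE = """<html><head>
<title>The Web</title>
<meta name="author" content="Jane Doe">
<meta name="date" content="2017-03-01">
<meta name="keywords" content="web, hypertext, semantic web">
<meta name="description" content="A short history of the Web.">
</head><body><p>...</p></body></html>"""

# Assumed split between the two classes (illustrative only).
OBJECTIVE = {"author", "date"}              # about the electronic support
SUBJECTIVE = {"keywords", "description"}    # about the interpreted content


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def classify(meta):
    """Splits collected descriptions into objective and subjective ones."""
    objective = {k: v for k, v in meta.items() if k in OBJECTIVE}
    subjective = {k: v for k, v in meta.items() if k in SUBJECTIVE}
    return objective, subjective


collector = MetaCollector()
collector.feed(PAGE)
objective, subjective = classify(collector.meta)
print("objective:", objective)
print("subjective:", subjective)
```

The values collected here are plain strings with no shared vocabulary behind them, which is the kind of limited description that semantic annotations are meant to improve on.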