Next issue (April 2007): monographic section dedicated to "Information Technologies for Visually Impaired People"

Vol. VIII, issue no. 1, February 2007
Next Generation Web Search
Published on behalf of CEPIS by Novática (ATI, Spain)
Guest Editors: Ricardo Baeza-Yates, Paolo Boldi, and José-María Gómez-Hidalgo
Contents
Editions of the monograph in other languages
- Spanish, by Novática (full printed edition already available; summary, presentation and abstracts online soon)
Editorial Team of UPGRADE
Chief Editor: Llorenç Pagés-Casas, <pages AT ati DOT es>
Associate Editors: François Louis Nicolet, <nicolet AT acm DOT org>; Roberto Carniel, <rcarniel AT dgt DOT uniud DOT it>; Zakaria Maamar, <Zakaria DOT Maamar AT zu DOT ac DOT ae>; Soraya Kouadri Mostéfaoui, <soraya DOT kouadrimostefaoui AT unifr DOT ch>; Rafael Fernández Calvo, <rfcalvo AT ati DOT es>.
(E-mail addresses written with anti-spamming disguise)
Editorial
Next Generation Web Search
UPENET (UPGRADE European NETwork): papers from the Spanish journal "Novática" and the Polish journal "Pro Dialog"
CEPIS NEWS: Harmonise Project; News and Events
Monograph: Next Generation Web Search
Presentation
The Future of Web Search [PDF: 3 pages, 89 KB]
Ricardo Baeza-Yates, Paolo Boldi, and José-María Gómez-Hidalgo - Guest Editors
Abstract: The guest editors comment on this monograph of UPGRADE and Novática and
briefly introduce the papers it contains. A set of useful references on the
topic is also included.
Efficient Sparse Linear System Solution of the Page-Rank Problem [PDF: 7 pages, 255 KB]
Gianna M. Del Corso, Antonio Gullì, and Francesco Romani
Abstract: The research community has recently devoted an increasing
amount of attention to reducing the computational time needed by Web
ranking algorithms. In particular, many techniques have been proposed
to speed up the well-known PageRank algorithm. This interest is driven
by two dominant factors: (1) the Web Graph is simply huge and is
subject to dramatic updates in terms of nodes and links, therefore the
PageRank assignment tends to become obsolete very quickly; (2) many
PageRank vectors need to be computed according to different choices of
the personalization vectors or when adopting strategies for collusion
detection. In this paper, we show how the PageRank computation in the
original random surfer model can be transformed into the problem of
computing the solution of a sparse linear system. The sparsity of the
obtained linear system makes it possible to exploit the effectiveness
of Markov chain index reordering to speed up the PageRank computation.
In particular, we rearrange the system matrix according to several
permutations and we apply different scalar and block iterative methods
to solve smaller linear systems. We tested our approaches on Web Graphs
crawled from the net. The largest one amounts to about 24 million
nodes and more than 100 million links. With this Web Graph, the cost
of computing the PageRank is reduced by 58% in terms of Mflops and by
90% in terms of time with respect to the more commonly used Power
method.
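The reformulation described in this abstract can be illustrated with a small sketch. Assuming a uniform personalization vector v and uniform redistribution of the rank held by dangling nodes (pages without out-links), the random-surfer PageRank x satisfies x = alpha * P^T x + (alpha * d^T x + 1 - alpha) * v, where d marks the dangling nodes and P keeps their rows as zeros. Hence (I - alpha * P^T) x is a scalar multiple of v, so solving the sparse system (I - alpha * P^T) y = v and normalizing y gives the same vector as the power method. The Python below (function names hypothetical, alpha = 0.85 assumed) compares plain power iteration with a Gauss-Seidel solve of that system; it deliberately omits the matrix reorderings and block methods that the paper relies on for its speedups.

```python
"""PageRank two ways: power iteration versus a sparse linear solve.

Illustrative sketch only; names and parameters are hypothetical.
"""
from typing import List, Tuple


def pagerank_power(out_links: List[List[int]], alpha: float = 0.85,
                   tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Random-surfer PageRank by the power method, with a uniform
    personalization vector; dangling nodes jump uniformly at random."""
    n = len(out_links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        dangling = 0.0
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * x[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:
                dangling += x[i]
        # Teleportation plus dangling mass, spread uniformly (x sums to 1).
        extra = (alpha * dangling + 1.0 - alpha) / n
        new = [value + extra for value in new]
        converged = sum(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return x


def pagerank_linear(out_links: List[List[int]], alpha: float = 0.85,
                    tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Solve (I - alpha * P^T) y = v by Gauss-Seidel, where P keeps dangling
    rows as zeros, then normalize y so that its entries sum to 1."""
    n = len(out_links)
    v = 1.0 / n
    # incoming[j] lists (i, 1/outdeg(i)) for every link i -> j with i != j.
    incoming: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    diag = [1.0] * n
    for i, targets in enumerate(out_links):
        for j in targets:
            weight = 1.0 / len(targets)
            if i == j:
                diag[j] -= alpha * weight  # self-links land on the diagonal
            else:
                incoming[j].append((i, weight))
    y = [v] * n
    for _ in range(max_iter):
        delta = 0.0
        for j in range(n):
            # Gauss-Seidel: y[i] for i < j was already updated in this sweep.
            new_yj = (v + alpha * sum(w * y[i] for i, w in incoming[j])) / diag[j]
            delta += abs(new_yj - y[j])
            y[j] = new_yj
        if delta < tol:
            break
    total = sum(y)
    return [value / total for value in y]


if __name__ == "__main__":
    # Toy graph: 0 -> 1, 2; 1 -> 2; 2 -> 0, 3; node 3 is dangling.
    graph = [[1, 2], [2], [0, 3], []]
    print(pagerank_power(graph))
    print(pagerank_linear(graph))
```

Both functions should print numerically identical vectors. On a graph of realistic size, it is the linear-system view that opens the door to the index reorderings and block iterative solvers discussed in the paper.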
Learning to Analyze Natural Language Texts [PDF: 7 pages, 326 KB]
Giuseppe Attardi
Abstract: Linguistic analysis is rarely used in information retrieval
applications such as Web search, classification or summarization. Recent
advances in statistical and machine learning techniques have led to the
development of tools, such as parsers and machine translators, that are
accurate and effective enough for large-scale deployment. Future
generations of Web search engines might perform linguistic analysis of
documents to extract semantic relations and enrich their indexes, so as
to provide more sophisticated services than plain document retrieval. To
illustrate these techniques, we outline how to build a dependency
parser that learns from examples.
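The abstract does not spell out the parsing algorithm. As a hedged illustration of how a parser can learn from examples, the sketch below uses a generic arc-standard shift-reduce transition system (an assumption, not necessarily the system described in the paper): a static oracle replays a gold-standard dependency tree and emits (features, action) pairs. Those pairs would then train a classifier that predicts the next action on unseen sentences. The feature set and function names are hypothetical, and the classifier itself is omitted.

```python
"""Derive shift-reduce training examples from a gold dependency tree.

Arc-standard transitions; a hypothetical sketch, not the paper's parser.
"""
from typing import Dict, List, Tuple

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def features(words: List[str], stack: List[int], buffer: List[int]) -> Tuple[str, str, str]:
    """Toy features: second stack item, stack top, next input word."""
    def word(k: int) -> str:
        return "ROOT" if k == 0 else words[k - 1]
    s2 = word(stack[-2]) if len(stack) >= 2 else "<none>"
    s1 = word(stack[-1]) if stack else "<none>"
    b0 = word(buffer[0]) if buffer else "<none>"
    return (s2, s1, b0)


def oracle(words: List[str], heads: List[int]):
    """heads[k] is the gold head of token k (1-based); 0 denotes ROOT and
    heads[0] is ignored. Returns (training examples, arcs built)."""
    n = len(words)
    stack, buffer = [0], list(range(1, n + 1))
    pending = [0] * (n + 1)  # dependents of each token not yet attached
    for k in range(1, n + 1):
        pending[heads[k]] += 1
    examples: List[Tuple[Tuple[str, str, str], str]] = []
    arcs: Dict[int, int] = {}  # dependent -> head
    while buffer or len(stack) > 1:
        action = SHIFT
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            if s2 != 0 and heads[s2] == s1:
                action = LEFT_ARC
            elif heads[s1] == s2 and pending[s1] == 0:
                action = RIGHT_ARC
        if action == SHIFT and not buffer:
            raise ValueError("tree is not projective")
        examples.append((features(words, stack, buffer), action))
        if action == SHIFT:
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC:
            dep = stack.pop(-2)  # s2 becomes a dependent of s1
            arcs[dep] = stack[-1]
            pending[stack[-1]] -= 1
        else:
            dep = stack.pop()  # s1 becomes a dependent of s2
            arcs[dep] = stack[-1]
            pending[stack[-1]] -= 1
    return examples, arcs


if __name__ == "__main__":
    words = ["She", "eats", "fish"]
    heads = [0, 2, 0, 2]  # She <- eats, eats <- ROOT, fish <- eats
    examples, arcs = oracle(words, heads)
    for feats, action in examples:
        print(feats, action)
```

The oracle only needs the gold heads; the `pending` counter keeps a word on the stack until all of its own dependents have been attached, which is what makes RIGHT-ARC safe in the arc-standard system.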
SNAKET: A Personalized Search-result Clustering Engine [PDF: 8 pages, 354 KB]
Paolo Ferragina and Antonio Gullì
Abstract: We propose a (meta-)search engine, called SNAKET, that
queries 16 commodity search engines, specializing in Web, blog, book
and news search, and then offers two complementary views on their
returned results. One is the classical ranked list; the other is a
hierarchical organization of the results into folders labeled with
variable-length sentences, which are created on-the-fly at query time.
These labels capture the "theme" of the query results contained in
their associated folders. Users can then browse the labeled folder
hierarchy with various goals: knowledge extraction, query refinement,
or results personalization. This form of personalization is
privacy-preserving and non-intrusive for the underlying search engines.
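As a simplified, hypothetical illustration of label-driven result clustering and folder-based personalization, the sketch below builds flat folders labeled with contiguous one- or two-word phrases shared by at least two snippets, and then promotes the results in the folders a user selects while otherwise keeping the original order. SNAKET itself builds a hierarchy of folders labeled with variable-length sentences, which this flat sketch does not attempt; the stopword list, the support threshold and the function names are assumptions.

```python
"""Flat, label-driven clustering of search results plus folder-based
personalization. A simplified sketch, not SNAKET's actual algorithm."""
from collections import defaultdict
from typing import Dict, List, Set

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def candidate_labels(snippet: str, max_len: int = 2) -> Set[str]:
    """Contiguous word n-grams (n <= max_len) not starting or ending with a stopword."""
    words = [w.strip(".,;:!?\"'").lower() for w in snippet.split()]
    words = [w for w in words if w]
    labels = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            labels.add(" ".join(gram))
    return labels


def build_folders(snippets: List[str], query: str,
                  min_support: int = 2) -> Dict[str, List[int]]:
    """Map each label shared by >= min_support snippets to the ranks it covers.
    Labels made only of query words are dropped, as they describe every result."""
    query_words = set(query.lower().split())
    folders: Dict[str, List[int]] = defaultdict(list)
    for rank, snippet in enumerate(snippets):
        for label in candidate_labels(snippet):
            if not set(label.split()) <= query_words:
                folders[label].append(rank)
    return {label: ranks for label, ranks in folders.items()
            if len(ranks) >= min_support}


def personalize(snippets: List[str], folders: Dict[str, List[int]],
                selected: List[str]) -> List[str]:
    """Results in the selected folders first; original rank order otherwise."""
    chosen: Set[int] = set()
    for label in selected:
        chosen.update(folders.get(label, []))
    order = sorted(chosen) + [r for r in range(len(snippets)) if r not in chosen]
    return [snippets[r] for r in order]


if __name__ == "__main__":
    results = [
        "Jaguar car dealers in London",
        "Jaguar cars: new models and prices",
        "The jaguar, a big cat of South America",
        "Jaguar habitat: the big cat in the rainforest",
    ]
    folders = build_folders(results, "jaguar")
    print(sorted(folders))
    print(personalize(results, folders, ["big cat"])[:2])
```

Because personalization happens entirely on the client side, over results already returned, nothing about the user's choice is sent back to the underlying engines, which mirrors the privacy argument in the abstract.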
The Multimodal Nature of the Web: New Trends in Information Access [PDF: 6 pages, 299 KB]
Luis-Alfonso Ureña-López, Manuel-Carlos Díaz-Galiano, Arturo Montejo-Raez, and Mª Teresa Martín-Valdivia
Abstract: The rapid evolution of the World Wide Web has changed our
view of it. It has turned into a collaborative framework where
technological and social trends come together, resulting in the
overused term Web 2.0. In this new multimodal and multilingual paradigm,
all our techniques for the search and retrieval of information need to
be applied to manage not only textual information but also visual data
(images or videos) that can help to improve our systems. In this paper,
along with a brief analysis of the scenario described, we present our
experience in the medical domain with the retrieval of multimodal
information (text and images).
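The abstract does not say how text and visual evidence are combined, so the following is only a hedged illustration of one common approach, late fusion: each modality ranks documents independently, the scores are normalized, and a weighted sum produces the final ranking. The min-max normalization, the text weight of 0.7, the document identifiers and the function names are all assumptions.

```python
"""Late fusion of text-based and image-based retrieval scores (sketch)."""
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(text: Dict[str, float], image: Dict[str, float],
                text_weight: float = 0.7) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a document missing from one
    modality contributes 0 for that modality."""
    t, v = min_max(text), min_max(image)
    docs = set(t) | set(v)
    fused = {d: text_weight * t.get(d, 0.0) + (1 - text_weight) * v.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical scores for a query such as "chest x-ray, pneumonia".
    text_scores = {"case12": 8.1, "case07": 5.4, "case33": 2.0}
    image_scores = {"case07": 0.91, "case33": 0.88, "case50": 0.40}
    for doc, score in late_fusion(text_scores, image_scores):
        print(doc, round(score, 3))
```

Normalizing first matters because text and image engines usually produce scores on unrelated scales; without it, the modality with the larger raw numbers would dominate regardless of the chosen weight.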
Adversarial Information Retrieval in the Web [PDF: 8 pages, 126 KB]
Ricardo Baeza-Yates, Paolo Boldi, and José-María Gómez-Hidalgo
Abstract: The Web is the killer application of the Internet. Without
doubt, such a useful application is destined to be the subject of
abuse, as others such as e-mail are. Spam has invaded search engines and
social networks; moreover, the Web is abused not only by content
providers but also by its users. Adversarial Information Retrieval (AIR)
deals with classifying content (or the use of content) according to
whether it is abusive, and faces an adversary (the abuser) who is
constantly trying to mislead the classifier. Search engine spam
detection, Web content filtering, and other tasks are instances of AIR
in the Web. In this work, we review a number of AIR problems in the Web,
along with some proposed solutions. We pay special attention to
link-based search engine spam detection and to Web content filtering, as
representatives of a range of techniques proposed to achieve high
effectiveness in controlling Web-related abuse.
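As a hedged example of link-based spam detection, the sketch below follows the idea behind TrustRank, a well-known technique in this area: trust is propagated from a small, manually vetted seed set of good pages by a PageRank-like iteration whose random jump lands only on the seeds, so pages reachable only through a link farm receive no trust at all. Whether the reviewed paper uses exactly this method is not stated here; the damping factor of 0.85, the iteration count, the toy graph and the function name are assumptions.

```python
"""TrustRank-style trust propagation from seed pages (illustrative sketch)."""
from typing import List, Set


def trust_rank(out_links: List[List[int]], seeds: Set[int],
               alpha: float = 0.85, iterations: int = 100) -> List[float]:
    """Biased PageRank: the random jump goes only to trusted seed pages.
    Trust held by pages without out-links is simply not propagated."""
    n = len(out_links)
    d = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    t = d[:]
    for _ in range(iterations):
        new = [(1.0 - alpha) * d[i] for i in range(n)]
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * t[i] / len(targets)
                for j in targets:
                    new[j] += share
        t = new
    return t


if __name__ == "__main__":
    # Pages 0-2 are legitimate; 3 and 4 form a link farm that also links to 2.
    links = [[1], [0, 2], [0], [4, 2], [3]]
    scores = trust_rank(links, seeds={0})
    for page, score in enumerate(scores):
        print(page, round(score, 4))
```

Links from the farm to a good page (3 -> 2) do not help the farm: trust flows only along links out of trusted pages, and no good page links to 3 or 4.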
GERINDO: Managing and Retrieving Information in Large Document Collections [PDF: 8 pages, 473 KB]
Nivio Ziviani, Alberto H.F. Laender, Edleno Silva de Moura, Altigran Soares da Silva, Carlos A. Heuser, and Wagner Meira Jr.
Abstract: In this article we present a summary of some of the main
results produced during the five years of the GERINDO research project.
The project aims to address the increasing demand for software tools
capable of dealing with information available in large document
collections, such as the World Wide Web, and involves several
researchers from three Brazilian universities. Its efforts have focused
on a number of research topics in Web information retrieval and
management, such as information retrieval models, searching techniques,
document categorization, semistructured data management, generation of
agents for document collection, and efficiency issues. In addition to
its specific research contributions, the project has stimulated
interaction among the researchers of the three universities and has
promoted other collaborations with research groups from North America
and Europe.
Research Directions in Terrier: a Search Engine for Advanced Retrieval on the Web [PDF: 8 pages, 315 KB]
Iadh Ounis, Christina Lioma, Craig Macdonald, and Vassilis Plachouras
Abstract: This paper describes
the Terrier search engine, giving an overview of its architecture and
main Information Retrieval (IR) features, and reviewing the
cutting-edge research implemented in it, with a special focus on Web
search. IR research is concerned with developing and evaluating search
engines that retrieve relevant documents in response to a user query.
Terrier is a highly flexible, efficient, effective and robust platform
for IR research, readily deployable on large-scale collections of
documents [10]. Terrier implements state-of-the-art
theoretically-founded models for IR, ranging from formal disciplines,
such as probability theory, statistics and natural language processing,
to computational aspects of index compression and retrieval efficiency.
The research put into Terrier constantly expands towards new branches
of the wider IR field, making Terrier a strong, modular and
state-of-the-art platform for developing and assessing new concepts and
ideas.
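To make the idea of a theoretically founded retrieval model concrete, the sketch below ranks documents with BM25, a classic probabilistic weighting model of the kind IR research platforms provide. It does not reproduce Terrier's own weighting models or its Java API. The parameters k1 = 1.2 and b = 0.75 are conventional defaults, the IDF variant with an added 1 (which keeps weights non-negative) is one of several in use, and the whitespace tokenizer and function names are hypothetical.

```python
"""BM25 ranking over a tiny in-memory inverted index (illustrative sketch)."""
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(docs: List[str]) -> Tuple[Dict[str, Dict[int, int]], List[int]]:
    """Inverted index term -> {doc_id: term frequency}, plus document lengths."""
    index: Dict[str, Dict[int, int]] = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        lengths.append(len(terms))
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25(query: str, index: Dict[str, Dict[int, int]], lengths: List[int],
         k1: float = 1.2, b: float = 0.75) -> List[Tuple[int, float]]:
    """Score every document containing at least one query term."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores: Dict[int, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = ["web search engine", "search engine spam", "cooking recipes"]
    index, lengths = build_index(docs)
    print(bm25("web search", index, lengths))
```

Only documents that appear in some query term's posting list are ever scored, which is why an inverted index lets engines of this kind scale to large collections.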
Yahoo! Research Barcelona: Web Retrieval and Mining [PDF: 2 pages, 107 KB]
The Yahoo! Research Team
Abstract: In mid-2005, Yahoo! Inc. began an ambitious program to create
a world-class industrial research lab focusing on how to deliver
services over the Web to a range of stakeholders, including advertisers,
site owners, content publishers, and users. The resulting organization,
Yahoo! Research, has embarked on a number of research directions. In
early 2006, its first European lab was launched in Barcelona. Within a
year, this lab has become well known in Europe and is one of the largest
groups working on Web retrieval and mining in Europe. In this short
paper, we report on the research directions of Yahoo! Research in
general and of the Barcelona lab in particular, with a focus on the
trends and problems we see as critical to our mission.
The Guest Editors
Ricardo Baeza-Yates is
Director of the new Yahoo! Research laboratories in Barcelona and Latin
America (Santiago, Chile). Before that he was professor and director of
the Center for Web Research at the Computer Science department of the
University of Chile, and also ICREA (Institució
Catalana de Recerca i Estudis Avançats) Professor at the
Department of Technology of the Universitat
Pompeu Fabra in Barcelona, Spain. He holds a Ph.D. in Computer
Science from the University of Waterloo, Canada. He is co-author of the
book Modern Information Retrieval, published in 1999 by Addison-Wesley,
as well as co-author of the 2nd edition of the Handbook of Algorithms
and Data Structures, Addison-Wesley, 1991, and co-editor of Information
Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992. Among
other awards, he received the Organization of American States award for
young researchers in exact sciences (1993). In 2003 he was the first
computer scientist to be elected to the Chilean Academy of Sciences.
<ricardo AT baeza DOT cl>.
Paolo Boldi obtained his Ph.D. in Computer Science at the University of
Milano, where he is currently Associate Professor at the Dipartimento di
Scienze dell’Informazione. His research interests have covered many
different topics in theoretical and applied computer science, such as
domain theory, non-classical computability theory, distributed
computability, anonymous networks, sense of direction, and
self-stabilizing systems. More recently, his work has focused on
problems related to the World Wide Web, a field where his research has
also produced software packages used by many people working in the same
area. In particular, he co-wrote a highly efficient full-text IR engine
(MG4J) and a graph compression tool (WebGraph) that is state-of-the-art
as far as compression ratio is concerned. <boldi AT dsi DOT unimi DOT it>.
José-María Gómez-Hidalgo holds a Ph.D. in Mathematics and has been a
lecturer and researcher for 10 years at the Universidad Complutense de
Madrid (UCM) and the Universidad Europea de Madrid (UEM), where he is
currently Head of the Department of Computer Science. His main research
interests include Natural Language Processing (NLP) and Machine Learning
(ML), with applications to Information Access in newspapers and
biomedicine, and Adversarial Information Retrieval, with applications to
spam filtering and pornography detection on the Web. He has taken part
in around 10 research projects, heading some of them. José-María has
co-authored a number of research papers on these topics, which can be
accessed at his home page <http://www.esi.uem.es/~jmgomez/>. He is a
Program Committee member for CEAS (Conference on Email and Anti-Spam)
2007, the Spam Symposium 2007 and other conferences, and he has reviewed
papers for JASIST (Journal of the American Society for Information
Science and Technology), ECIR (European Conference on Information
Retrieval), and others. He has also reviewed research project proposals
for the European Commission. <jmgomez AT uem DOT es>.
UPENET (UPGRADE European NETwork) [PDF: 24 pages, 605 KB]
From Novática (ATI, Spain)
Informatics Profession
The Maturity of IT Professionalism in Europe
Sean Brady
This paper has been selected for publication, in Spanish, by Novática.
Novática, a founding member of UPENET, is a bimonthly journal published
in Spanish by the Spanish CEPIS society ATI (Asociación de Técnicos de
Informática – Association of Computer Professionals).
Abstract: This paper examines the
maturity of IT Professionalism as implemented by European Computer
Societies, reporting on the results of a survey conducted among several
societies that belong to CEPIS (Council of European Professional
Informatics Societies).
From Pro Dialog (PTI-PIPS, Poland)
Graphical Interfaces
Portable Declarative Format for Specifying Graphical User Interfaces
Zbigniew Fryźlewicz and Rafał Gierusz
This paper was first published, in English, by Pro Dialog (issue no. 22,
2007, pp. 15-26). Pro Dialog, a founding member of UPENET, is a biannual
journal published jointly, in English or Polish, by the Polish CEPIS
society PTI-PIPS (Polskie Towarzystwo Informatyczne – Polish Information
Processing Society) and the Poznan University of Technology, Institute
of Computing Science.
Abstract: This paper introduces a portable declarative format for
specifying graphical user interfaces. A generic GUI description is
written in XML and then converted by means of XSLT, so that the GUI is
independent of the platform and the programming language. The hope is
that, in the future, GUIs will be developed by designers versed in
graphics but not in programming. Implementations for Java and C# have
been developed.
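The abstract describes a platform-neutral XML description of a GUI that is transformed into code for a specific toolkit. The paper's XML vocabulary is not given here, so the element and attribute names below are hypothetical; and where the authors use XSLT, this sketch swaps in Python's standard `xml.etree.ElementTree` to walk the description and emit a Java Swing skeleton. That is enough to show one declarative source feeding a language-specific generator; a C# generator would be a second mapping over the same XML.

```python
"""Generate a Java Swing skeleton from a toolkit-neutral XML GUI description.
The XML vocabulary is hypothetical; the paper itself uses XSLT."""
import xml.etree.ElementTree as ET

GUI_XML = """
<window title="Login">
  <label text="User name"/>
  <textfield name="user"/>
  <button name="ok" text="OK"/>
</window>
"""

# Hypothetical mapping from neutral element names to Swing classes.
SWING_CLASSES = {"label": "JLabel", "textfield": "JTextField", "button": "JButton"}


def to_swing(xml_text: str) -> str:
    """Emit Java statements that build the window described by xml_text."""
    window = ET.fromstring(xml_text.strip())
    lines = [f'JFrame frame = new JFrame("{window.get("title", "")}");',
             "frame.setLayout(new FlowLayout());"]
    for i, widget in enumerate(window):
        cls = SWING_CLASSES[widget.tag]
        var = widget.get("name", f"w{i}")  # unnamed widgets get w0, w1, ...
        text = widget.get("text")
        args = f'"{text}"' if text is not None else ""
        lines.append(f"{cls} {var} = new {cls}({args});")
        lines.append(f"frame.add({var});")
    lines.append("frame.pack();")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_swing(GUI_XML))
```

The generated statements assume the usual `javax.swing` and `java.awt` imports in the surrounding Java class; the generator itself knows nothing about layout beyond a default `FlowLayout`.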
From Novática (ATI, Spain)
Next-generation Web
Blogs: On the Cutting Edge of the Next-generation Web
Antonio-Miguel Fumero-Reverón and Fernando Sáez-Vacas
This paper was first published, in Spanish, by Novática (issue no. 183,
September-October 2006, pp. 68-73). Novática, a founding member of
UPENET, is a bimonthly journal published in Spanish by the Spanish CEPIS
society ATI (Asociación de Técnicos de Informática – Association of
Computer Professionals).
Abstract: This article analyses a number of social and cultural aspects
of the blog phenomenon with the methodological aid of a complexity
model, the New Techno-social Environment (hereinafter also referred to
by its Spanish acronym, NET, or Nuevo Entorno Tecnosocial), together
with the socio-technical approach of the two blogologist authors. Both
authors are researchers interested in the new reality of the Digital
Universal Network (DUN). After a review of some basic definitions, the
article moves on to highlight some key characteristics of an emerging
blog culture and relates them to the properties of the NET. Then, after
a brief practical parenthesis for people entering the blogosphere for
the first time, we present some reflections on blogs as an evolution of
virtual communities and on the changes experienced by the inhabitants of
the infocity emerging from within the NET. The article concludes with a
somewhat disturbing question: whether these changes might include a
gradual transformation of the structure and form of human intelligence.
CEPIS NEWS [PDF: 2 pages, 72 KB]
Harmonise Project
Building up to the Final Report
François-Philippe Draguet
Additional information about this project, whose aim is to help
establish comparable data on ICT vocational training systems and on the
various approaches to ICT qualification and ICT certification in the
participating countries.
News & Events
European Funded Projects and News Updates
Monograph: Next Generation Web Search
Presentation
The Future of Web Search [PDF: 3 pages, 89 KB] (includes a set of useful references on the topic)
Ricardo Baeza-Yates, Paolo Boldi, and José-María Gómez-Hidalgo - Guest Editors
Since the publication of the UPGRADE issue on "Information Retrieval and
the Web" in June 2002, the size of the Web, the kind of information it
holds, and the way it is used have clearly evolved, posing new
challenges for its most prominent entry points: Search Engines.
Among such challenges are:
1. Advanced search modes. Text data retrieval (such as finding out what
the capital of France is) is one of the most popular search activities
on the Web. However, there are other search activities with more
ambitious and sophisticated goals, such as searching to learn or to
investigate. As more and more users access the Web, it is increasingly
necessary to support ever more sophisticated search strategies.
2. Efficiency. Since their very
beginning, Search Engines have been designed to return Web references
to user queries in milliseconds. However, dealing with millions of
Web pages is not the same as achieving fast retrieval from thousands of
billions of pages. For instance, according to Netcraft Surveys the
number of Web servers has doubled in the last 18 months. Information on
the Web is increasing faster than computing power, and algorithms have
to be rethought to keep them affordable.
3. Semantic Web. Humans are capable of using the Web to carry out tasks
such as finding the Finnish word for "car", reserving a library book, or
searching for the cheapest DVD and buying it. However, a computer cannot
accomplish the same tasks without human direction, because web pages
are designed to be read by people, not machines. The Semantic Web is a
vision of information that is understandable by computers, so that they
can automate more of the tedium involved in finding, sharing and
combining information on the web. At its core, the Semantic Web consists
of a data model called Resource Description Framework (RDF), a variety
of data interchange formats (e.g. RDF/XML, Turtle, etc.), and notations
such as RDF Schema (RDFS) and the Web Ontology Language (OWL) that
facilitate the formal description of concepts, terms, and relationships
within a given domain. The Semantic Web will enable new forms of Web
Search, simpler and more accurate than the present ones, and it needs to
be built around the intelligent processing of current Web information,
including Language and Multimedia Analysis.
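RDF represents knowledge as subject-predicate-object triples. The toy in-memory store below is not a real RDF library, and the `ex:` identifiers and predicate names are hypothetical; it shows how a program could answer the "Finnish word for car" task mentioned above by pattern-matching triples, with `None` acting as a wildcard, roughly as a SPARQL basic graph pattern would.

```python
"""A toy triple store with wildcard pattern matching (illustrative only)."""
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self.triples: Set[Triple] = set(triples)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return the triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore([
        ("ex:Car", "ex:englishLabel", "car"),
        ("ex:Car", "ex:finnishLabel", "auto"),
        ("ex:Book", "ex:englishLabel", "book"),
        ("ex:Book", "ex:finnishLabel", "kirja"),
    ])
    # Which concept has the English label "car"? Then, what is its Finnish label?
    concept = store.match(predicate="ex:englishLabel", obj="car")[0][0]
    print([o for _, _, o in store.match(concept, "ex:finnishLabel")])
```

The two-step lookup only works because both labels hang off the same shared identifier (`ex:Car`); agreeing on such identifiers and their meaning is exactly the role that RDFS and OWL vocabularies play.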
4. Online Social Networks. One of the most important reasons for the
growth in the number of Web servers and pages is the increasing
popularity of online social networking services, such as Flickr,
Blogger, Digg, MySpace, YouTube, Wikipedia, and many others. These sites
allow Web users to publish and share, easily and quickly, all forms of
information, including their personal thoughts, pictures, videos,
interests and references, news items, etc. The digital expression of
personal relationships particularly facilitates sharing, with features
such as Friend of a Friend (FOAF), which allows users to share their
network of personal relationships. Social networking services foster
the emergence of dynamic online communities that make social decisions
about the quality of Web content, which will be the key to the next
generation of search engines (just as link analysis was the key to the
current generation).
5. Personalization and other forms of context. As computational power
increases, it must be converted into more advanced search engine
features. Exploiting information about user context (location, previous
and recent searches, previous and recent clicks, etc.) may deliver more
accurate information to users, as results can be tailored to their
long-term and short-term search goals and information needs. Context
awareness is also central to Web advertising, a field of ever-increasing
importance that can exploit user information to identify targets more
effectively.
6. Multimedia and multilingualism. The Web is still a community of
diverse nationalities with different languages, which have so far been
given only limited support by search engines. Even very basic
internationalization issues (such as the choice of charset and encoding)
are still covered only in a very partial, unsatisfactory (and
western-centric) way by current-generation search engines. With only
(increasingly effective) translation services as multi-language support
tools, users are demanding cross-language features that allow them to
cross the language barrier, retrieving results in many languages from
queries in their mother tongue. The computational capabilities and the
quality of multimedia analysis algorithms also allow better search
interfaces and indexes, in which users pose queries in the form of
pictures, audio files or even videos in order to obtain multimedia
material.
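The charset and encoding issues mentioned above are easy to reproduce. The sketch below, using only Python's standard library, shows how UTF-8 bytes decoded with the wrong charset produce a garbled index term, and how Unicode normalization (NFC) plus case folding lets the precomposed and decomposed spellings of the same word match. The function name and the choice of NFC are assumptions for illustration; real engines also need language-specific tokenization, which is not shown.

```python
"""Charset and Unicode pitfalls when indexing multilingual text (sketch)."""
import unicodedata


def index_term(raw: bytes, charset: str) -> str:
    """Decode with the declared charset, then NFC-normalize and case-fold."""
    return unicodedata.normalize("NFC", raw.decode(charset)).casefold()


if __name__ == "__main__":
    name = "Gómez"
    utf8_bytes = name.encode("utf-8")
    # Correct charset vs. a wrong guess (ISO-8859-1) for the same bytes.
    print(index_term(utf8_bytes, "utf-8"))       # gómez
    print(index_term(utf8_bytes, "iso-8859-1"))  # gã³mez (mojibake)
    # Precomposed "é" (U+00E9) vs. "e" followed by a combining acute (U+0301).
    composed, decomposed = "caf\u00e9", "cafe\u0301"
    print(composed == decomposed)  # False
    print(index_term(composed.encode("utf-8"), "utf-8")
          == index_term(decomposed.encode("utf-8"), "utf-8"))  # True
```

A page whose declared charset is wrong therefore ends up under index terms that no correctly typed query will ever match, which is one concrete reason why charset handling matters for multilingual search.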
7. Web Spam. What is probably the Web’s most valuable asset, the
possibility of making connections from pieces of information to persons,
is increasingly becoming the subject of abuse. Just as email spam
erupted on the scene some years ago, some content providers are now
abusing this valuable means of communication to obtain an illegal
commercial advantage by preparing pages and links with the aim of
getting an undeservedly high rank for a variety of popular user queries.
Moreover, they hack dynamic websites (forums, social networks, etc.) in
order to insert fake references and content ultimately aimed at
delivering traffic and rank to their Web sites, and putting money in
their pockets, in what is sometimes disguised as Search Engine
Optimization. Web search engine operators, social networking services,
etc. are required and committed to stopping, or at least reducing, this
kind of abuse.
The authors invited to this special issue are prominent researchers and
representatives of the search engine industry, and their papers cover
most of these issues, providing the reader with a valuable overview of
current and upcoming Web search engine techniques and functionalities.
The work by Gianna M. Del Corso, Antonio Gullì and Francesco Romani
focuses on the efficient computation of PageRank measures on an
ever-increasing Web graph. Only improvements such as those described in
this paper will enable us to continue to use valuable link analysis
techniques for Web site ranking.
Giuseppe Attardi
describes some of the Natural Language analysis techniques which are at
the core of advanced Semantic Web applications. Language Analysis makes
it possible to build, maintain and exploit the resources needed by the
Semantic Web (in particular, ontologies).
Personalization is covered by the work presented by Paolo Ferragina and
Antonio Gullì, who describe how to obtain more personalized and accurate
results by using an advanced and effective Web result clustering engine
called SNAKET.
Luis-Alfonso Ureña-López, Manuel-Carlos Díaz-Galiano, Arturo
Montejo-Raez and Mª Teresa Martín-Valdivia present a series of
experiments on content-based multilingual and multimedia (images and
text) retrieval, which support new ways of querying search engines, with
an emphasis on multimodality: mixed-media queries involving text and
sample images.
Ricardo Baeza-Yates, Paolo Boldi, and José-María Gómez-Hidalgo have
prepared an overview of the current problems of, and solutions to, Web
spam and other forms of abuse, focusing on link analysis and Web content
filtering.
Next, two papers present major and effective research efforts in the
field of Web and large-scale information retrieval.
On the one hand, Nivio Ziviani, Alberto H.F. Laender, Edleno Silva de
Moura, Altigran Soares da Silva, Carlos A. Heuser, and Wagner Meira Jr.
present an overview of some of the Web search related results of
GERINDO, one of the biggest and most prominent research projects on
information retrieval in recent years.
On the other hand, Iadh Ounis, Christina Lioma, Craig Macdonald, and
Vassilis Plachouras describe Terrier, a high-performance framework and
engine, easily deployable on large-scale document collections, that
allows researchers to investigate new information retrieval models,
efficient implementations, and many other relevant topics.
We close this special issue with a description of the direction of
research activities carried out by Yahoo! Research.
Useful References on Web Search Engines
In addition to the references and sources mentioned in the articles of
this issue, interested readers may like to take a look at the following
Web sites, books, journals, and conference proceedings.
Books
- S. Abiteboul, P. Buneman and D. Suciu. Data on the Web: from Relations to Semistructured Data and XML, Morgan Kaufmann, 2000. ISBN: 155860622X.
- M. Agosti and A. Smeaton (eds.). Information Retrieval and Hypertext, Kluwer, 1996. ISBN: 079239710X.
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval, Addison-Wesley, 1999. ISBN: 020139829X. Web site: <http://sunsite.dcc.uchile.cl/irbook/>.
- S. Chakrabarti. Mining the Web: Analysis of Hypertext and Semi Structured Data, Morgan Kaufmann, 2003.
- D.A. Grossman and O. Frieder. Information Retrieval: Algorithms and Heuristics, Springer, 2004. ISBN: 1402030045.
- I.H. Witten, A. Moffat and T. Bell. Managing Gigabytes, Morgan Kaufmann, 1999 (second edition). ISBN: 1558605703.
Journals
- ACM Transactions on
Information Systems, <http://www.acm.org/pubs/tois/>.
- ACM Transactions on Internet Technology,
<http://www.acm.org/pubs/periodicals/toit/>.
- European Journal of Information Systems,
<http://www.palgrave-journals.com/ejis/index.html>.
- Electronic Library,
<http://www.emeraldinsight.com/info/journals/el/el.jsp>.
- IEEE Intelligent Systems,
<http://www.computer.org/portal/site/intelligent/>.
- IEEE Internet Computing,
<http://www.computer.org/portal/site/internet/>.
- IEEE Transactions on Information Theory,
<http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=18>.
- IEEE Transactions on Knowledge and Data Engineering,
<www.computer.org/mc/tkde>.
- Information Processing & Management,
<http://ees.elsevier.com/ipm/>.
- Information Retrieval Journal,
<http://ees.elsevier.com/ipm/>.
- Journal of the Association for Information Systems,
<http://jais.aisnet.org/>.
- SIGIR Forum, <http://www.acm.org/sigs/sigir/forum/>.
- SIGWEB Newsletter, <http://www.sigweb.org/>.
- VLDB Journal,
<http://www.informatik.uni-trier.de/~ley/db/journals/vldb/index.html>.
- World Wide Web, <http://vlib.org/>.
Conferences
- ACM DocEng,
<http://www.documentengineering.org/>.
- ACM JCDL, <http://www.acm.org/jcdl/>.
- ACM SIGIR, <http://www.acm.org/sigir/>.
- CIKM, <http://www.cs.umbc.edu/cikm/>.
- CLEF, <http://www.clef-campaign.org/>.
- ECIR, <http://irsg.bcs.org/ecir.php>.
- RIAO, <http://www.riao.org/>.
- SPIRE, <http://cn.net.au/>.
- TREC, <http://trec.nist.gov/>.
Web Sites
- Center for Web
Research, <http://www.cwr.cl>.
- Google Labs, <http://labs.google.com>.
- José María Gómez home page,
<http://www.esp.uem.es/jmgomez>.
- MAVIR Research Program, <http://www.matir.net>.
- Paolo Boldi home page, <http://boldi.dsi.unimi.it>.
- Ricardo Baeza-Yates home page, <http://www.baeza.cl>.
- Search Engine Watch, <http://www.searchenginewatch.com>.
- Web Information Retrieval resources, <http://www.webir.org>.
- World Wide Web
Consortium, <http://w3c.org>.
- Yahoo!
Research, <http://research.yahoo.com>.
Copyright © CEPIS
2007. All rights reserved unless otherwise stated.