Typically the crawler is the component of the search engine responsible for going out and retrieving content for the indexer to catalog. It reads the list of addresses from the database and downloads a copy of each document into a local queue on disk where the indexer can access it. The indexer component then processes each file in the queue. This tag-team approach works well for large search sites that continuously index massive amounts of data, or when the crawler scans the downloaded documents for links to other documents to retrieve (as is the case with recursive downloading/leeching).
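To make that hand-off concrete, here is a minimal sketch in Python of the arrangement described above: a crawler that reads addresses from a database, spools each document to a queue directory on disk, and an indexer that drains the queue. The database file, table name, column name, and queue layout are assumptions for illustration only, not any particular engine's actual schema.

```python
import hashlib
import pathlib
import sqlite3
import urllib.request

QUEUE_DIR = pathlib.Path("queue")          # local spool the indexer reads from (assumed layout)
QUEUE_DIR.mkdir(exist_ok=True)

def crawl(db_path="addresses.db"):
    """Fetch every address in the database and queue a copy of each document on disk."""
    conn = sqlite3.connect(db_path)
    # 'urls(url TEXT)' is a hypothetical schema for the address list.
    for (url,) in conn.execute("SELECT url FROM urls"):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read()
        except OSError:
            continue                        # skip unreachable documents
        # Name the file after a hash of the URL so repeated crawls overwrite rather than duplicate.
        name = hashlib.sha1(url.encode()).hexdigest() + ".html"
        (QUEUE_DIR / name).write_bytes(body)
    conn.close()

def index():
    """Indexer side: process each queued file, then remove it from the queue."""
    for doc in QUEUE_DIR.glob("*.html"):
        text = doc.read_bytes()
        # ... tokenize the document and add it to the index here ...
        doc.unlink()                        # dequeue once cataloged

if __name__ == "__main__":
    crawl()
    index()
```

Because the two halves only share the queue directory, the crawler and indexer can run on their own schedules, which is what makes the tag-team approach scale to continuous indexing.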