Information Retrieval


The Inverted Index

See this natural language processing playlist.

Web Crawling

At its core, a web crawler is a program that downloads a webpage, collects the links on it, and then visits each of those links to repeat the process. This description understates the complexity involved: a practical crawler needs many additional features beyond this basic loop.
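
The basic download, collect, and visit loop can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names LinkExtractor, crawl, and max_pages are made up for this example, and fetch is a hypothetical caller-supplied function that maps a URL to its HTML text (or None on failure), which keeps the sketch independent of any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting at seed_url.

    fetch is a caller-supplied function (an assumption of this sketch)
    that returns the HTML for a URL, or None if it cannot be downloaded.
    Returns the list of URLs downloaded, in visit order.
    """
    frontier = deque([seed_url])  # URLs waiting to be visited
    seen = {seed_url}             # URLs already queued, to avoid repeats
    visited = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)

    return visited
```

Passing the get method of a dictionary of in-memory pages as fetch lets the loop run without touching the network. Even this small version has to handle relative links, duplicate URLs, and a page limit, which hints at the extra features a real crawler needs.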

To decide which features a web crawler must implement, first list the characteristics the system should have. This can be difficult because many characteristics conflict with one another, which forces trade-offs.

Features and Characteristics

References