Holger's Java API
public interface CrawlerSetting
CrawlerSetting defines callback methods that determine how a web search algorithm traverses the net and calculates its results. A CrawlerSetting can be used with a com.antelmann.net.Spider.
See Also: Spider, Spider.crawlWeb(CrawlerSetting, int, com.antelmann.util.logging.Logger)

| Method Summary | |
|---|---|
| boolean | followLinks(URL url, URL referer, int depth, List<URL> resultURLList, List<URL> closedURLList, List<Spider.URLWrapper> searchURLWrapperList)<br>followLinks() determines whether the given URL is to be searched for links that are then examined in the next level. |
| boolean | matchesCriteria(URL url, URL referer, int depth, List<URL> resultURLList, List<URL> closedURLList)<br>This method decides whether the URL itself or its content qualifies for what this CrawlerSetting searches for; because it is called on every URL encountered, it is also the place for any custom parsing this CrawlerSetting wants to do. |
| Method Detail |
|---|
boolean matchesCriteria(URL url,
URL referer,
int depth,
List<URL> resultURLList,
List<URL> closedURLList)
Parameters:
url - the URL to be tested against the criteria
referer - url's referer URL
depth - link distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match this CrawlerSetting's criteria
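As a rough illustration of the kind of decision matchesCriteria makes, the sketch below uses the documented parameter list with a made-up criterion: a URL matches when its path ends in ".pdf". The class name PdfCriteriaSketch, the helper toUrl, and the criterion itself are assumptions for this example. So that it compiles without the library on the classpath, the class does not implement CrawlerSetting. The source does not say whether the Spider or the callback maintains the two lists, so the sketch only reads its arguments and leaves the lists untouched.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class. It mirrors the documented parameter list but
// does not implement the library's CrawlerSetting, so it compiles on its own.
class PdfCriteriaSketch {

    // Assumed criterion (not from the library): the URL's path ends in ".pdf".
    boolean matchesCriteria(URL url, URL referer, int depth,
                            List<URL> resultURLList, List<URL> closedURLList) {
        String path = url.getPath().toLowerCase(Locale.ROOT);
        return path.endsWith(".pdf");
    }

    // Helper for the example: parse a URL without a checked exception.
    static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        PdfCriteriaSketch setting = new PdfCriteriaSketch();
        List<URL> results = new ArrayList<>();
        List<URL> closed = new ArrayList<>();
        URL root = toUrl("http://example.com/");
        URL report = toUrl("http://example.com/docs/report.PDF");

        // Passing null as the root's referer is an assumption of this sketch.
        System.out.println(setting.matchesCriteria(root, null, 0, results, closed));
        System.out.println(setting.matchesCriteria(report, root, 1, results, closed));
    }
}
```

Because the method is called on every URL encountered, any custom parsing of the page content would also go in this method body.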
boolean followLinks(URL url,
URL referer,
int depth,
List<URL> resultURLList,
List<URL> closedURLList,
List<Spider.URLWrapper> searchURLWrapperList)
Parameters:
url - the URL that is to be examined for its links
referer - url's referer URL
depth - distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match this CrawlerSetting's criteria
searchURLWrapperList - List of Spider.URLWrapper objects already identified to be examined in the next level
See Also: Spider.URLWrapper
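One way the two callbacks might be implemented together is sketched below: a setting that stays on a single host and stops following links beyond a depth limit. The Spider and CrawlerSetting declarations in the block are stand-ins copied from the signatures above so that the example compiles by itself; in real use they would be replaced by the library's own types. SameHostSetting, its fields, and toUrl are hypothetical names. The source does not state whether the root URL has depth 0, so the exact cut-off is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Stand-ins so the sketch compiles without the library; with the real
// library available, delete these two declarations and import its types.
class Spider {
    static class URLWrapper { }
}

interface CrawlerSetting {
    boolean matchesCriteria(URL url, URL referer, int depth,
                            List<URL> resultURLList, List<URL> closedURLList);

    boolean followLinks(URL url, URL referer, int depth,
                        List<URL> resultURLList, List<URL> closedURLList,
                        List<Spider.URLWrapper> searchURLWrapperList);
}

// Hypothetical setting: match and follow only URLs on one host, up to maxDepth.
class SameHostSetting implements CrawlerSetting {
    private final String host;
    private final int maxDepth;

    SameHostSetting(String host, int maxDepth) {
        this.host = host;
        this.maxDepth = maxDepth;
    }

    @Override
    public boolean matchesCriteria(URL url, URL referer, int depth,
                                   List<URL> resultURLList, List<URL> closedURLList) {
        return host.equalsIgnoreCase(url.getHost());
    }

    @Override
    public boolean followLinks(URL url, URL referer, int depth,
                               List<URL> resultURLList, List<URL> closedURLList,
                               List<Spider.URLWrapper> searchURLWrapperList) {
        // Assumption: depth grows by one per link level, so this stops the
        // crawl from descending past maxDepth levels.
        return depth < maxDepth && host.equalsIgnoreCase(url.getHost());
    }

    // Helper for the example: parse a URL without a checked exception.
    static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        CrawlerSetting setting = new SameHostSetting("example.com", 2);
        List<URL> results = new ArrayList<>();
        List<URL> closed = new ArrayList<>();
        List<Spider.URLWrapper> next = new ArrayList<>();
        URL page = toUrl("http://example.com/a");
        System.out.println(setting.followLinks(page, null, 1, results, closed, next));
        System.out.println(setting.followLinks(page, null, 2, results, closed, next));
    }
}
```

With the real library, an instance of such a class would presumably be passed to Spider.crawlWeb(CrawlerSetting, int, com.antelmann.util.logging.Logger). The meaning of the int argument is not described in this page, so the call is not shown here.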