com.antelmann.net
Class MediaCrawler
java.lang.Object
  java.lang.Thread
    com.antelmann.net.MediaCrawler
- All Implemented Interfaces:
- CrawlerSetting, Runnable
public class MediaCrawler
- extends Thread
- implements CrawlerSetting
MediaCrawler is a single thread that searches the web for files of a
given type.
- Since:
- 10/29/2002
- Author:
- Holger Antelmann
- See Also:
Spider
Nested Class Summary
 static interface  MediaCrawler.Handler
                   Used to handle the media files found during the search of the MediaCrawler.
Method Summary
 void        addHandler(MediaCrawler.Handler handler)
 boolean     followLinks(URL url, URL referer, int depth, List<URL> resultURLList, List<URL> closedURLList, List<Spider.URLWrapper> searchURLWrapperList)
             followLinks() determines whether the given URL is to be searched for its links to be examined further in the next level.
 URLCache[]  getFilesFound()
 boolean     matchesCriteria(URL url, URL referer, int depth, List<URL> resultURLList, List<URL> closedURLList)
             This method decides whether either the URL itself or its content qualifies for what this CrawlerSetting searches for; as this function is also called on every URL encountered, it is also the place for any custom parsing this CrawlerSetting wants to do.
 void        run()
Methods inherited from class java.lang.Thread
activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield |
MediaCrawler
public MediaCrawler(URL rootURL,
int depth,
String mediaExtension,
boolean currentSiteOnly,
String[] pattern)
MediaCrawler
public MediaCrawler(URL rootURL,
int depth,
String mediaExtension,
boolean currentSiteOnly,
MediaCrawler.Handler handler,
String[] pattern)
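Since MediaCrawler extends Thread, a caller constructs it, starts it, and joins it like any other thread. The following is only an illustrative sketch, not runnable on its own: it assumes the com.antelmann.net classes are on the classpath, and the root URL, extension, and pattern values are made-up placeholders.

```java
import com.antelmann.net.MediaCrawler;
import java.net.URL;

public class CrawlDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative values only: search two levels deep for .mp3 files,
        // restricted to the starting site, matching the given pattern.
        URL root = new URL("http://example.com/");
        MediaCrawler crawler = new MediaCrawler(
                root, 2, "mp3", true, new String[] { "music" });
        crawler.start();   // MediaCrawler is a Thread; run() performs the crawl
        crawler.join();    // wait for the crawl to finish
        System.out.println(crawler.getFilesFound().length + " files found");
    }
}
```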
addHandler
public void addHandler(MediaCrawler.Handler handler)
run
public void run()
- Specified by:
- run in interface Runnable
- Overrides:
- run in class Thread
getFilesFound
public URLCache[] getFilesFound()
followLinks
public boolean followLinks(URL url,
URL referer,
int depth,
List<URL> resultURLList,
List<URL> closedURLList,
List<Spider.URLWrapper> searchURLWrapperList)
- Description copied from interface:
CrawlerSetting
- followLinks() determines whether the given URL is to be searched for
its links to be examined further in the next level.
The three List objects allow the CrawlerSetting to act on potential constraints
that may result from, e.g., a maximum number of total nodes to be examined
(or any other custom check imaginable).
The url may be any URL, including non-HTTP protocols
(such as mailto:, ftp:) and image or media URLs.
- Specified by:
followLinks in interface CrawlerSetting
- Parameters:
url - the URL that is to be examined for its links
referer - url's referer URL
depth - distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match the CrawlerSetting's criteria
searchURLWrapperList - List of Spider.URLWrapper objects already identified to be examined in the next level
- See Also:
Spider.URLWrapper
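The list parameters exist so an implementation can enforce global limits on the crawl. The following self-contained sketch shows the kind of check a followLinks implementation might perform; the depth limit, node cap, and use of java.net.URI are assumptions for illustration, not details of this class.

```java
import java.net.URI;
import java.util.List;

public class FollowLinksSketch {
    static final int MAX_DEPTH = 3;     // assumed depth limit
    static final int MAX_NODES = 1000;  // assumed cap on total examined nodes

    /** Returns true if the URL should be searched for further links. */
    static boolean followLinks(URI url, URI referer, int depth,
                               List<URI> resultURLList,
                               List<URI> closedURLList) {
        // Stop descending once the maximum depth is reached.
        if (depth >= MAX_DEPTH) return false;
        // Honor a global cap on the number of nodes already examined.
        if (resultURLList.size() + closedURLList.size() >= MAX_NODES) return false;
        // Only HTTP(S) pages can be parsed for links; skip mailto:, ftp:, etc.
        String scheme = url.getScheme();
        return "http".equals(scheme) || "https".equals(scheme);
    }
}
```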
matchesCriteria
public boolean matchesCriteria(URL url,
URL referer,
int depth,
List<URL> resultURLList,
List<URL> closedURLList)
- Description copied from interface:
CrawlerSetting
- This method decides whether either the URL itself or its content qualifies
for what this CrawlerSetting searches for; as this function is also called on every
URL encountered, it is also the place for any custom parsing this CrawlerSetting
wants to do.
The two List objects allow the CrawlerSetting to act on potential constraints
that may result from, e.g., a maximum number of total nodes to be examined
(or any other custom check imaginable).
Note that it is the responsibility of the calling object to ensure that
this function isn't called multiple times on the same URL if that's not
desired.
The url may be any URL, including non-HTTP protocols
(such as mailto:, ftp:) and image or media URLs.
- Specified by:
matchesCriteria in interface CrawlerSetting
- Parameters:
url - the URL in question to satisfy the criteria
referer - url's referer URL
depth - link distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match the CrawlerSetting's criteria
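For a media crawler, the natural criterion is the URL's file extension. The following self-contained sketch shows such a check; it is an assumption about how a media-extension filter might work, not MediaCrawler's actual implementation.

```java
import java.net.URI;
import java.util.Locale;

public class MatchesCriteriaSketch {
    /** Returns true if the URL's path ends with the given media extension. */
    static boolean matchesExtension(URI url, String mediaExtension) {
        String path = url.getPath();
        if (path == null) return false;  // opaque URLs (e.g. mailto:) have no path
        // Compare case-insensitively so "song.MP3" matches extension "mp3".
        return path.toLowerCase(Locale.ROOT)
                   .endsWith("." + mediaExtension.toLowerCase(Locale.ROOT));
    }
}
```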
(c) Holger Antelmann since 2001 - all rights reserved (contact: info@antelmann.com)
see www.antelmann.com/developer for further details and available downloads