info.aduna.infosource.crawl
Class CrawlingRepository

java.lang.Object
  extended by info.aduna.infosource.crawl.CrawlingRepository
All Implemented Interfaces:
InfoSource, org.openrdf.repository.Repository

public class CrawlingRepository
extends Object
implements InfoSource

CrawlingRepository provides a Repository capable of populating itself using Aperture Crawlers and Extractors.
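The sketch below illustrates one plausible order for calling the methods documented on this page. You configure the data source and data directory, call `initialize()`, run `crawl(...)`, query through `getConnection()`, and finally call `shutDown()`. It does not use the real Sesame or Aperture types. `SketchRepository` is a hypothetical stand-in that only records and checks the call order. Two points are assumptions. The first is that a repository must be initialized before use, which is taken from the general `org.openrdf.repository.Repository` contract. The second is that the data source is set before `initialize()`, which this page does not state.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for CrawlingRepository: it only enforces and records the
 * assumed call order; it does not crawl or store anything.
 */
class SketchRepository {
    private final List<String> calls = new ArrayList<>();
    private String dataSource;   // stands in for an Aperture DataSource (assumption)
    private File dataDir;
    private boolean initialized;

    void setDataSource(String dataSource) {
        this.dataSource = dataSource;
        calls.add("setDataSource");
    }

    void setDataDir(File dataDir) {
        if (initialized) throw new IllegalStateException("set the data dir before initialize()");
        this.dataDir = dataDir;
        calls.add("setDataDir");
    }

    void initialize() {
        if (dataSource == null || dataDir == null) {
            throw new IllegalStateException("configure data source and data dir first");
        }
        initialized = true;
        calls.add("initialize");
    }

    void crawl(boolean fullRecrawl) {
        requireInitialized();
        calls.add("crawl");
    }

    Object getConnection() {   // the real method returns a RepositoryConnection
        requireInitialized();
        calls.add("getConnection");
        return new Object();
    }

    void shutDown() {
        initialized = false;
        calls.add("shutDown");
    }

    List<String> calls() { return calls; }

    private void requireInitialized() {
        if (!initialized) throw new IllegalStateException("call initialize() first");
    }
}

class LifecycleSketch {
    static List<String> run() {
        SketchRepository repo = new SketchRepository();
        repo.setDataSource("file-system source");
        repo.setDataDir(new File("crawl-data"));   // only a path; nothing is created
        repo.initialize();
        try {
            repo.crawl(false);
            repo.getConnection();
        } finally {
            repo.shutDown();   // release resources even if crawling fails
        }
        return repo.calls();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The `try`/`finally` pattern makes sure `shutDown()` runs even when `crawl(...)` fails.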


Field Summary
static org.semanticdesktop.aperture.mime.identifier.magic.MagicMimeTypeIdentifierFactory IDENTIFIER_FACTORY
           
 
Constructor Summary
CrawlingRepository()
           
 
Method Summary
 void crawl(CrawlingListener listener, boolean fullRecrawl)
          Instructs this CrawlingRepository to crawl its DataSource for new, changed or deleted information and update its repository accordingly.
static org.semanticdesktop.aperture.accessor.DataAccessorRegistry getAccessorRegistry()
           
 org.openrdf.repository.RepositoryConnection getConnection()
           
static org.semanticdesktop.aperture.crawler.CrawlerRegistry getCrawlerRegistry()
           
 File getDataDir()
           
 org.semanticdesktop.aperture.datasource.DataSource getDataSource()
           
static org.semanticdesktop.aperture.extractor.ExtractorRegistry getExtractorRegistry()
           
 boolean getIncludeInListCrawl()
          Returns whether this CrawlingRepository wants to be included in a refresh of all CrawlingRepositories.
 LuceneIndex getIndex()
           
static LanguageIdentifier getLanguageIdentifier()
           
static org.semanticdesktop.aperture.hypertext.linkextractor.LinkExtractorRegistry getLinkExtractorRegistry()
           
 ProcessorHook getProcessorHook()
           
 org.openrdf.model.ValueFactory getValueFactory()
           
 void initialize()
           
 boolean isWritable()
           
 org.openrdf.model.URI prepareAccess(org.openrdf.model.URI uri)
          Returns a URI (typically representing a file or web page) that can be used to view the contents of the specified URI.
 void setDataDir(File dataDir)
           
 void setDataSource(org.semanticdesktop.aperture.datasource.DataSource dataSource)
           
 void setIncludeInListCrawl(boolean includeInListCrawl)
          Sets whether this CrawlingRepository should be included in a refresh of all CrawlingRepositories.
 void setProcessorHook(ProcessorHook processorHook)
           
 void shutDown()
           
 void stopCrawling()
          Instructs this CrawlingRepository to stop any ongoing crawling processes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

IDENTIFIER_FACTORY

public static final org.semanticdesktop.aperture.mime.identifier.magic.MagicMimeTypeIdentifierFactory IDENTIFIER_FACTORY
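`IDENTIFIER_FACTORY` is a `MagicMimeTypeIdentifierFactory`. Its name suggests identifiers that determine a document's MIME type from its leading "magic" bytes rather than from the file name alone. The following is a simplified, self-contained illustration of that technique, not Aperture's implementation. The hypothetical `MagicSniffer` class, its two signatures and the `.txt` extension fallback are examples chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical, minimal magic-number MIME sniffer illustrating the technique. */
class MagicSniffer {
    // Leading byte signatures ("magic numbers") and the MIME types they indicate.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};   // "PK\3\4"

    /** Returns a MIME type, or null when neither content nor name is recognised. */
    static String identify(byte[] firstBytes, String fileName) {
        if (startsWith(firstBytes, PDF)) return "application/pdf";
        if (startsWith(firstBytes, ZIP)) return "application/zip";
        // No signature matched: fall back to the file-name extension.
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".txt")) {
            return "text/plain";
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] header = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(header, "report.bin"));   // application/pdf
    }
}
```

The sketch checks content before the extension, so a PDF saved as `report.bin` is still recognised.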
Constructor Detail

CrawlingRepository

public CrawlingRepository()
Method Detail

getDataSource

public org.semanticdesktop.aperture.datasource.DataSource getDataSource()

setDataSource

public void setDataSource(org.semanticdesktop.aperture.datasource.DataSource dataSource)

getDataDir

public File getDataDir()
Specified by:
getDataDir in interface org.openrdf.repository.Repository

setDataDir

public void setDataDir(File dataDir)
Specified by:
setDataDir in interface org.openrdf.repository.Repository

getProcessorHook

public ProcessorHook getProcessorHook()

setProcessorHook

public void setProcessorHook(ProcessorHook processorHook)

initialize

public void initialize()
                throws org.openrdf.repository.RepositoryException
Specified by:
initialize in interface org.openrdf.repository.Repository
Throws:
org.openrdf.repository.RepositoryException

getValueFactory

public org.openrdf.model.ValueFactory getValueFactory()
Specified by:
getValueFactory in interface org.openrdf.repository.Repository

getConnection

public org.openrdf.repository.RepositoryConnection getConnection()
                                                          throws org.openrdf.repository.RepositoryException
Specified by:
getConnection in interface org.openrdf.repository.Repository
Throws:
org.openrdf.repository.RepositoryException

isWritable

public boolean isWritable()
                   throws org.openrdf.repository.RepositoryException
Specified by:
isWritable in interface org.openrdf.repository.Repository
Throws:
org.openrdf.repository.RepositoryException

shutDown

public void shutDown()
              throws org.openrdf.repository.RepositoryException
Specified by:
shutDown in interface org.openrdf.repository.Repository
Throws:
org.openrdf.repository.RepositoryException

getIndex

public LuceneIndex getIndex()

prepareAccess

public org.openrdf.model.URI prepareAccess(org.openrdf.model.URI uri)
Description copied from interface: InfoSource
Returns a URI (typically representing a file or web page) that can be used to view the contents of the specified URI.

Specified by:
prepareAccess in interface InfoSource
Returns:
The URI that can be opened to view the contents of the specified URI. Typically this URI represents a web page or local file.
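The description of `prepareAccess` suggests two cases. Some URIs are already viewable, and others must first be turned into something a viewer can open. The sketch below is a hypothetical helper, not CrawlingRepository's implementation. `AccessPreparer`, its in-memory content map, and the choice to return `null` for unknown URIs are all assumptions. It returns `file:`, `http:` and `https:` URIs unchanged. For any other URI with known content, it writes that content to a temporary file and returns the file's URI.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical prepareAccess-style helper (assumption, not the real implementation). */
class AccessPreparer {
    private final Map<URI, String> storedContent;   // content for URIs a viewer cannot open directly

    AccessPreparer(Map<URI, String> storedContent) {
        this.storedContent = storedContent;
    }

    URI prepareAccess(URI uri) {
        String scheme = uri.getScheme();
        if ("file".equals(scheme) || "http".equals(scheme) || "https".equals(scheme)) {
            return uri;   // already viewable: a local file or web page
        }
        String content = storedContent.get(uri);
        if (content == null) {
            return null;  // nothing known about this URI (assumed behaviour)
        }
        try {
            // Materialise the content as a temporary local file and return its file: URI.
            Path tmp = Files.createTempFile("prepared-", ".txt");
            tmp.toFile().deleteOnExit();
            Files.writeString(tmp, content);
            return tmp.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI mail = URI.create("imap://host/INBOX;UID=1");
        AccessPreparer preparer = new AccessPreparer(Map.of(mail, "Hello"));
        System.out.println(preparer.prepareAccess(URI.create("https://example.org/")));
        System.out.println(preparer.prepareAccess(mail));   // a file: URI in the temp directory
    }
}
```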

getCrawlerRegistry

public static org.semanticdesktop.aperture.crawler.CrawlerRegistry getCrawlerRegistry()

getAccessorRegistry

public static org.semanticdesktop.aperture.accessor.DataAccessorRegistry getAccessorRegistry()

getExtractorRegistry

public static org.semanticdesktop.aperture.extractor.ExtractorRegistry getExtractorRegistry()

getLanguageIdentifier

public static LanguageIdentifier getLanguageIdentifier()

getLinkExtractorRegistry

public static org.semanticdesktop.aperture.hypertext.linkextractor.LinkExtractorRegistry getLinkExtractorRegistry()

crawl

public void crawl(CrawlingListener listener,
                  boolean fullRecrawl)
           throws IOException
Instructs this CrawlingRepository to crawl its DataSource for new, changed or deleted information and update its repository accordingly. This method blocks until crawling has completed.

Parameters:
listener - A CrawlingListener to send events about the progress to.
fullRecrawl - Flag that indicates whether this crawl is a full recrawl. It is used purely for logging and reporting; the CrawlingRepository is not responsible for actually clearing any information when this flag is set.
Throws:
IOException
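`crawl(...)` blocks until crawling has completed and reports progress to the supplied `CrawlingListener`, so callers may prefer to run it on a worker thread. The sketch below models that contract with hypothetical stand-ins. `ProgressListener` and its three methods are assumptions, because the real `CrawlingListener` methods are not documented on this page. `SketchCrawler` simply iterates over a fixed list of item names. As in the parameter description above, `fullRecrawl` is only forwarded to the listener for reporting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical listener; the real CrawlingListener interface may differ. */
interface ProgressListener {
    void crawlStarted(boolean fullRecrawl);
    void itemProcessed(String id);
    void crawlStopped();
}

class SketchCrawler {
    private final List<String> items;

    SketchCrawler(List<String> items) { this.items = items; }

    /** Blocks until every item has been processed, like CrawlingRepository.crawl. */
    void crawl(ProgressListener listener, boolean fullRecrawl) throws IOException {
        listener.crawlStarted(fullRecrawl);   // the flag is only reported, nothing is cleared
        for (String id : items) {
            listener.itemProcessed(id);
        }
        listener.crawlStopped();
    }
}

class CrawlSketch {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        ProgressListener listener = new ProgressListener() {
            public void crawlStarted(boolean full) { events.add("started full=" + full); }
            public void itemProcessed(String id) { events.add("item " + id); }
            public void crawlStopped() { events.add("stopped"); }
        };
        SketchCrawler crawler = new SketchCrawler(List.of("a.txt", "b.pdf"));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Run the blocking crawl off the caller's thread, then wait for it to finish.
            Future<?> done = pool.submit(() -> { crawler.crawl(listener, false); return null; });
            done.get();   // also makes the worker's writes to 'events' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```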

stopCrawling

public void stopCrawling()
Instructs this CrawlingRepository to stop any ongoing crawling processes. This method returns immediately.
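`stopCrawling()` returns immediately even while a `crawl(...)` call on another thread is still running. This fits cooperative cancellation. A common way to implement that pattern is a `volatile` flag that the crawl loop checks between items. This is shown as an assumption, not a description of CrawlingRepository's internals. `StoppableCrawler` and `StopSketch` are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical crawler showing flag-based cooperative cancellation. */
class StoppableCrawler {
    private volatile boolean stopRequested;   // written by stopCrawling(), read by crawl()
    final AtomicInteger processed = new AtomicInteger();
    final CountDownLatch started = new CountDownLatch(1);

    /** Blocks until a stop is requested (the simulated source never runs dry). */
    void crawl() throws InterruptedException {
        stopRequested = false;                // reset so the crawler can be reused
        started.countDown();
        while (!stopRequested) {              // check the flag between items
            processed.incrementAndGet();      // "process" one item
            Thread.sleep(1);
        }
    }

    /** Only sets the flag and returns at once; crawl() exits at its next check. */
    void stopCrawling() {
        stopRequested = true;
    }
}

class StopSketch {
    /** Returns the number of items processed, or -1 if the crawl did not stop in time. */
    static int run() {
        StoppableCrawler crawler = new StoppableCrawler();
        Thread worker = new Thread(() -> {
            try {
                crawler.crawl();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            crawler.started.await();   // make sure crawl() has reset the flag first
            crawler.stopCrawling();    // returns immediately
            worker.join(5_000);        // wait for the loop to notice the flag
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive() ? -1 : crawler.processed.get();
    }

    public static void main(String[] args) {
        System.out.println("items processed before stop: " + run());
    }
}
```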


getIncludeInListCrawl

public boolean getIncludeInListCrawl()
Returns whether this CrawlingRepository wants to be included in a refresh of all CrawlingRepositories.


setIncludeInListCrawl

public void setIncludeInListCrawl(boolean includeInListCrawl)
Sets whether this CrawlingRepository should be included in a refresh of all CrawlingRepositories.
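This flag lets a job that refreshes every CrawlingRepository skip the ones that have opted out. A minimal sketch of such a job follows. The `Crawlable` interface and the `refreshAll` helper are hypothetical. They only mirror the `getIncludeInListCrawl()` / `crawl(...)` pair documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical subset of CrawlingRepository's API used by a "refresh all" job. */
interface Crawlable {
    String name();
    boolean getIncludeInListCrawl();
    void crawl(boolean fullRecrawl);
}

class ListCrawlSketch {
    /** Crawls, in order, only the repositories that want to take part; returns their names. */
    static List<String> refreshAll(List<? extends Crawlable> repositories, boolean fullRecrawl) {
        List<String> crawled = new ArrayList<>();
        for (Crawlable repo : repositories) {
            if (!repo.getIncludeInListCrawl()) {
                continue;                  // opted out via setIncludeInListCrawl(false)
            }
            repo.crawl(fullRecrawl);       // blocking, so repositories are crawled one at a time
            crawled.add(repo.name());
        }
        return crawled;
    }

    /** Builds a no-op repository for the example. */
    static Crawlable repo(String name, boolean include) {
        return new Crawlable() {
            public String name() { return name; }
            public boolean getIncludeInListCrawl() { return include; }
            public void crawl(boolean fullRecrawl) { /* no-op in this sketch */ }
        };
    }

    public static void main(String[] args) {
        List<Crawlable> repos = List.of(repo("mail", true), repo("web", false), repo("docs", true));
        System.out.println(refreshAll(repos, false));   // [mail, docs]
    }
}
```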



Copyright © 1997-2008 Aduna. All Rights Reserved.