info.aduna.infosource.crawl
Interface ProcessorHook


public interface ProcessorHook

A ProcessorHook is invoked by a CrawlResultProcessor at three stages during the interpretation of a DataObject: after crawling, after extraction and after interpretation. It lets an application extend the interpretation process with its own logic.
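
The invocation order described here can be sketched as follows. This is a minimal illustration, not the actual CrawlResultProcessor code: the nested URI and Graph types are empty stand-ins for org.openrdf.model.URI and org.openrdf.model.Graph so that the example compiles without the OpenRDF library, and RecordingHook and process(...) are hypothetical names introduced only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class HookOrderSketch {

    // Stand-ins (assumptions) for org.openrdf.model.URI and org.openrdf.model.Graph.
    interface URI {}
    interface Graph {}

    // Mirrors the three methods documented for ProcessorHook.
    interface ProcessorHook {
        void crawled(URI uri, Graph metadata);
        void extracted(URI uri, Graph metadata);
        void interpreted(URI uri, Graph metadata);
    }

    // Hypothetical hook that records which stages it was invoked for.
    static class RecordingHook implements ProcessorHook {
        final List<String> stages = new ArrayList<>();
        public void crawled(URI uri, Graph metadata) { stages.add("crawled"); }
        public void extracted(URI uri, Graph metadata) { stages.add("extracted"); }
        public void interpreted(URI uri, Graph metadata) { stages.add("interpreted"); }
    }

    // Hypothetical driver standing in for a CrawlResultProcessor handling one DataObject.
    static void process(URI uri, Graph metadata, ProcessorHook hook) {
        hook.crawled(uri, metadata);     // DataObject was just received from the Crawler
        // ... an Extractor would be applied here, if one matches ...
        hook.extracted(uri, metadata);   // invoked whether or not extraction took place
        // ... processor-specific interpretation would happen here ...
        hook.interpreted(uri, metadata); // last call before the metadata is stored
    }

    public static void main(String[] args) {
        RecordingHook hook = new RecordingHook();
        process(new URI() {}, new Graph() {}, hook);
        System.out.println(hook.stages);
    }
}
```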

Future work: reconsider how ProcessorHooks are created, set and applied. One open question is whether they may be stateful, for example to carry information from one method call to a later one. Statefulness would limit their usability in multi-threaded crawl result processing, unless a new ProcessorHook is created for every DataObject.
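
One way to picture the "new ProcessorHook for every DataObject" option is a hook factory, sketched below. This is an assumption about a possible design, not something the current API provides: the Supplier-based factory, StatefulHook and run(...) are hypothetical, and URI and Graph are empty stand-ins for the org.openrdf.model types. Each task obtains its own hook, so the state stored in crawled() is never shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class PerObjectHookSketch {

    // Stand-ins (assumptions) for org.openrdf.model.URI and org.openrdf.model.Graph.
    interface URI {}
    interface Graph {}

    interface ProcessorHook {
        void crawled(URI uri, Graph metadata);
        void extracted(URI uri, Graph metadata);
        void interpreted(URI uri, Graph metadata);
    }

    // Stateful hook: remembers the URI from crawled() and checks it in later calls.
    static class StatefulHook implements ProcessorHook {
        private final AtomicInteger mismatches;
        private URI seen; // state carried from one method call to the next

        StatefulHook(AtomicInteger mismatches) { this.mismatches = mismatches; }

        public void crawled(URI uri, Graph metadata) { seen = uri; }
        public void extracted(URI uri, Graph metadata) { check(uri); }
        public void interpreted(URI uri, Graph metadata) { check(uri); }

        private void check(URI uri) {
            if (seen != uri) mismatches.incrementAndGet();
        }
    }

    // Processes `count` simulated DataObjects on `threads` threads, creating one
    // fresh hook per object. Returns how often a hook saw a URI it did not expect.
    static int run(int count, int threads) {
        AtomicInteger mismatches = new AtomicInteger();
        Supplier<ProcessorHook> factory = () -> new StatefulHook(mismatches);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                pending.add(pool.submit(() -> {
                    ProcessorHook hook = factory.get(); // confined to this task
                    URI uri = new URI() {};
                    Graph metadata = new Graph() {};
                    hook.crawled(uri, metadata);
                    hook.extracted(uri, metadata);
                    hook.interpreted(uri, metadata);
                }));
            }
            for (Future<?> f : pending) f.get(); // wait for all tasks to finish
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + run(1000, 4));
    }
}
```

Sharing a single StatefulHook instance across the tasks instead would let one thread overwrite the URI another thread stored, which is the limitation the note above describes.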


Method Summary
 void crawled(org.openrdf.model.URI uri, org.openrdf.model.Graph metadata)
          Invoked right after the DataObject is received from the Crawler.
 void extracted(org.openrdf.model.URI uri, org.openrdf.model.Graph metadata)
          Invoked right after application of the Extractor but before any other processing specific to the CrawlResultProcessor has taken place.
 void interpreted(org.openrdf.model.URI uri, org.openrdf.model.Graph metadata)
          Invoked right after the CrawlResultProcessor-specific interpretation and processing of the metadata but before the metadata is stored in the persistent RDF store.
 

Method Detail

crawled

void crawled(org.openrdf.model.URI uri,
             org.openrdf.model.Graph metadata)
Invoked right after the DataObject is received from the Crawler. No interpretation of the metadata delivered by the Crawler has taken place yet.

Parameters:
uri - The URI of the crawled resource.
metadata - The metadata statements produced by the Crawler.

extracted

void extracted(org.openrdf.model.URI uri,
               org.openrdf.model.Graph metadata)
Invoked right after application of the Extractor but before any other processing specific to the CrawlResultProcessor has taken place.

This method is invoked for every DataObject, regardless of whether an Extractor could actually be applied. For example, it is also invoked for DataObjects that are not FileDataObjects, for FileDataObjects for which no matching Extractor could be found, and for FileDataObjects whose Extractor produced an error.

Parameters:
uri - The URI of the crawled resource.
metadata - The metadata statements produced by the Crawler and the Extractor.
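
Because extracted is called even when no Extractor ran, a hook cannot assume that the metadata contains extractor output. The sketch below shows one hypothetical way to detect this by comparing statement counts between crawled and extracted. The Graph class here is a simplified stand-in for org.openrdf.model.Graph that models only a list of statements, and its size() method is an assumption of this sketch. The hook keeps state between calls, so the multi-threading caveat from the interface description applies to it.

```java
import java.util.ArrayList;
import java.util.List;

class ExtractionCheckSketch {

    // Stand-in (assumption) for org.openrdf.model.URI.
    interface URI {}

    // Stand-in (assumption) for org.openrdf.model.Graph; only a statement list is modelled.
    static class Graph {
        final List<String> statements = new ArrayList<>();
        int size() { return statements.size(); }
    }

    interface ProcessorHook {
        void crawled(URI uri, Graph metadata);
        void extracted(URI uri, Graph metadata);
        void interpreted(URI uri, Graph metadata);
    }

    // Hypothetical hook that notes whether extraction added any statements.
    static class ExtractionCheckHook implements ProcessorHook {
        private int sizeAfterCrawl;
        boolean extractorContributed;

        public void crawled(URI uri, Graph metadata) {
            sizeAfterCrawl = metadata.size();
        }

        public void extracted(URI uri, Graph metadata) {
            // Also reached when no Extractor matched or the Extractor failed.
            extractorContributed = metadata.size() > sizeAfterCrawl;
        }

        public void interpreted(URI uri, Graph metadata) {
            // Nothing to do at this stage in this sketch.
        }
    }

    // Simulates one DataObject; the statement strings are made-up placeholders.
    static boolean simulate(boolean extractorApplies) {
        Graph metadata = new Graph();
        metadata.statements.add("crawler: name");
        URI uri = new URI() {};
        ExtractionCheckHook hook = new ExtractionCheckHook();
        hook.crawled(uri, metadata);
        if (extractorApplies) {
            metadata.statements.add("extractor: fullText");
        }
        hook.extracted(uri, metadata);
        return hook.extractorContributed;
    }

    public static void main(String[] args) {
        System.out.println("no extractor applied: " + simulate(false));
        System.out.println("extractor applied:    " + simulate(true));
    }
}
```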

interpreted

void interpreted(org.openrdf.model.URI uri,
                 org.openrdf.model.Graph metadata)
Invoked right after the CrawlResultProcessor-specific interpretation and processing of the metadata but before the metadata is stored in the persistent RDF store.

Parameters:
uri - The URI of the crawled resource.
metadata - The metadata statements produced by the Crawler and the Extractor, as they stand after the CrawlResultProcessor-specific interpretation.
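
Since interpreted is the last call before the metadata is stored, it is a natural place for an application-specific addition. The sketch below assumes that a hook is allowed to add statements to the metadata, which this documentation does not state explicitly. The URI and Graph classes are simplified stand-ins for the org.openrdf.model types, and TaggingHook and the "app:label" predicate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TaggingHookSketch {

    // Stand-in (assumption) for org.openrdf.model.URI.
    static class URI {
        final String value;
        URI(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // Stand-in (assumption) for org.openrdf.model.Graph: a list of string triples.
    static class Graph {
        final List<String[]> statements = new ArrayList<>();
        void add(String subject, String predicate, String object) {
            statements.add(new String[] {subject, predicate, object});
        }
    }

    interface ProcessorHook {
        void crawled(URI uri, Graph metadata);
        void extracted(URI uri, Graph metadata);
        void interpreted(URI uri, Graph metadata);
    }

    // Hypothetical hook that attaches an application-defined label to each resource.
    static class TaggingHook implements ProcessorHook {
        private final String label;

        TaggingHook(String label) { this.label = label; }

        public void crawled(URI uri, Graph metadata) { }
        public void extracted(URI uri, Graph metadata) { }

        public void interpreted(URI uri, Graph metadata) {
            // Added just before the processor would store the metadata.
            metadata.add(uri.toString(), "app:label", label);
        }
    }

    public static void main(String[] args) {
        Graph metadata = new Graph();
        URI uri = new URI("file:///tmp/report.txt");
        new TaggingHook("reviewed").interpreted(uri, metadata);
        for (String[] s : metadata.statements) {
            System.out.println(String.join(" ", s));
        }
    }
}
```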


Copyright © 1997-2008 Aduna. All Rights Reserved.