info.aduna.infosource.crawl.extract
Class ExtractionUtil

java.lang.Object
  extended by info.aduna.infosource.crawl.extract.ExtractionUtil

public class ExtractionUtil
extends Object

Author:
Herko ter Horst

Constructor Summary
ExtractionUtil()
           
 
Method Summary
static void extract(ThreadedExtractorContainer tec, InputStream stream, org.semanticdesktop.aperture.rdf.RDFContainer metadata)
          Extract full text and metadata from the specified stream and add statements representing the extracted data to the specified RDFContainer.
static InputStream getMarkSupportedStream(InputStream in, int minBufferSize)
          Get an InputStream that supports mark and reset based on the specified InputStream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExtractionUtil

public ExtractionUtil()
Method Detail

extract

public static void extract(ThreadedExtractorContainer tec,
                           InputStream stream,
                           org.semanticdesktop.aperture.rdf.RDFContainer metadata)
                    throws org.semanticdesktop.aperture.extractor.ExtractorException,
                           IOException
Extract full text and metadata from the specified stream and add statements representing the extracted data to the specified RDFContainer.

Parameters:
tec - the ThreadedExtractorContainer to notify of the extraction process
stream - the stream to extract from
metadata - the RDFContainer to add data to
Throws:
org.semanticdesktop.aperture.extractor.ExtractorException - if the extraction process fails
IOException - if an I/O error occurs while reading from the stream

getMarkSupportedStream

public static InputStream getMarkSupportedStream(InputStream in,
                                                 int minBufferSize)
                                          throws IOException
Get an InputStream that supports mark and reset based on the specified InputStream.

Parameters:
in - the InputStream to use
minBufferSize - the minimum buffer size the result should use
Returns:
the input stream itself, if it already supports mark and reset; otherwise a BufferedInputStream wrapping it, with at least the specified minimum buffer size
Throws:
IOException - if the wrapping stream could not be created
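
The contract described for getMarkSupportedStream can be illustrated with plain java.io classes. The following is only a minimal sketch, not the actual ExtractionUtil source: the class name MarkSupportSketch, the methods ensureMarkSupported and peek, and the DEFAULT_BUFFER_SIZE constant are all hypothetical names introduced here for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkSupportSketch {

    // Hypothetical floor for the buffer size; the real class may behave differently.
    private static final int DEFAULT_BUFFER_SIZE = 8192;

    // Returns the input unchanged if it already supports mark/reset,
    // otherwise wraps it in a BufferedInputStream whose buffer holds
    // at least minBufferSize bytes.
    static InputStream ensureMarkSupported(InputStream in, int minBufferSize) {
        if (in.markSupported()) {
            return in;
        }
        int size = Math.max(minBufferSize, DEFAULT_BUFFER_SIZE);
        return new BufferedInputStream(in, size);
    }

    // Reads up to n bytes from a mark-supporting stream and then rewinds it,
    // so a later consumer still sees the stream from the same position.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        in.mark(n);
        byte[] buffer = new byte[n];
        int total = 0;
        while (total < n) {
            int read = in.read(buffer, total, n - total);
            if (read < 0) {
                break; // end of stream reached before n bytes were available
            }
            total += read;
        }
        in.reset();
        byte[] result = new byte[total];
        System.arraycopy(buffer, 0, result, 0, total);
        return result;
    }
}
```

A caller would typically obtain the mark-supporting stream first, inspect its leading bytes (for example, to guess the content type), and then hand the same stream to the extraction step, which reads it from the start.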


Copyright © 1997-2008 Aduna. All Rights Reserved.