public class HTMLParser extends BaseParser
Modifier and Type | Field and Description |
---|---|
static int | ELEMENT_TYPE_EMPTY: Empty element. |
static int | ELEMENT_TYPE_FORM: Element that doesn't have to follow any nesting rules or even be well-formed. |
static int | ELEMENT_TYPE_INLINE: Inline element. |
static int | ELEMENT_TYPE_LI: List item closed by any other li element. |
static int | ELEMENT_TYPE_LIST: Block element closed by an element of the same type. |
static int | ELEMENT_TYPE_NESTED: Block element that can be nested. |
static int | ELEMENT_TYPE_P: Block element closed by any other block element. |
static int | ELEMENT_TYPE_SCRIPT: Block element that doesn't contain markup. |
static int | ELEMENT_TYPE_TABLE: Table element (other than table) closed by any other table element. |
static int | ELEMENT_TYPE_WHATEVER: Element can be either inline or block or whatever. |
static Map | HTMLElementTypes |
static Map | HTMLEntities |
protected AttributesImpl | m_attrs |
protected char[] | m_buf |
protected List | m_elemStack |
Fields inherited from class BaseParser:
ACCEPT_CHARSET, CONTENT_CHARSET, DEFAULT_CHARSET, m_contentHandler, m_dtdHandler, m_entityResolver, m_errorHandler, m_features, m_lexicalHandler, m_properties, m_recognizedFeatures, m_recognizedProperties, PROPERTY_LEXICAL_HANDLER
Constructor and Description |
---|
HTMLParser() |
Modifier and Type | Method and Description |
---|---|
protected int | append(char[] buf, int off, int ch): Appends the specified char to the specified buf at the specified offset. |
protected int | append(char[] buf, int off, String ch): Appends the specified string to the specified buf at the specified offset. |
protected int | appendComment(char[] buf, int off, int ch): Appends the specified char to the specified buf at the specified offset. |
protected int | appendEntityRef(char[] buf, int off, String ref): Appends the specified string to the specified buf at the specified offset, or reports the entity ref. |
protected void | closeOptionalElements(List elemStack, String curElem) |
protected int | flush(char[] buf, int off): Flushes chars to content handler. |
protected boolean | isEmptyElement(String curElem) |
protected boolean | isFormElement(String curElem) |
protected boolean | isScriptElement(String curElem) |
void | parse(InputSource input): Parse an XML document. |
protected int | parseCharRef(Reader src, char[] buf, int off): Decodes a character reference, starting after the &# prefix (i.e. '169;' or 'xa0;'). |
Methods inherited from class BaseParser:
getCharacterStream, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getProperty, parse, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setProperty
public static final int ELEMENT_TYPE_WHATEVER
public static final int ELEMENT_TYPE_EMPTY
public static final int ELEMENT_TYPE_INLINE
public static final int ELEMENT_TYPE_P
public static final int ELEMENT_TYPE_TABLE
public static final int ELEMENT_TYPE_LI
public static final int ELEMENT_TYPE_LIST
public static final int ELEMENT_TYPE_NESTED
public static final int ELEMENT_TYPE_SCRIPT
public static final int ELEMENT_TYPE_FORM
protected char[] m_buf
protected List m_elemStack
protected AttributesImpl m_attrs
public static Map HTMLElementTypes
public static Map HTMLEntities
public void parse(InputSource input) throws IOException, SAXException
The application can use this method to instruct the XML reader to begin parsing an XML document from any valid input source (a character stream, a byte stream, or a URI).
Applications may not invoke this method while a parse is in progress (they should create a new XMLReader instead for each nested XML document). Once a parse is complete, an application may reuse the same XMLReader object, possibly with a different input source.
During the parse, the XMLReader will provide information about the XML document through the registered event handlers.
This method is synchronous: it will not return until parsing has ended. If a client application wants to terminate parsing early, it should throw an exception.
Parameters:
input - The input source for the top-level of the XML document.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the application.
See Also:
InputSource, BaseParser.parse(String), BaseParser.setEntityResolver(org.xml.sax.EntityResolver), BaseParser.setDTDHandler(org.xml.sax.DTDHandler), BaseParser.setContentHandler(org.xml.sax.ContentHandler), BaseParser.setErrorHandler(org.xml.sax.ErrorHandler)
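The parse contract described above (synchronous call, events delivered to registered handlers, early termination by throwing) can be sketched against the standard org.xml.sax.XMLReader interface. This is a minimal sketch, not this class's implementation: the JDK's built-in reader stands in for HTMLParser, on the assumption that BaseParser implements XMLReader, and ParseSketch, elementNames and jdkReader are hypothetical names introduced for illustration. Unlike an HTML parser, the JDK stand-in needs well-formed input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ParseSketch {
    /**
     * Parses markup and returns the element names seen, in document order.
     * If stopAt is non-null, the handler throws as soon as that element
     * starts, which is how a SAX client terminates a parse early.
     */
    static List<String> elementNames(XMLReader reader, String markup, String stopAt)
            throws IOException, SAXException {
        List<String> names = new ArrayList<>();
        boolean[] stopped = {false};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                names.add(qName);
                if (qName.equals(stopAt)) {
                    stopped[0] = true;
                    throw new SAXException("parse stopped by client");
                }
            }
        });
        try {
            // Synchronous: returns only when parsing has ended or failed.
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            if (!stopped[0]) {
                throw e; // a genuine parse error, not our early exit
            }
        }
        return names;
    }

    /** Stand-in reader from the JDK (assumption: HTMLParser is used the same way). */
    static XMLReader jdkReader() throws SAXException, ParserConfigurationException {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><body><p>Hi</p></body></html>";
        System.out.println(elementNames(jdkReader(), doc, null));
        System.out.println(elementNames(jdkReader(), doc, "body"));
    }
}
```

A fresh reader is created for each call, in line with the rule that a parse must not be started while another is in progress on the same reader.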
protected int flush(char[] buf, int off) throws IOException, SAXException
Throws: IOException, SAXException
protected int appendComment(char[] buf, int off, int ch) throws IOException, SAXException
Throws: IOException, SAXException
protected int append(char[] buf, int off, int ch) throws IOException, SAXException
Throws: IOException, SAXException
protected int append(char[] buf, int off, String ch) throws IOException, SAXException
Throws: IOException, SAXException
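The signatures of flush and the two append overloads suggest a fixed buffer with a running offset, where each call returns the next offset to write at. The following is a minimal sketch under that assumption. The real methods presumably use the inherited m_contentHandler field; here the handler is passed explicitly so the code is self-contained. BufferSketch is a hypothetical name, and resetting the offset to 0 after a flush is an assumption, not documented behaviour.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

class BufferSketch {
    /** Sends buf[0..off) to the handler and returns the new (empty) offset. */
    static int flush(ContentHandler handler, char[] buf, int off) throws SAXException {
        if (off > 0) {
            handler.characters(buf, 0, off);
        }
        return 0;
    }

    /** Appends one char, flushing first if the buffer is full. */
    static int append(ContentHandler handler, char[] buf, int off, int ch)
            throws SAXException {
        if (off == buf.length) {
            off = flush(handler, buf, off);
        }
        buf[off] = (char) ch;
        return off + 1;
    }

    /** Appends a string one char at a time, so it may span several flushes. */
    static int append(ContentHandler handler, char[] buf, int off, String s)
            throws SAXException {
        for (int i = 0; i < s.length(); i++) {
            off = append(handler, buf, off, s.charAt(i));
        }
        return off;
    }
}
```

Returning the offset instead of storing it in a field keeps the helpers stateless, which matches the shape of the documented signatures.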
protected int appendEntityRef(char[] buf, int off, String ref) throws IOException, SAXException
Throws: IOException, SAXException
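The summary for appendEntityRef says it either appends the replacement text or reports the entity reference. One way that could work is sketched below. The contents of the HTMLEntities map are not documented here, so the small ENTITIES table is hypothetical, and reporting unknown names through ContentHandler.skippedEntity is an assumption about what "reports" means. The sketch also assumes the buffer has room for the replacement text.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

class EntityRefSketch {
    // Hypothetical subset of an entity table: name -> replacement text.
    static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("nbsp", "\u00a0");
        ENTITIES.put("copy", "\u00a9");
    }

    /**
     * Appends the replacement text for a known entity name and returns the
     * new offset. Unknown names are reported to the handler and the offset
     * is returned unchanged.
     */
    static int appendEntityRef(ContentHandler handler, char[] buf, int off, String ref)
            throws SAXException {
        String text = ENTITIES.get(ref);
        if (text == null) {
            handler.skippedEntity(ref); // assumption: this is the "report" path
            return off;
        }
        text.getChars(0, text.length(), buf, off);
        return off + text.length();
    }
}
```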
protected int parseCharRef(Reader src, char[] buf, int off) throws IOException, SAXException
Throws: IOException, SAXException
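The two examples in the parseCharRef summary, '169;' and 'xa0;', are a decimal and a hexadecimal numeric character reference with the leading &# already consumed. A minimal sketch of that decoding follows. CharRefSketch and decode are hypothetical names, and leaving the offset unchanged on malformed input is an assumption, since the actual error handling is not documented.

```java
import java.io.IOException;
import java.io.Reader;

class CharRefSketch {
    /**
     * Reads a numeric character reference body such as "169;" or "xa0;"
     * from src and returns its code point, or -1 if it is malformed.
     */
    static int decode(Reader src) throws IOException {
        int c = src.read();
        int radix = 10;
        if (c == 'x' || c == 'X') { // hexadecimal form
            radix = 16;
            c = src.read();
        }
        int value = 0;
        int digits = 0;
        while (c != -1 && c != ';') {
            int d = Character.digit(c, radix);
            if (d < 0) {
                return -1; // not a digit in this radix
            }
            value = value * radix + d;
            if (value > Character.MAX_CODE_POINT) {
                return -1; // outside the Unicode range
            }
            digits++;
            c = src.read();
        }
        return (c == ';' && digits > 0) ? value : -1;
    }

    /** Decodes the reference and writes it to buf at off; returns the new offset. */
    static int parseCharRef(Reader src, char[] buf, int off) throws IOException {
        int codePoint = decode(src);
        if (codePoint < 0) {
            return off; // assumption: malformed references are dropped
        }
        // toChars writes one or two chars and returns how many it wrote.
        return off + Character.toChars(codePoint, buf, off);
    }
}
```

For instance, decoding "169;" yields code point 169 (the copyright sign) and "xa0;" yields 160 (the no-break space).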
protected boolean isEmptyElement(String curElem) throws IOException, SAXException
Throws: IOException, SAXException
protected boolean isScriptElement(String curElem) throws IOException, SAXException
Throws: IOException, SAXException
protected boolean isFormElement(String curElem) throws IOException, SAXException
Throws: IOException, SAXException
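The three predicates above are undocumented, but their names line up with the ELEMENT_TYPE_EMPTY, ELEMENT_TYPE_SCRIPT and ELEMENT_TYPE_FORM constants, so they plausibly look the element up in a name-to-type table such as HTMLElementTypes. The sketch below illustrates that idea only. The constant values, the table entries, the lower-casing of names and the fallback to a "whatever" type are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ElementTypeSketch {
    // Hypothetical values; the real constants' values are not documented here.
    static final int TYPE_WHATEVER = 0;
    static final int TYPE_EMPTY = 1;
    static final int TYPE_SCRIPT = 2;
    static final int TYPE_FORM = 3;

    // Hypothetical subset of an element-name -> type table.
    static final Map<String, Integer> TYPES = new HashMap<>();
    static {
        TYPES.put("br", TYPE_EMPTY);
        TYPES.put("img", TYPE_EMPTY);
        TYPES.put("hr", TYPE_EMPTY);
        TYPES.put("script", TYPE_SCRIPT);
        TYPES.put("style", TYPE_SCRIPT);
        TYPES.put("form", TYPE_FORM);
    }

    /** Looks up the type case-insensitively; unknown elements are "whatever". */
    static int typeOf(String elem) {
        return TYPES.getOrDefault(elem.toLowerCase(Locale.ROOT), TYPE_WHATEVER);
    }

    static boolean isEmptyElement(String elem) {
        return typeOf(elem) == TYPE_EMPTY;
    }

    static boolean isScriptElement(String elem) {
        return typeOf(elem) == TYPE_SCRIPT;
    }

    static boolean isFormElement(String elem) {
        return typeOf(elem) == TYPE_FORM;
    }
}
```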
protected void closeOptionalElements(List elemStack, String curElem) throws IOException, SAXException
Throws: IOException, SAXException
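closeOptionalElements has no description, but the field summaries for ELEMENT_TYPE_P ("closed by any other block element") and ELEMENT_TYPE_LI ("closed by any other li element") hint at what it might do when a new start tag arrives. The sketch below implements only those two rules. The type table is hypothetical, treating li as block-level is an assumption, and the list of closed names is returned for inspection, whereas the real method returns void and presumably emits end-element events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CloseOptionalSketch {
    // Hypothetical type values and table, for illustration only.
    static final int TYPE_INLINE = 0;
    static final int TYPE_P = 1;
    static final int TYPE_LI = 2;
    static final int TYPE_LIST = 3;
    static final int TYPE_NESTED = 4;

    static final Map<String, Integer> TYPES = new HashMap<>();
    static {
        TYPES.put("p", TYPE_P);
        TYPES.put("li", TYPE_LI);
        TYPES.put("ul", TYPE_LIST);
        TYPES.put("ol", TYPE_LIST);
        TYPES.put("div", TYPE_NESTED);
    }

    static int typeOf(String elem) {
        return TYPES.getOrDefault(elem, TYPE_INLINE);
    }

    static boolean isBlock(int type) {
        return type != TYPE_INLINE;
    }

    /**
     * Pops elements from the top of elemStack that the start of curElem
     * implicitly closes, and returns their names in closing order.
     */
    static List<String> closeOptionalElements(List<String> elemStack, String curElem) {
        List<String> closed = new ArrayList<>();
        int curType = typeOf(curElem);
        while (!elemStack.isEmpty()) {
            int topType = typeOf(elemStack.get(elemStack.size() - 1));
            boolean closes =
                    (topType == TYPE_P && isBlock(curType))        // p closed by any block
                    || (topType == TYPE_LI && curType == TYPE_LI); // li closed by next li
            if (!closes) {
                break;
            }
            closed.add(elemStack.remove(elemStack.size() - 1));
        }
        return closed;
    }
}
```

With the open elements ul, li, p and an incoming li, the sketch closes p and then li and leaves ul open. An incoming inline element such as b closes nothing.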