chame/htmlparser

Types

HTML5Parser[Handle; Atom] = object
HTML5ParserOpts[Handle; Atom] = object
  isIframeSrcdoc*: bool      ## Is the document an iframe srcdoc?
  scripting*: bool           ## Is scripting enabled for this document?
  ctx*: Option[OpenElementInit[Handle, Atom]] ## Context element for fragment parsing. When set to some value,
                                              ## the fragment case is used while parsing.
                                              ## 
                                              ## `startTagName` must be a valid start tag name for this element.
  initialTokenizerState*: TokenizerState ## The initial tokenizer state; by default, this is DATA.
  openElementsInit*: seq[OpenElementInit[Handle, Atom]] ## Initial state of the stack of open elements. By default, the stack
                                                        ## starts out empty.
                                                        ## Note: if this is initialized to a non-empty sequence, the parser will
                                                        ## start by resetting the insertion mode appropriately.
  formInit*: Option[Handle]  ## Initial state of the parser's form pointer.
  pushInTemplate*: bool ## When set to true, the "in template" insertion mode is pushed to the
                        ## stack of template insertion modes on parser start.
OpenElement[Handle; Atom] = tuple[element: Handle, token: Token[Atom]]
OpenElementInit[Handle; Atom] = tuple[element: Handle, startTagName: Atom]
ParseResult = enum
  PRES_CONTINUE, PRES_STOP, PRES_SCRIPT

Result of parsing the passed chunk. PRES_CONTINUE is returned when it is OK to continue parsing.

PRES_STOP is returned when the parser has been stopped from setEncodingImpl.

PRES_SCRIPT is returned when a script end tag is encountered. For implementations that do not support scripting, this can be treated equivalently to PRES_CONTINUE.

Implementations that do support scripting and implement document.write can instead use PRES_SCRIPT to process strings injected into the input stream by document.write before continuing to parse from the network stream. In this case, script elements should be stored (e.g. in the DOM builder) from elementPoppedImpl, and processed after PRES_SCRIPT has been returned.
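
The three result values described above map naturally onto a small dispatch step in the caller. The following is a minimal sketch of that decision only, not Chame's own code: the enum is mirrored locally so the snippet compiles on its own, and the Action type, the nextAction function, and the scriptingSupported flag are hypothetical names introduced for illustration.

```nim
# Local mirror of the documented enum so this sketch compiles standalone.
# Real code would import ParseResult from chame/htmlparser instead.
type ParseResult = enum
  PRES_CONTINUE, PRES_STOP, PRES_SCRIPT

# Hypothetical caller-side actions; not part of Chame.
type Action = enum
  aFeedMore      # keep passing chunks to the parser
  aStopFeeding   # the parser was stopped from setEncodingImpl
  aRunScripts    # process pending scripts, then resume feeding

func nextAction(res: ParseResult; scriptingSupported: bool): Action =
  ## Decide what the caller does after one parseChunk call.
  case res
  of PRES_CONTINUE: aFeedMore
  of PRES_STOP: aStopFeeding
  of PRES_SCRIPT:
    # Without scripting support, PRES_SCRIPT is handled like PRES_CONTINUE.
    if scriptingSupported: aRunScripts else: aFeedMore
```

Keeping the decision in a pure function like this makes the scripting and non-scripting paths easy to check in isolation; what "run scripts" actually involves depends on the DOM builder in use.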

Procs

proc atomToTagType[Handle, Atom](parser: HTML5Parser[Handle, Atom]; atom: Atom): TagType
proc finish[Handle, Atom](parser: var HTML5Parser[Handle, Atom])
Finish parsing the document associated with parser. This will process an EOF token, and pop all elements from the stack of open elements one by one.
func getInsertionPoint(parser: HTML5Parser): int
proc initHTML5Parser[Handle, Atom](dombuilder: DOMBuilder[Handle, Atom];
                                   opts: HTML5ParserOpts[Handle, Atom]): HTML5Parser[
    Handle, Atom]

Create and initialize a new HTML5Parser object from the DOM builder dombuilder and the parser options opts.

The generic Handle must be the node handle type of the DOM builder. The generic Atom must be the interned string type of the DOM builder.

proc parseChunk[Handle, Atom](parser: var HTML5Parser[Handle, Atom];
                              inputBuf: openArray[char]): ParseResult
Parse a chunk of characters stored in inputBuf with parser.
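
As an illustration of how parseChunk and finish might be combined, here is a minimal sketch of a feeding loop for an implementation without scripting. It uses only the procs and enum values documented on this page plus std/streams. The proc name feedAll and the 4096-byte buffer size are hypothetical choices. The sketch assumes that a PRES_SCRIPT result may be ignored (as described for non-scripting implementations) and that finish should not be called once PRES_STOP has been returned; neither assumption is confirmed by this page.

```nim
import std/streams
import chame/htmlparser

proc feedAll[Handle, Atom](parser: var HTML5Parser[Handle, Atom];
                           input: Stream) =
  ## Hypothetical helper: read `input` in fixed-size chunks and pass each
  ## chunk to the parser until the stream is exhausted.
  var buf: array[4096, char]
  while true:
    let n = input.readData(addr buf[0], buf.len)
    if n <= 0:
      break  # end of stream
    case parser.parseChunk(buf.toOpenArray(0, n - 1))
    of PRES_CONTINUE, PRES_SCRIPT:
      discard  # assumption: no scripting, so both are handled the same way
    of PRES_STOP:
      return   # assumption: skip finish() once the parser has been stopped
  # Process the EOF token and pop the remaining open elements.
  parser.finish()
```

The parser passed in would come from initHTML5Parser with a DOM builder whose node handle and interned string types match Handle and Atom; constructing that builder is outside the scope of this sketch.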

Exports

SetEncodingResult, ParsedAttr, DOMBuilderBase, DOMBuilder, TAG_RB, TAG_TYP, TAG_OBJECT, TAG_DFN, TAG_SUMMARY, TAG_DEFINITION_URL, TAG_HTTP_EQUIV, PREFIX_XML, TAG_PLAINTEXT, NamespacePrefix, PREFIX_XLINK, HTagTypes, XMLNS, TAG_EMBED, TAG_IMAGE, TAG_TH, TAG_DATALIST, TAG_COL, TAG_TABLE, TAG_INS, TAG_BODY, TAG_PRE, TAG_FRAMESET, TAG_B, TAG_DD, TAG_FONT, TAG_RT, TAG_FORM, TAG_BDO, TAG_OL, TAG_TIME, TAG_ABBR, TAG_LINK, TAG_MI, TAG_SPAN, TAG_HEADER, TAG_NOEMBED, TAG_LI, TAG_NOSCRIPT, TAG_DATA, TAG_KEYGEN, TAG_MALIGNMARK, TAG_IMG, TAG_BLINK, TAG_UNKNOWN, TAG_MGLYPH, TAG_OPTGROUP, TAG_SECTION, TAG_FIGURE, TAG_MARQUEE, TAG_MAP, TAG_A, TAG_DETAILS, QuirksMode, TAG_LABEL, TAG_DESC, TAG_DEL, TAG_MO, HTML, TAG_HTML, TAG_WBR, TAG_FRAME, TAG_CITE, TAG_SELECT, TAG_VAR, TAG_AREA, TAG_DIV, TAG_SUP, FormAssociatedElements, TAG_SVG, TAG_BR, TAG_DIR, TAG_OPTION, TAG_TFOOT, TAG_H5, TAG_SEARCH, TAG_KBD, Namespace, TAG_ANNOTATION_XML, TAG_TRACK, AllTagTypes, TAG_RTC, TAG_Q, TAG_MARK, TAG_PICTURE, MATHML, TAG_H3, TAG_IFRAME, TAG_HEAD, TAG_EM, TAG_NOBR, TAG_HR, TAG_CHARSET, TAG_H6, TAG_BLOCKQUOTE, TAG_DL, TAG_CONTENT, TAG_OUTPUT, TAG_ADDRESS, TAG_MN, TAG_ARTICLE, TAG_P, XLINK, TAG_LEGEND, TAG_XMP, TAG_RUBY, TAG_CODE, PREFIX_UNKNOWN, TAG_SAMP, TAG_AUDIO, TAG_MATH, TAG_FIGCAPTION, TAG_I, TAG_META, TAG_PROGRESS, TAG_STYLE, PREFIX_XMLNS, TAG_FOOTER, TAG_MS, TAG_U, TAG_H4, TAG_BUTTON, TAG_TEXTAREA, TAG_DIALOG, TAG_ENCODING, TAG_COLOR, ListedElements, TAG_PORTAL, TAG_SOURCE, TAG_TT, TAG_CAPTION, TAG_STRONG, TAG_ASIDE, TAG_CANVAS, SVG, TAG_H2, TAG_NOFRAMES, TAG_TEMPLATE, TAG_LISTING, TAG_TITLE, TAG_BASE, TAG_BGSOUND, TagType, TAG_MENU, TAG_FACE, TAG_BASEFONT, TAG_CENTER, TAG_TR, TAG_METER, TAG_VIDEO, TAG_SIZE, TAG_S, TAG_BIG, TAG_SARCASM, TAG_DT, TAG_RP, TAG_NAV, TAG_H1, TAG_TBODY, TAG_MAIN, TAG_THEAD, TAG_FIELDSET, TAG_SUB, TAG_COLGROUP, XML, TAG_SCRIPT, TAG_TD, TAG_STRIKE, TAG_SMALL, TAG_APPLET, TAG_INPUT, TAG_BDI, TAG_FOREIGN_OBJECT, TAG_UL, NO_PREFIX, TAG_MTEXT, NAMESPACE_UNKNOWN, 
NO_NAMESPACE, TAG_PARAM, TAG_HGROUP, TokenizerState