Subversion Info

Rev
437
Last Checked In
2008-03-28 00:49:20 (2 weeks ago)
Checked in by
deveiant

Parent

Methods

Included Modules

Class Index

Quicksearch

Arrow::HTMLTokenizer

A simple HTML tokenizer object class that can be used to break HTML down into tokens.

Constants

SVNRev
SVN Revision
SVNId
SVN Id

Attributes

scanner[R]
The StringScanner doing the tokenizing
source[R]
The HTML source being tokenized

Public Class Methods

new( source ) click to toggle source

Create a new Arrow::HtmlTokenizer object.

    # File /Users/ged/source/ruby/Arrow/lib/arrow/htmltokenizer.rb, line 45
45:         def initialize( source )
46:             @source = source
47:             @scanner = StringScanner.new( source )
48:         end

Public Instance Methods

each() {|token| ...} click to toggle source

Enumerable interface: Iterates over parsed tokens, calling the supplied block with each one.

    # File /Users/ged/source/ruby/Arrow/lib/arrow/htmltokenizer.rb, line 64
64:         def each
65:             @scanner.reset
66: 
67:             until @scanner.empty?
68:                 if @scanner.peek(1) == '<'
69:                     tag = @scanner.scan_until( />/ )
70: 
71:                     case tag
72:                     when /^<!--/
73:                         token = HTMLComment.new( tag )
74:                     when /^<!/
75:                         token = DocType.new( tag )
76:                     when /^<\?/
77:                         token = ProcessingInstruction.new( tag )
78:                     else
79:                         token = HTMLTag.new( tag )
80:                     end
81:                 else
82:                     text = @scanner.scan( /[^<]+/ )
83:                     token = HTMLText.new( text )
84:                 end
85: 
86:                 yield( token )
87:             end
88:         end

secsequence

--- SEC00055

seccomment

--- ""

attributes

--- 
- name: scanner
  rw: R
  a_desc: |+
    
    The StringScanner doing the tokenizing
    
- name: source
  rw: R
  a_desc: |+
    
    The HTML source being tokenized
    

method_list

--- 
- methods: 
  - visibility: public
    aref: M000236
    name: new
    sourcecode: "    <span class=\"ruby-comment cmt\"># File /Users/ged/source/ruby/Arrow/lib/arrow/htmltokenizer.rb, line 45</span>\n\
      45:         <span class=\"ruby-keyword kw\">def</span> <span class=\"ruby-identifier\">initialize</span>( <span class=\"ruby-identifier\">source</span> )\n\
      46:             <span class=\"ruby-ivar\">@source</span> = <span class=\"ruby-identifier\">source</span>\n\
      47:             <span class=\"ruby-ivar\">@scanner</span> = <span class=\"ruby-constant\">StringScanner</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">source</span> )\n\
      48:         <span class=\"ruby-keyword kw\">end</span>"
    m_desc: |-
      <p>
      Create a <a href="HTMLTokenizer.html#M000236">new</a> Arrow::HtmlTokenizer
      object.
      </p>
    params: ( source )
  category: Class
  type: Public
- methods: 
  - visibility: public
    aref: M000237
    name: each
    sourcecode: "    <span class=\"ruby-comment cmt\"># File /Users/ged/source/ruby/Arrow/lib/arrow/htmltokenizer.rb, line 64</span>\n\
      64:         <span class=\"ruby-keyword kw\">def</span> <span class=\"ruby-identifier\">each</span>\n\
      65:             <span class=\"ruby-ivar\">@scanner</span>.<span class=\"ruby-identifier\">reset</span>\n\
      66: \n\
      67:             <span class=\"ruby-keyword kw\">until</span> <span class=\"ruby-ivar\">@scanner</span>.<span class=\"ruby-identifier\">empty?</span>\n\
      68:                 <span class=\"ruby-keyword kw\">if</span> <span class=\"ruby-ivar\">@scanner</span>.<span class=\"ruby-identifier\">peek</span>(<span class=\"ruby-value\">1</span>) <span class=\"ruby-operator\">==</span> <span class=\"ruby-value str\">'&lt;'</span>\n\
      69:                     <span class=\"ruby-identifier\">tag</span> = <span class=\"ruby-ivar\">@scanner</span>.<span class=\"ruby-identifier\">scan_until</span>( <span class=\"ruby-regexp re\">/&gt;/</span> )\n\
      70: \n\
      71:                     <span class=\"ruby-keyword kw\">case</span> <span class=\"ruby-identifier\">tag</span>\n\
      72:                     <span class=\"ruby-keyword kw\">when</span> <span class=\"ruby-regexp re\">/^&lt;!--/</span>\n\
      73:                         <span class=\"ruby-identifier\">token</span> = <span class=\"ruby-constant\">HTMLComment</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">tag</span> )\n\
      74:                     <span class=\"ruby-keyword kw\">when</span> <span class=\"ruby-regexp re\">/^&lt;!/</span>\n\
      75:                         <span class=\"ruby-identifier\">token</span> = <span class=\"ruby-constant\">DocType</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">tag</span> )\n\
      76:                     <span class=\"ruby-keyword kw\">when</span> <span class=\"ruby-regexp re\">/^&lt;\\?/</span>\n\
      77:                         <span class=\"ruby-identifier\">token</span> = <span class=\"ruby-constant\">ProcessingInstruction</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">tag</span> )\n\
      78:                     <span class=\"ruby-keyword kw\">else</span>\n\
      79:                         <span class=\"ruby-identifier\">token</span> = <span class=\"ruby-constant\">HTMLTag</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">tag</span> )\n\
      80:                     <span class=\"ruby-keyword kw\">end</span>\n\
      81:                 <span class=\"ruby-keyword kw\">else</span>\n\
      82:                     <span class=\"ruby-identifier\">text</span> = <span class=\"ruby-ivar\">@scanner</span>.<span class=\"ruby-identifier\">scan</span>( <span class=\"ruby-regexp re\">/[^&lt;]+/</span> )\n\
      83:                     <span class=\"ruby-identifier\">token</span> = <span class=\"ruby-constant\">HTMLText</span>.<span class=\"ruby-identifier\">new</span>( <span class=\"ruby-identifier\">text</span> )\n\
      84:                 <span class=\"ruby-keyword kw\">end</span>\n\
      85: \n\
      86:                 <span class=\"ruby-keyword kw\">yield</span>( <span class=\"ruby-identifier\">token</span> )\n\
      87:             <span class=\"ruby-keyword kw\">end</span>\n\
      88:         <span class=\"ruby-keyword kw\">end</span>"
    m_desc: |-
      <p>
      Enumerable interface: Iterates over parsed tokens, calling the supplied
      block with <a href="HTMLTokenizer.html#M000237">each</a> one.
      </p>
    params: () {|token| ...}
  category: Instance
  type: Public

sectitle

--- 

constants

--- 
- name: SVNRev
  desc: |+
    
    SVN Revision
    
  value: "%q$Rev: 437 $"
- name: SVNId
  desc: |+
    
    SVN Id
    
  value: "%q$Id: htmltokenizer.rb 437 2008-03-28 00:49:20Z deveiant $"

[Validate]

Generated with the Darkfish Rdoc Generator.