Subversion Info

Rev
102
Last Checked In
2008-09-23 15:02:51
Checked in by
deveiant

WordNet::Synset

WordNet synonym-set object class

Synopsis

  ss = lexicon.lookup_synsets( "word", WordNet::Noun, 1 )
  puts "Definition: %s" % ss.gloss
  coords = ss.coordinates

Description

Instances of this class encapsulate the data for a synonym set (‘synset’) in a WordNet lexical database. A synonym set is a set of words that are interchangeable in some context.

Author

Michael Granger

Copyright © 2002-2008 The FaerieMUD Consortium. All rights reserved.

This module is free software. You may use, modify, and/or redistribute this software under the terms of the Perl Artistic License. (See language.perl.com/misc/Artistic.html)

Much of this code was inspired by/ported from the Lingua::Wordnet Perl module by Dan Brian.

Version

 $Id: synset.rb 102 2008-09-23 15:02:51Z deveiant $

Constants

SVNId

Subversion ID

SVNRev

Subversion Rev

Attributes

lexicon[R]

The WordNet::Lexicon that was used to look up this synset

part_of_speech[RW]

The syntactic category of this Synset. Will be one of “n” (noun), “v” (verb), “a” (adjective), “r” (adverb), or “s” (other).

offset[RW]

The original byte offset of the synset in the data file; acts as the unique identifier (when combined with #part_of_speech) of this Synset in the database.

filenum[RW]

The number corresponding to the lexicographer file name containing the synset. Calling #lex_info will return the actual filename. See the “System Description” of wngloss(7WN) for more info about this.

wordlist[RW]

The raw list of word/lex_id pairs associated with this synset. Each word is separated from its lex_id by a ’%’ character, and the pairs are delimited with a ’|’. E.g., the wordlist for “animal” is:

  "animal%0|animate_being%0|beast%0|brute%1|creature%0|fauna%1"

pointerlist[RW]

The list of raw pointers to related synsets. E.g., the pointerlist for “mourning dove” is:

  "@ 01731700%n 0000|#m 01733452%n 0000"

frameslist[RW]

The list of raw verb sentence frames for this synset.

gloss[RW]

Definition and/or example sentences for the Synset.

data[R]

The raw WordNet data that represents this synset

Public Class Methods

new( lexicon, offset, pos, word=nil, data=nil )

Create a new Synset object in the specified lexicon for the specified word and part_of_speech. If data is specified, initialize the synset’s other object data from it. This method shouldn’t be called directly: you should use one of the Lexicon class’s factory methods: #create_synset, #lookup_synsets, or #lookup_synsets_by_keys.

     # File lib/wordnet/synset.rb, line 126
     def initialize( lexicon, offset, pos, word=nil, data=nil )
         @lexicon = lexicon

         if SYNTACTIC_SYMBOLS[ pos ]
             @part_of_speech = SYNTACTIC_SYMBOLS[ pos ]
         elsif SYNTACTIC_CATEGORIES.key?(pos)
             @part_of_speech = pos
         else
             raise ArgumentError, "No such part of speech %p" % [ pos ]
         end

         @pointers    = nil

         @offset      = offset.to_i
         @wordlist    = word ? word : ''
         @data        = data

         @filenum     = nil
         @pointerlist = ''
         @frameslist  = ''
         @gloss       = ''

         @filenum, @wordlist, @pointerlist, @frameslist, @gloss = data.split( DELIM_RE ) if data
     end

Public Instance Methods

==( otherSyn )

Returns true if the receiver and otherSyn are identical according to their offsets.

     # File lib/wordnet/synset.rb, line 237
     def ==( otherSyn )
         return false unless otherSyn.kind_of?( WordNet::Synset )
         return self.offset == otherSyn.offset
     end

add_words( *new_words )

Add the specified new_words to this synset’s wordlist.

     # File lib/wordnet/synset.rb, line 260
     def add_words( *new_words )
         self.words |= new_words
     end

build_traversal_func( type, include_origin=true )

Build a Proc to do recursive traversal of the specified type of relationship. When called, the Proc returns a two-element Array: the synsets it traversed, and a flag indicating whether the traversal was halted by the block.

     # File lib/wordnet/synset.rb, line 528
     def build_traversal_func( type, include_origin=true )
         func = Proc.new do |syn,depth|
             depth ||= 0

             # Flag to halt traversal
             halt_flag = false

             # Call the block if it exists and we're either past the origin or
             # including it
             if block_given? && (include_origin || depth.nonzero?)
                 res = yield( syn, depth )
                 halt_flag = true if res.is_a? TrueClass
             end

             # Make an array for holding sub-synsets we see
             sub_syns = []
             sub_syns << syn unless depth.zero? && !include_origin

             # Iterate over each synset returned by calling the pointer on the
             # current syn. For each one, we call ourselves recursively, and
             # break out of the iterator if the block has indicated we should
             # abort by returning true.
             unless halt_flag
                 syn.send( type ).each do |subsyn|
                     sub_sub_syns, halt_flag = func.call( subsyn, depth + 1 )
                     sub_syns += sub_sub_syns
                     break if halt_flag
                 end
             end

             # Return the traversed synsets and the halt flag
             [ sub_syns, halt_flag ]
         end

         return func
     end

coordinates()

Returns an Array of the coordinate sisters of the receiver, i.e., the hyponyms of each of its hypernyms.

     # File lib/wordnet/synset.rb, line 485
     def coordinates
         self.hypernyms.collect {|syn| syn.hyponyms }.flatten
     end
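
The one-liner above can be tried out without a lexicon. The following is only a sketch: ToySynset is a hypothetical stand-in that carries just a name and its hypernym/hyponym links, and the three toy synsets are invented for illustration.

```ruby
# Hypothetical stand-in for a synset (not part of the library): a name plus
# direct hypernym and hyponym links.
ToySynset = Struct.new( :name, :hypernyms, :hyponyms ) do
  # Same expression as Synset#coordinates
  def coordinates
    hypernyms.collect {|syn| syn.hyponyms }.flatten
  end
end

bird = ToySynset.new( "bird", [], [] )
dove = ToySynset.new( "dove", [bird], [] )
hawk = ToySynset.new( "hawk", [bird], [] )
bird.hyponyms.push( dove, hawk )

puts dove.coordinates.map {|syn| syn.name }.inspect
```

Note that, with this definition, the receiver itself appears among its own coordinates, because it is a hyponym of its own hypernym.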

delete_words( *old_words )

Delete the specified old_words from this synset’s wordlist.

     # File lib/wordnet/synset.rb, line 267
     def delete_words( *old_words )
         self.words -= old_words
     end

distance( type, otherSynset )

Returns the distance in pointers between the receiver and otherSynset using type as the search path, or nil if otherSynset is not found along that path.

     # File lib/wordnet/synset.rb, line 603
     def distance( type, otherSynset )
         dist = nil
         self.traverse( type ) {|syn,depth|
             if syn == otherSynset
                 dist = depth
                 true
             end
         }

         return dist
     end

frames()

Returns an Array of verb frame Strings for the synset.

     # File lib/wordnet/synset.rb, line 508
     def frames
         frarray = self.frameslist.split( WordNet::SUB_DELIM_RE )
         verbFrames = []

         frarray.each {|fr|
             fnum, wnum = fr.split

             # The word number arrives as a String; convert it before
             # comparing it with 0 or using it as an index.
             wnum = wnum.to_i

             if wnum > 0
                 wordtext = " (" + self.words[wnum] + ")"
                 verbFrames.push VERB_SENTS[ fnum ] + wordtext
             else
                 verbFrames.push VERB_SENTS[ fnum ]
             end
         }

         return verbFrames
     end

glosses()

Return each of the sentences of the gloss for this synset as an array. The gloss is a definition of the synset, and optionally one or more example sentences.

     # File lib/wordnet/synset.rb, line 230
     def glosses
         return self.gloss.split( /\s*;\s*/ )
     end
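
To make the splitting rule concrete, here is a small standalone sketch that applies the same regular expression to a gloss string. The gloss text is made up for illustration and is not taken from the WordNet database.

```ruby
# A made-up gloss: a definition followed by two example sentences,
# separated by semicolons.
gloss = 'a small wild bird; "the dove cooed"; "doves nest in pairs"'

# Same split as Synset#glosses: break on semicolons, discarding the
# whitespace around them.
parts = gloss.split( /\s*;\s*/ )

parts.each_with_index {|part, i| puts "#{i}: #{part}" }
```

Because the split is purely textual, a semicolon inside an example sentence would also be treated as a separator.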

inspect()

Return a human-readable representation of the Synset suitable for debugging.

     # File lib/wordnet/synset.rb, line 196
     def inspect
         pointer_counts = self.pointer_map.collect {|type,ptrs|
             "#{type}s: #{ptrs.length}"
           }.join( ", " )

         return %q{#<%s:0x%08x/%s %s (%s): "%s" (%s)>} % [
             self.class.name,
             self.object_id * 2,
             self.offset,
             self.words.join(", "),
             self.part_of_speech,
             self.gloss,
             pointer_counts,
           ]
     end

key()

Returns the Synset’s unique identifier, made up of its offset and syntactic category concatenated together with a ’%’ symbol.

     # File lib/wordnet/synset.rb, line 215
     def key
         return "%d%%%s" % [ self.offset, self.pos ]
     end

lexInfo=( id )

Sets the “lexicographer’s file” association for this synset to id. The value in id should correspond to one of the values in WordNet::LEXFILES.

     # File lib/wordnet/synset.rb, line 500
     def lexInfo=( id )
         raise ArgumentError, "Bad index: Lexinfo id must be within LEXFILES" unless
             LEXFILES[id]
         self.filenum = id
     end

lex_info()

Return the name of the “lexicographer’s file” associated with this synset.

     # File lib/wordnet/synset.rb, line 492
     def lex_info
         return LEXFILES[ self.filenum.to_i ]
     end

overview()

Alias for #to_s

pointer_map()

Returns the synset’s pointers in a Hash keyed by their type.

     # File lib/wordnet/synset.rb, line 665
     def pointer_map
         return self.pointers.inject( {} ) do |hsh,ptr|
             hsh[ ptr.type ] ||= []
             hsh[ ptr.type ] << ptr
             hsh
         end
     end
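
The grouping performed by the inject call can be seen in isolation with a sketch like the one below. ToyPointer is a hypothetical stand-in for the library's pointer objects, and the type names and keys are invented for illustration.

```ruby
# Hypothetical stand-in for a pointer: only the type and a target key.
ToyPointer = Struct.new( :type, :target )

pointers = [
  ToyPointer.new( :hypernym,       "00000001%n" ),   # made-up keys
  ToyPointer.new( :member_holonym, "00000002%n" ),
  ToyPointer.new( :hypernym,       "00000003%n" ),
]

# Same accumulation pattern as Synset#pointer_map: build a Hash whose
# values are Arrays of the pointers sharing a type.
map = pointers.inject( {} ) do |hsh, ptr|
  hsh[ ptr.type ] ||= []
  hsh[ ptr.type ] << ptr
  hsh
end

map.each {|type, ptrs| puts "#{type}: #{ptrs.length}" }
```

On Ruby versions that provide Enumerable#group_by, `pointers.group_by {|ptr| ptr.type }` builds an equivalent Hash.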

pointers()

Returns the pointers in this synset’s pointerlist as an Array.

     # File lib/wordnet/synset.rb, line 648
     def pointers
         @pointers ||= @pointerlist.split(SUB_DELIM_RE).collect {|pstr|
             Pointer.parse( pstr )
         }

         return @pointers
     end
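
As a rough illustration of what parsing a raw pointerlist involves, the sketch below splits the “mourning dove” pointerlist shown under the pointerlist attribute into its parts. It does not use the library's Pointer class: RawPointer and parse_pointerlist are hypothetical names, and the reading of the trailing four-digit field as a source/target field is an assumption based on the WordNet data file format.

```ruby
# Hypothetical record for one raw pointer (not the library's Pointer class).
RawPointer = Struct.new( :symbol, :offset, :pos, :source_target )

# Split a raw pointerlist on '|', then split each entry into its pointer
# symbol, its synset key ("offset%pos"), and the trailing four-digit field.
def parse_pointerlist( raw )
  raw.split( '|' ).map do |pstr|
    symbol, key, source_target = pstr.split( ' ', 3 )
    offset, pos = key.split( '%', 2 )
    RawPointer.new( symbol, offset.to_i, pos, source_target )
  end
end

ptrs = parse_pointerlist( "@ 01731700%n 0000|#m 01733452%n 0000" )
ptrs.each do |ptr|
  puts "%-2s -> offset %d, part of speech %s" % [ ptr.symbol, ptr.offset, ptr.pos ]
end
```

In the WordNet data files, ‘@’ marks a hypernym pointer and ‘#m’ a member holonym pointer.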

pointers=( *new_pointers )

Set the pointers in this synset’s pointerlist to new_pointers.

     # File lib/wordnet/synset.rb, line 658
     def pointers=( *new_pointers )
         # An attribute assignment always passes its right-hand side as a
         # single argument, so unwrap the Array that the splat puts around it.
         new_pointers = new_pointers.flatten

         @pointerlist = new_pointers.collect {|ptr| ptr.to_s}.join( SUB_DELIM )
         @pointers = new_pointers
     end

pos()

The symbol which represents this synset’s syntactic category. Will be one of :noun, :verb, :adjective, :adverb, or :other.

     # File lib/wordnet/synset.rb, line 222
     def pos
         return SYNTACTIC_CATEGORIES[ @part_of_speech ]
     end

remove()

Removes this synset from the database.

     # File lib/wordnet/synset.rb, line 290
     def remove
         self.lexicon.remove_synset( self )
     end

search( type, otherSynset )

Recursively searches all of the receiver’s pointers of the specified type for otherSynset, returning true if it is found.

     # File lib/wordnet/synset.rb, line 618
     def search( type, otherSynset )
         self.traverse( type ) {|syn,depth|
             syn == otherSynset
         }
     end

serialize()

Returns the synset’s data in a form suitable for storage in the lexicon’s database.

     # File lib/wordnet/synset.rb, line 297
     def serialize
         return [
             @filenum,
             @wordlist,
             @pointerlist,
             @frameslist,
             @gloss
           ].join( WordNet::DELIM )
     end
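
The relationship between serialize and the data.split( DELIM_RE ) call in the constructor can be sketched as a round trip. The delimiter values below are assumptions made for this example (the actual WordNet::DELIM and DELIM_RE are defined elsewhere in the library), and the field values are invented.

```ruby
# Assumed delimiter values, for illustration only.
DELIM    = '||'
DELIM_RE = /\|\|/

# Made-up field values in the order used by Synset#serialize:
# filenum, wordlist, pointerlist, frameslist, gloss.
fields = [ 5, "animal%0|beast%0", "@ 00000001%n 0000", "", "a living creature" ]

# Serialize: join the fields into a single record.
record = fields.join( DELIM )

# Deserialize: the same multiple assignment as in Synset#initialize.
filenum, wordlist, pointerlist, frameslist, gloss = record.split( DELIM_RE )

puts record
puts "filenum=#{filenum.inspect} frameslist=#{frameslist.inspect}"
```

Two details are worth noticing: filenum comes back as a String, which is consistent with lex_info calling to_i on it, and an empty field in the middle of the record survives the round trip as an empty String.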

store()

Writes any changes made to the object to the database and updates all affected synset data and indexes. If the object passes out of scope before #store is called, the changes are lost.

     # File lib/wordnet/synset.rb, line 283
     def store
         self.lexicon.store_synset( self )
     end

Also aliased as: write

synonyms()

Alias for #words

to_s()

Return the synset as a string. Alias: overview.

     # File lib/wordnet/synset.rb, line 273
     def to_s
         wordlist = self.words.join(", ").gsub( /%\d/, '' ).gsub( /_/, ' ' )
         return "#{wordlist} [#{self.part_of_speech}] -- (#{self.gloss})"
     end

Also aliased as: overview

traverse( type, include_origin=true, &block )

Traversal iterator: iterates depth-first over the receiver’s pointers of the specified type, and then over the same type of pointer in each pointed-to synset. If called with a block, the block is called once for each synset, with the found synset and its depth in relation to the originating synset as arguments. The first call will be for the originating synset with a depth of 0, unless include_origin is false. If the block returns true, the traversal is halted and the method returns immediately. This method returns an Array of the synsets which were traversed if no block is given, or a flag which indicates whether or not the traversal was halted if a block is given.

     # File lib/wordnet/synset.rb, line 577
     def traverse( type, include_origin=true, &block )
         raise ArgumentError, "Illegal parameter 1: Must be either a String or a Symbol" unless
             type.kind_of?( String ) || type.kind_of?( Symbol )

         raise ArgumentError, "Synset doesn't support the #{type.to_s} pointer type." unless
             self.respond_to?( type )

         # Build the iterator, forwarding the caller's block so the traversal
         # Proc can yield to it, then call it on the receiver.
         traversal_func = self.build_traversal_func( type, include_origin, &block )
         traversed_sets, halt_flag = traversal_func.call( self )

         # If a block was given, just return whether or not the traversal was halted.
         if block_given?
             return halt_flag

         # If no block was given, return the traversed synsets
         else
             return traversed_sets
         end
     end
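
The traversal contract described above (depth-first order, a depth argument, and halting when the block returns true) can be sketched independently of the library. ToyNode and toy_traverse are hypothetical names, the four-node chain is invented, and the sketch leaves out the include_origin option.

```ruby
# Hypothetical stand-in for a synset: a name and its direct hypernyms.
ToyNode = Struct.new( :name, :hypernyms )

# Depth-first walk over the hypernym links. Returns the visited nodes and a
# flag saying whether the block asked to halt by returning true.
def toy_traverse( node, depth = 0, seen = [], &block )
  halted = block ? ( block.call( node, depth ) == true ) : false
  seen << node

  unless halted
    node.hypernyms.each do |parent|
      _, halted = toy_traverse( parent, depth + 1, seen, &block )
      break if halted
    end
  end

  [ seen, halted ]
end

entity = ToyNode.new( "entity", [] )
animal = ToyNode.new( "animal", [entity] )
bird   = ToyNode.new( "bird",   [animal] )
dove   = ToyNode.new( "dove",   [bird] )

# Without a block: collect everything that was visited.
visited, _ = toy_traverse( dove )
puts visited.map {|node| node.name }.join( " -> " )

# With a block: halt at "animal" and remember its depth, in the style of
# Synset#distance.
dist = nil
_, halted = toy_traverse( dove ) do |node, depth|
  if node.name == "animal"
    dist = depth
    true
  end
end
puts "distance to animal: #{dist.inspect}, halted: #{halted}"
```

Returning anything other than true from the block (including nil) lets the walk continue, which is why the distance-style block only evaluates to true in the matching branch.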

words()

Returns an Array of words and/or collocations associated with this synset.

     # File lib/wordnet/synset.rb, line 245
     def words
         self.wordlist.split( SUB_DELIM_RE ).collect do |word|
             word.gsub( /_/, ' ' ).sub( /%.*$/, '' )
         end
     end

Also aliased as: synonyms
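
For comparison with #words, which discards the lex_id of each entry, the following standalone sketch keeps both halves of every pair. It uses the “animal” wordlist shown under the wordlist attribute; parse_wordlist is a hypothetical helper, not a method of this class.

```ruby
# Hypothetical helper: split a raw wordlist into [word, lex_id] pairs using
# the '|' and '%' delimiters, replacing underscores with spaces as #words does.
def parse_wordlist( raw )
  raw.split( '|' ).map do |pair|
    word, lex_id = pair.split( '%', 2 )
    [ word.tr( '_', ' ' ), lex_id.to_i ]
  end
end

pairs = parse_wordlist( "animal%0|animate_being%0|beast%0|brute%1|creature%0|fauna%1" )
pairs.each {|word, lex_id| puts "%-14s lex_id=%d" % [ word, lex_id ] }
```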

words=( *new_words )

Set the words in this synset’s wordlist to new_words.

     # File lib/wordnet/synset.rb, line 254
     def words=( *new_words )
         @wordlist = new_words.join( SUB_DELIM )
     end

write()

Alias for #store

|( otherSyn )

Union: Return the least general synset that the receiver and otherSyn have in common as a hypernym, or nil if they share none.

     # File lib/wordnet/synset.rb, line 628
     def |( otherSyn )

         # Find all of this syn's hypernyms
         hyper_syns = self.traverse( :hypernyms )
         common_syn = nil

         # Now traverse the other synset's hypernyms looking for one of our
         # own hypernyms.
         otherSyn.traverse( :hypernyms ) do |syn,depth|
             if hyper_syns.include?( syn )
                 common_syn = syn
                 break true
             end
         end

         return common_syn
     end
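
The idea behind this operator, collecting one synset's hypernym chain and then walking the other's until a shared member turns up, can be sketched on a toy hierarchy. ToySyn, ancestors_of and common_hypernym are hypothetical names, and the sketch assumes each node has at most one hypernym, which is a simplification.

```ruby
# Hypothetical stand-in for a synset with a single hypernym link.
ToySyn = Struct.new( :name, :parent )

# The node itself followed by each successive hypernym, nearest first.
def ancestors_of( syn )
  chain = []
  while syn
    chain << syn
    syn = syn.parent
  end
  chain
end

# First node on b's chain that is also on a's chain, or nil if there is none.
def common_hypernym( a, b )
  mine = ancestors_of( a )
  ancestors_of( b ).find {|syn| mine.any? {|m| m.equal?( syn ) } }
end

animal = ToySyn.new( "animal", nil )
bird   = ToySyn.new( "bird",   animal )
dove   = ToySyn.new( "dove",   bird )
hawk   = ToySyn.new( "hawk",   bird )
cat    = ToySyn.new( "cat",    animal )
rock   = ToySyn.new( "rock",   nil )

puts common_hypernym( dove, hawk ).name
puts common_hypernym( dove, cat ).name
puts common_hypernym( dove, rock ).inspect
```

As in the method above, each chain starts with the node itself, so the result can be one of the two arguments when one is a hypernym of the other.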

Protected Instance Methods

fetch_synset_pointers( type, subtype=nil )

Returns an Array of synset objects for the receiver’s pointers of the specified type.

     # File lib/wordnet/synset.rb, line 681
     def fetch_synset_pointers( type, subtype=nil )

         # Iterate over this synset's pointers, looking for ones that match
         # the type (and the subtype, if one was given) we're after.
         pointers = self.pointers.
             find_all do |ptr|
                 ptr.type == type and
                 ( subtype.nil? || ptr.subtype == subtype )
             end

         # Look up the synsets that the matching pointers refer to
         return pointers.
             collect {|ptr| ptr.synset }.
             collect {|key| @lexicon.lookup_synsets_by_key( key )}.flatten
     end

set_synset_pointers( type, synsets, subtype=nil )

Sets the receiver’s synset pointers for the specified type to the specified synsets.

     # File lib/wordnet/synset.rb, line 700
     def set_synset_pointers( type, synsets, subtype=nil )
         synsets = [ synsets ] unless synsets.is_a?( Array )
         pmap = self.pointer_map
         pmap[ type ] = synsets
         self.pointers = pmap.values
     end
