Subversion Info

Rev
102
Last Checked In
2008-09-23 15:02:51 (6 months ago)
Checked in by
deveiant

WordNet::Lexicon

WordNet lexicon class - abstracts access to the WordNet lexical databases, and provides factory methods for looking up and creating new WordNet::Synset objects.

Constants

SvnId

Subversion Id

SvnRev

Subversion revision

DEFAULT_DB_ENV

The path to the WordNet BerkeleyDB Env. It lives in the directory that this module is in.

ENV_OPTIONS

Options for the creation of the Env object

ENV_FLAGS_RW

Flags for the creation of the Env object in read-write mode

ENV_FLAGS_RO

Flags for the creation of the Env object in read-only mode

Attributes

env[R]

The BDB::Env object which contains the wordnet lexicon’s databases.

index_db[R]

The handle to the index table

data_db[R]

The handle to the synset data table

morph_db[R]

The handle to the morph table

Public Class Methods

new( dbenv=DEFAULT_DB_ENV, mode=:readonly )

Create a new WordNet::Lexicon object that will read its data from the given dbenv (a BerkeleyDB env directory). The database will be opened with the specified mode, which can be either a numeric octal mode (e.g., 0444) or one of the symbols :readonly, :readwrite, or :writable (a synonym for :readwrite). Any other value raises an ArgumentError.

    # File lib/wordnet/lexicon.rb, line 91
    def initialize( dbenv=DEFAULT_DB_ENV, mode=:readonly )
        @mode = normalize_mode( mode )
        debug_msg "Mode is: %04o" % [ @mode ]

        # Pick the environment and database flags that match the mode
        if self.readonly?
            debug_msg "Using readonly flags"
            envflags = ENV_FLAGS_RO
            dbflags = 0
        else
            debug_msg "Using read/write flags"
            envflags = ENV_FLAGS_RW
            dbflags = BDB::CREATE
        end

        debug_msg "Env flags are: %s, dbflags are %s" %
            [ envflags.to_s(2), dbflags.to_s(2) ]

        begin
            @env = BDB::Env.new( dbenv, envflags, ENV_OPTIONS )
            @index_db = @env.open_db( BDB::BTREE, "index", nil, dbflags, @mode )
            @data_db = @env.open_db( BDB::BTREE, "data", nil, dbflags, @mode )
            @morph_db = @env.open_db( BDB::BTREE, "morph", nil, dbflags, @mode )
        rescue StandardError => err
            msg = "Error while opening Ruby-WordNet data files: #{dbenv}: %s" %
                [ err.message ]
            raise err, msg, err.backtrace
        end
    end

Public Instance Methods

checkpoint( bytes=0, minutes=0 )

Checkpoint the database. (BerkeleyDB-specific)

    # File lib/wordnet/lexicon.rb, line 161
    def checkpoint( bytes=0, minutes=0 )
        # Note: bytes and minutes are accepted but not passed on to the Env
        @env.checkpoint
    end
clean_logs()

Remove any archival logfiles for the lexicon’s database environment. Does nothing unless the lexicon was opened in read-write mode. (BerkeleyDB-specific)

    # File lib/wordnet/lexicon.rb, line 168
    def clean_logs
        return unless self.readwrite?
        self.archlogs.each do |logfile|
            File::chmod( 0777, logfile )
            File::delete( logfile )
        end
    end
close()

Close the lexicon’s database environment

    # File lib/wordnet/lexicon.rb, line 155
    def close
        @env.close if @env
    end
create_synset( word, part_of_speech )

Factory method: Creates and returns a new WordNet::Synset object in this lexicon for the specified word and part_of_speech.

    # File lib/wordnet/lexicon.rb, line 279
    def create_synset( word, part_of_speech )
        return WordNet::Synset.new( self, '', part_of_speech, word )
    end
Also aliased as: new_synset
familiarity( word, part_of_speech, polyCount=nil )

Returns the familiarity (polysemy count) of word as a part_of_speech, as an integer, or nil if the word has no entry in the index. The same count can be obtained by counting the synsets returned by #lookup_synsets.

    # File lib/wordnet/lexicon.rb, line 180
    def familiarity( word, part_of_speech, polyCount=nil )
        # Note: polyCount is accepted but not used
        wordkey = self.make_word_key( word, part_of_speech )
        return nil unless @index_db.key?( wordkey )
        @index_db[ wordkey ].split( WordNet::SUB_DELIM_RE ).length
    end
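
As a rough illustration of the counting step, the sketch below uses a plain Hash in place of the BerkeleyDB index table. The method name, the sample entries, the offsets, and the "|" delimiter (standing in for WordNet::SUB_DELIM_RE) are all assumptions made for the example, not values taken from the library.

```ruby
# A Hash plays the role of the index table: "word%pos" => joined offsets.
# The offsets below are made up for illustration.
index = {
  "run%v"     => "01926311|02075049|01914947",
  "lexicon%n" => "06420871"
}

# Count the offsets stored under a word key; nil if the key is absent.
# delim_re is an assumed stand-in for WordNet::SUB_DELIM_RE.
def familiarity_sketch( index, wordkey, delim_re=/\|/ )
  return nil unless index.key?( wordkey )
  index[ wordkey ].split( delim_re ).length
end

puts familiarity_sketch( index, "run%v" )       # 3
puts familiarity_sketch( index, "lexicon%n" )   # 1
```

A key that is missing from the table yields nil rather than 0, which mirrors the early return in the listing above.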
grep( text )

Returns an array of the index keys that begin with text, such as the compound words that start with it. Returns an empty array if text is empty.

    # File lib/wordnet/lexicon.rb, line 258
    def grep( text )
        return [] if text.empty?

        words = []

        # Escape the text so regexp metacharacters in it match literally
        pattern = /^#{Regexp.escape( text )}/

        # Grab a cursor into the database and fetch while the key matches
        # the target text. Guard against a nil record in case the cursor
        # runs past the last entry.
        cursor = @index_db.cursor
        rec = cursor.set_range( text )
        while rec && pattern =~ rec[0]
            words.push rec[0]
            rec = cursor.next
        end
        cursor.close

        return words
    end
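
The cursor walk above amounts to a prefix scan over sorted keys. A minimal sketch of that idea follows, assuming a sorted Array in place of the BTREE table; the method name and the sample keys are hypothetical.

```ruby
# Collect every key that starts with prefix, scanning a sorted list the
# way a BTREE cursor would: seek to the first key >= prefix, then walk
# forward until a key no longer matches.
def prefix_scan( sorted_keys, prefix )
  return [] if prefix.empty?
  start = sorted_keys.index {|key| key >= prefix }
  return [] if start.nil?
  sorted_keys[ start..-1 ].take_while {|key| key.start_with?( prefix ) }
end

# Made-up index keys in "word%pos" form
keys = [ "ice%n", "ice_age%n", "ice_cream%n", "iceberg%n", "idea%n" ].sort

p prefix_scan( keys, "ice_" )   # ["ice_age%n", "ice_cream%n"]
```

Because the keys are sorted, the scan can stop at the first key that fails to match instead of visiting the whole table.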
lookup_synsets( word, part_of_speech, sense=nil )

Look up synsets (WordNet::Synset objects) matching word as a part_of_speech, where part_of_speech is one of WordNet::Noun, WordNet::Verb, WordNet::Adjective, or WordNet::Adverb. Without sense, #lookup_synsets returns all matches for that part_of_speech. If sense is specified (senses are numbered from 1), only the synset object that matches that particular part_of_speech and sense is returned. Returns nil if the word is found neither as given nor after morphological conversion.

    # File lib/wordnet/lexicon.rb, line 193
    def lookup_synsets( word, part_of_speech, sense=nil )
        wordkey = self.make_word_key( word, part_of_speech )
        pos = self.make_pos( part_of_speech )

        # Look up the index entry, trying first the word as given, and if
        # that fails, trying morphological conversion.
        entry = @index_db[ wordkey ]

        if entry.nil? && (word = self.morph( word, part_of_speech ))
            wordkey = self.make_word_key( word, part_of_speech )
            entry = @index_db[ wordkey ]
        end

        # If the lookup failed both ways, just abort
        return nil unless entry

        # Make synset keys from the entry, narrowing it to just the sense
        # requested if one was specified.
        synkeys = entry.split( SUB_DELIM_RE ).collect {|off| "#{off}%#{pos}" }
        if sense
            return lookup_synsets_by_key( synkeys[sense - 1] )
        else
            return [ lookup_synsets_by_key(*synkeys) ].flatten
        end
    end
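
The key-building step at the end of the listing can be tried on its own. The sketch below is a minimal stand-alone version; the method name, the sample offsets, and the "|" delimiter (an assumed stand-in for SUB_DELIM_RE) are hypothetical.

```ruby
# Build "offset%pos" synset keys from an index entry. With a sense,
# return only the key for that sense; senses are numbered from 1.
def synset_keys( entry, pos, sense=nil, delim_re=/\|/ )
  keys = entry.split( delim_re ).collect {|off| "#{off}%#{pos}" }
  return keys if sense.nil?
  keys[ sense - 1 ]
end

entry = "01926311|02075049|01914947"   # made-up offsets

p synset_keys( entry, "v" )      # ["01926311%v", "02075049%v", "01914947%v"]
p synset_keys( entry, "v", 2 )   # "02075049%v"
```

Note that a sense beyond the number of offsets produces nil here; the listing above passes that value straight to #lookup_synsets_by_key.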
lookup_synsetsByOffset( *keys )
lookup_synsets_by_key( *keys )

Returns the WordNet::Synset objects corresponding to the keys specified. Each key is made up of the target synset’s “offset” and syntactic category joined by a ’%’ character. Raises a WordNet::LookupError if any key is not present in the data table.

    # File lib/wordnet/lexicon.rb, line 224
    def lookup_synsets_by_key( *keys )
        synsets = []

        keys.each {|key|
            raise WordNet::LookupError, "Failed lookup of synset '#{key}': "\
                "No such synset" unless @data_db.key?( key )

            # Split the key back into its offset and part of speech
            data = @data_db[ key ]
            offset, part_of_speech = key.split( /%/, 2 )
            synsets << WordNet::Synset.new( self, offset, part_of_speech, nil, data )
        }

        return *synsets
    end
Also aliased as: lookup_synsetsByOffset
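
To make the key format concrete, the following sketch looks up records in a plain Hash and splits each key back into its two parts. The method name, the table contents, and the use of KeyError in place of WordNet::LookupError are assumptions for the example; it returns simple Hashes rather than WordNet::Synset objects.

```ruby
# Fetch the record for each "offset%pos" key from a Hash standing in
# for the data table, failing loudly on an unknown key.
def fetch_records( table, *keys )
  keys.collect do |key|
    unless table.key?( key )
      raise KeyError, "Failed lookup of synset '#{key}': No such synset"
    end

    # Limit the split to two parts so only the first "%" separates them
    offset, pos = key.split( /%/, 2 )
    { :offset => offset, :pos => pos, :data => table[ key ] }
  end
end

table = { "06420871%n" => "made-up synset data" }

record = fetch_records( table, "06420871%n" ).first
puts record[ :offset ]   # 06420871
puts record[ :pos ]      # n
```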
morph( word, part_of_speech )

Returns the form of word for the given part_of_speech as found in the WordNet morph files, or nil if there is no entry. The #lookup_synsets method performs morphological conversion automatically, so a call to #morph is not required before a lookup.

    # File lib/wordnet/lexicon.rb, line 245
    def morph( word, part_of_speech )
        return @morph_db[ self.make_word_key(word, part_of_speech) ]
    end
new_synset( word, part_of_speech )

Alias for #create_synset

readonly?()

Returns true if the lexicon was opened in read-only mode.

    # File lib/wordnet/lexicon.rb, line 143
    def readonly?
        # Read-only means the owner-write bit (0200) is not set
        ( @mode & 0200 ).zero?
    end
readwrite?()

Returns true if the lexicon was opened in read-write mode.

    # File lib/wordnet/lexicon.rb, line 149
    def readwrite?
        ! self.readonly?
    end
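
Both predicates hinge on a single permission bit. The short sketch below shows the same test applied to a few literal modes; the method name is hypothetical.

```ruby
# A mode counts as read-only when the owner-write bit (0200) is clear.
def readonly_mode?( mode )
  ( mode & 0200 ).zero?
end

printf( "%04o readonly? %s\n", 0444, readonly_mode?( 0444 ) )   # 0444 readonly? true
printf( "%04o readonly? %s\n", 0644, readonly_mode?( 0644 ) )   # 0644 readonly? false
printf( "%04o readonly? %s\n", 0464, readonly_mode?( 0464 ) )   # 0464 readonly? true
```

The last case shows that only the owner's write bit is consulted: a mode that grants write access to the group alone still counts as read-only.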
remove_synset( synset )

Remove the specified synset (a WordNet::Synset object) from the lexicon. Returns true once the synset has been removed, or nil if the synset was never stored in the database.

    # File lib/wordnet/lexicon.rb, line 327
    def remove_synset( synset )
        # If it's not in the database (ie., doesn't have a real offset),
        # just return.
        return nil if synset.offset == 1

        # Synset keys have the form "offset%pos", so take the part of
        # speech from the key
        pos = synset.key.split( /%/, 2 ).last

        # Start a transaction on the data table
        @env.begin( BDB::TXN_COMMIT, @data_db ) do |txn,datadb|

            # First remove the index entries for this synset by iterating
            # over each of its words
            txn.begin( BDB::TXN_COMMIT, @index_db ) do |txn,indexdb|
                synset.words.collect {|word| word + "%" + pos }.each {|word|

                    # If the index contains an entry for this word, either
                    # splice out the offset for the synset being deleted if
                    # there are more than one, or just delete the whole
                    # entry if it's the only one.
                    if indexdb.key?( word )
                        offsets = indexdb[ word ].
                            split( SUB_DELIM_RE ).
                            reject {|offset| offset == synset.offset}

                        if offsets.empty?
                            indexdb.delete( word )
                        else
                            indexdb[ word ] = offsets.join( SUB_DELIM )
                        end
                    end
                }
            end

            # :TODO: Delete synset from pointers of related synsets

            # Delete the synset from the main db, which is keyed by
            # synset key rather than by bare offset
            datadb.delete( synset.key )
        end

        return true
    end
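
The index update inside the transaction can be pictured with a plain Hash. This is only a sketch of the splice-or-delete rule: the method name, the "|" delimiter (an assumed stand-in for SUB_DELIM), and the sample offsets are hypothetical, and no transactions are involved.

```ruby
# Remove one offset from a word's index entry. If other offsets remain
# the entry is rewritten without it; if it was the only one, the whole
# entry is deleted.
def remove_offset( index, wordkey, offset, delim="|" )
  return unless index.key?( wordkey )
  remaining = index[ wordkey ].split( delim ).reject {|off| off == offset }
  if remaining.empty?
    index.delete( wordkey )
  else
    index[ wordkey ] = remaining.join( delim )
  end
end

# Made-up entries: "bank%n" has two senses, "shore%n" has one
index = { "bank%n" => "00000001|00000002", "shore%n" => "00000002" }

remove_offset( index, "bank%n", "00000002" )
remove_offset( index, "shore%n", "00000002" )

p index   # {"bank%n"=>"00000001"} (exact formatting depends on the Ruby version)
```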
reverse_morph( word )

Returns the result of looking up word in the inverse of the WordNet morph files. (This is undocumented in Lingua::Wordnet.)

    # File lib/wordnet/lexicon.rb, line 252
    def reverse_morph( word )
        @morph_db.invert[ word ]
    end
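
The effect of inverting the morph table can be seen with an ordinary Hash, whose #invert swaps keys and values. The entries below are made up, and the sketch only describes Hash behavior; it makes no claim about how the BerkeleyDB table handles duplicates.

```ruby
# Forward table: inflected "word%pos" key => base form (made-up entries)
morph = { "ran%v" => "run", "geese%n" => "goose" }

# Inverted table: base form => inflected "word%pos" key
inverse = morph.invert

p inverse[ "goose" ]   # "geese%n"
p inverse[ "run" ]     # "ran%v"
```

With a plain Hash, two inflected forms that share a base form collapse into a single inverted entry, so the inverse is only a complete mapping when the base forms are unique.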
store_synset( synset )

Store the specified synset (a WordNet::Synset object) in the lexicon. Returns the offset of the stored synset. A new synset is assigned the next offset before it is written.

    # File lib/wordnet/lexicon.rb, line 287
    def store_synset( synset )
        # Start a transaction
        @env.begin( BDB::TXN_COMMIT, @data_db ) do |txn,datadb|

            # If this is a new synset, generate an offset for it
            if synset.offset == 1
                synset.offset =
                    (datadb['offsetcount'] = datadb['offsetcount'].to_i + 1)
            end

            # Write the data entry
            datadb[ synset.key ] = synset.serialize

            # Synset keys have the form "offset%pos", so take the part of
            # speech from the key
            pos = synset.key.split( /%/, 2 ).last
            offset = synset.offset.to_s

            # Write the index entries
            txn.begin( BDB::TXN_COMMIT, @index_db ) do |txn,indexdb|

                # Make word/part-of-speech pairs from the words in the synset
                synset.words.collect {|word| word + "%" + pos }.each {|word|

                    # If the index already has this word, but not this
                    # synset, add it. The new value is assigned back so
                    # that the change is written to the table.
                    if indexdb.key?( word )
                        entry = indexdb[ word ]
                        indexdb[ word ] = entry + SUB_DELIM + offset unless
                            entry.include?( offset )
                    else
                        indexdb[ word ] = offset
                    end
                }
            end # transaction on @index_db
        end # transaction on @data_db

        return synset.offset
    end
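
The index half of the store can be sketched with a Hash as well. The method name, the "|" delimiter (an assumed stand-in for SUB_DELIM), and the offsets are hypothetical; the point is only the append-if-missing rule.

```ruby
# Record an offset under a word key: create the entry if the word is
# new, append the offset if the word exists without it, and leave the
# entry alone if the offset is already listed.
def add_offset( index, wordkey, offset, delim="|" )
  if index.key?( wordkey )
    offsets = index[ wordkey ].split( delim )
    index[ wordkey ] = ( offsets << offset ).join( delim ) unless
      offsets.include?( offset )
  else
    index[ wordkey ] = offset
  end
  index[ wordkey ]
end

index = {}
puts add_offset( index, "bank%n", "00000001" )   # 00000001
puts add_offset( index, "bank%n", "00000002" )   # 00000001|00000002
puts add_offset( index, "bank%n", "00000002" )   # 00000001|00000002
```

Comparing whole offsets after splitting avoids treating one offset as present merely because it appears as a substring of another.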

Protected Instance Methods

archlogs()

Return a list of archival logfiles that can be removed safely. (BerkeleyDB-specific).

    # File lib/wordnet/lexicon.rb, line 398
    def archlogs
        return @env.log_archive( BDB::ARCH_ABS )
    end
make_pos( original )

Normalize the various ways of specifying a part of speech into the WordNet part-of-speech indicator. The original may be the name (e.g., “noun”); nil, in which case the indicator for a noun is returned; or the indicator character itself, in which case it is returned unmodified. Any other value yields nil.

    # File lib/wordnet/lexicon.rb, line 377
    def make_pos( original )
        return WordNet::Noun if original.nil?
        osym = original.to_s.intern
        return WordNet::SYNTACTIC_CATEGORIES[ osym ] if
            WordNet::SYNTACTIC_CATEGORIES.key?( osym )
        return original if SYNTACTIC_SYMBOLS.key?( original )
        return nil
    end
make_word_key( word, pos )

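Make a lexicon key out of the given word and part of speech (pos). Runs of whitespace in the word are replaced with underscores, and the normalized part-of-speech indicator is appended after a ’%’ character.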

    # File lib/wordnet/lexicon.rb, line 389
    def make_word_key( word, pos )
        pos = self.make_pos( pos )
        word = word.gsub( /\s+/, '_' )
        return "#{word}%#{pos}"
    end
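
The two helpers above can be approximated outside the class as follows. The method names and the name-to-indicator table are assumptions for the sketch; the table stands in for WordNet::SYNTACTIC_CATEGORIES, whose actual contents are not shown in this page.

```ruby
# Normalize a part of speech given as a name, nil, or an indicator.
def pos_indicator( original, categories )
  return categories[ :noun ] if original.nil?
  sym = original.to_s.intern
  return categories[ sym ] if categories.key?( sym )
  return original if categories.value?( original )
  nil
end

# Join the word (whitespace turned into underscores) and the indicator.
def word_key( word, pos, categories )
  "#{word.gsub( /\s+/, '_' )}%#{pos_indicator( pos, categories )}"
end

# Assumed table of part-of-speech names and their indicator characters
categories = { :noun => "n", :verb => "v", :adjective => "a", :adverb => "r" }

puts word_key( "ice cream", :noun, categories )   # ice_cream%n
puts word_key( "run", "v", categories )           # run%v
puts word_key( "dog", nil, categories )           # dog%n
```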

Private Instance Methods

debug_msg( *msg )

Output the given msg to STDERR if $DEBUG is turned on.

    # File lib/wordnet/lexicon.rb, line 423
    def debug_msg( *msg )
        return unless $DEBUG
        # $stderr replaces the obsolete $deferr alias
        $stderr.puts msg
    end
normalize_mode( origmode )

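Turn the given origmode into an octal file mode such as that given to File.open. The symbols :readonly and :readwrite (or :writable) become 0444 and 0666 respectively, each masked by the current umask; an integer is returned unchanged; anything else raises an ArgumentError.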

    # File lib/wordnet/lexicon.rb, line 409
    def normalize_mode( origmode )
        case origmode
        when :readonly
            0444 & ~File.umask
        when :readwrite, :writable
            0666 & ~File.umask
        when Integer
            # Integer also covers Fixnum, which newer Rubies no longer define
            origmode
        else
            raise ArgumentError, "unrecognized mode %p" % [origmode]
        end
    end
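
The effect of the umask is easiest to see with a fixed value. The sketch below repeats the same case analysis with the umask passed in as a parameter; the method name and that extra parameter are hypothetical conveniences, not part of the class.

```ruby
# Translate a symbolic or numeric mode into a file mode, applying the
# given umask to the symbolic forms only.
def mode_for( origmode, umask=File.umask )
  case origmode
  when :readonly
    0444 & ~umask
  when :readwrite, :writable
    0666 & ~umask
  when Integer
    origmode
  else
    raise ArgumentError, "unrecognized mode %p" % [ origmode ]
  end
end

printf( "%04o\n", mode_for( :readonly, 0022 ) )    # 0444
printf( "%04o\n", mode_for( :readwrite, 0022 ) )   # 0644
printf( "%04o\n", mode_for( 0600 ) )               # 0600
```

A numeric mode bypasses the umask entirely, so callers who pass one get exactly the bits they asked for.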
