WordNet lexicon class - abstracts access to the WordNet lexical databases, and provides factory methods for looking up and creating new WordNet::Synset objects.
Subversion Id
Subversion revision
The path to the WordNet BerkeleyDB Env. It lives in the directory that this module is in.
Options for the creation of the Env object
Flags for the creation of the Env object (read-write and read-only)
Create a new WordNet::Lexicon object that will read its data from the given dbenv (a BerkeleyDB env directory). The database will be opened with the specified mode, which can either be a numeric octal mode (e.g., 0444) or one of (:readonly, :readwrite).
# File lib/wordnet/lexicon.rb, line 91
def initialize( dbenv=DEFAULT_DB_ENV, mode=:readonly )
  @mode = normalize_mode( mode )
  debug_msg "Mode is: %04o" % [ @mode ]

  envflags = 0
  dbflags = 0

  unless self.readonly?
    debug_msg "Using read/write flags"
    envflags = ENV_FLAGS_RW
    dbflags = BDB::CREATE
  else
    debug_msg "Using readonly flags"
    envflags = ENV_FLAGS_RO
    dbflags = 0
  end

  debug_msg "Env flags are: %0s, dbflags are %0s" %
    [ envflags.to_s(2), dbflags.to_s(2) ]

  begin
    @env = BDB::Env.new( dbenv, envflags, ENV_OPTIONS )
    @index_db = @env.open_db( BDB::BTREE, "index", nil, dbflags, @mode )
    @data_db = @env.open_db( BDB::BTREE, "data", nil, dbflags, @mode )
    @morph_db = @env.open_db( BDB::BTREE, "morph", nil, dbflags, @mode )
  rescue StandardError => err
    msg = "Error while opening Ruby-WordNet data files: #{dbenv}: %s" %
      [ err.message ]
    raise err, msg, err.backtrace
  end
end
Checkpoint the database. (BerkeleyDB-specific)
# File lib/wordnet/lexicon.rb, line 161
def checkpoint( bytes=0, minutes=0 )
  # NOTE: bytes and minutes are accepted but not passed on to the environment.
  @env.checkpoint
end
Remove any archival logfiles for the lexicon’s database environment. (BerkeleyDB-specific).
# File lib/wordnet/lexicon.rb, line 168
def clean_logs
  return unless self.readwrite?
  self.archlogs.each do |logfile|
    File::chmod( 0777, logfile )
    File::delete( logfile )
  end
end
Close the lexicon’s database environment
# File lib/wordnet/lexicon.rb, line 155
def close
  @env.close if @env
end
Factory method: Creates and returns a new WordNet::Synset object in this lexicon for the specified word and part_of_speech.
# File lib/wordnet/lexicon.rb, line 279
def create_synset( word, part_of_speech )
  return WordNet::Synset.new( self, '', part_of_speech, word )
end
Returns the familiarity/polysemy count for word as a part_of_speech as an integer, or nil if the index has no entry for the word. Polysemy can also be determined for a given word by counting the synsets returned by #lookup_synsets.
# File lib/wordnet/lexicon.rb, line 180
def familiarity( word, part_of_speech, polyCount=nil )
  wordkey = self.make_word_key( word, part_of_speech )
  return nil unless @index_db.key?( wordkey )
  @index_db[ wordkey ].split( WordNet::SUB_DELIM_RE ).length
end
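The count is simply the number of synset offsets packed into the word's index entry. A minimal sketch of that idea, using a plain Hash in place of the BerkeleyDB index table; the `|` delimiter, the `n`/`v` indicators, the sample offsets, and every `sketch_`/`SKETCH_` name are assumptions made for illustration, not values taken from the library.

```ruby
# Stand-in for WordNet::SUB_DELIM_RE; the actual delimiter is an assumption.
SKETCH_FAM_DELIM_RE = /\|/

# Hypothetical index table: "word%pos" => delimited synset offsets.
SKETCH_FAM_INDEX = {
  "bank%n" => "00000001|00000002|00000003",
  "bank%v" => "00000004",
}

# Count the offsets in a word's index entry, or return nil if it has none.
def sketch_familiarity( index, wordkey )
  return nil unless index.key?( wordkey )
  index[ wordkey ].split( SKETCH_FAM_DELIM_RE ).length
end

p sketch_familiarity( SKETCH_FAM_INDEX, "bank%n" )   # => 3
p sketch_familiarity( SKETCH_FAM_INDEX, "river%n" )  # => nil
```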
Returns an array of the index keys (words, including compound words) that begin with text. Returns an empty array if text is empty.
# File lib/wordnet/lexicon.rb, line 258
def grep( text )
  return [] if text.empty?

  words = []

  # Grab a cursor into the database and fetch while the key matches
  # the target text. The nil check stops the loop at the end of the index.
  cursor = @index_db.cursor
  rec = cursor.set_range( text )
  while rec && /^#{text}/ =~ rec[0]
    words.push rec[0]
    rec = cursor.next
  end
  cursor.close

  # Always return an array, as documented
  return words
end
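The cursor walk relies on B-tree keys being stored in sorted order: position at the first key not less than the prefix, then read forward until a key stops matching. A rough sketch of the same scan over a sorted Array; `sketch_prefix_scan` and the sample keys are hypothetical, and `bsearch_index` stands in for the cursor's `set_range`.

```ruby
# Return every key in sorted_keys that starts with prefix. The keys must be
# sorted, as they are in a B-tree, so the scan can stop at the first miss.
def sketch_prefix_scan( sorted_keys, prefix )
  return [] if prefix.empty?

  # Stand-in for cursor.set_range: find the first key >= prefix.
  start = sorted_keys.bsearch_index {|key| key >= prefix }
  return [] if start.nil?

  # Stand-in for the cursor.next loop: read forward while the prefix matches.
  sorted_keys[ start..-1 ].take_while {|key| key.start_with?( prefix ) }
end

sketch_keys = [ "bank%n", "bank%v", "bank_account%n", "banner%n", "bar%n" ].sort
p sketch_prefix_scan( sketch_keys, "bank" )
```

Unlike the listing, the sketch compares the prefix literally instead of interpolating it into a regular expression, so characters such as `.` are not treated as wildcards.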
Look up synsets (WordNet::Synset objects) matching text as a part_of_speech, where part_of_speech is one of WordNet::Noun, WordNet::Verb, WordNet::Adjective, or WordNet::Adverb. Without sense, #lookup_synsets will return all matches that are a part_of_speech. If sense is specified (counting from 1), only the synset object that matches that particular part_of_speech and sense is returned. Returns nil if the word is found neither as given nor after morphological conversion.
# File lib/wordnet/lexicon.rb, line 193
def lookup_synsets( word, part_of_speech, sense=nil )
  wordkey = self.make_word_key( word, part_of_speech )
  pos = self.make_pos( part_of_speech )
  synsets = []

  # Look up the index entry, trying first the word as given, and if
  # that fails, trying morphological conversion.
  entry = @index_db[ wordkey ]

  if entry.nil? && (word = self.morph( word, part_of_speech ))
    wordkey = self.make_word_key( word, part_of_speech )
    entry = @index_db[ wordkey ]
  end

  # If the lookup failed both ways, just abort
  return nil unless entry

  # Make synset keys from the entry, narrowing it to just the sense
  # requested if one was specified.
  synkeys = entry.split( SUB_DELIM_RE ).collect {|off| "#{off}%#{pos}" }
  if sense
    return lookup_synsets_by_key( synkeys[sense - 1] )
  else
    return [ lookup_synsets_by_key(*synkeys) ].flatten
  end
end
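The last step of the lookup turns one index entry into synset keys and, when a sense is given, keeps only that one. A small sketch of that step in isolation; the `|` delimiter, the `n` indicator, and the name `sketch_synset_keys` are assumptions for illustration.

```ruby
# Split an index entry into offsets and build "offset%pos" keys from them.
# A sense, if given, is 1-based and selects a single key.
def sketch_synset_keys( entry, pos, sense=nil )
  keys = entry.split( "|" ).collect {|off| "#{off}%#{pos}" }
  sense ? [ keys[sense - 1] ] : keys
end

p sketch_synset_keys( "00000001|00000002", "n" )
p sketch_synset_keys( "00000001|00000002", "n", 2 )
```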
Returns the WordNet::Synset objects corresponding to the keys specified. The keys are made up of the target synset’s “offset” and syntactic category concatenated together with a ’%’ character. Raises a WordNet::LookupError if any key is not in the data table.
# File lib/wordnet/lexicon.rb, line 224
def lookup_synsets_by_key( *keys )
  synsets = []

  keys.each {|key|
    raise WordNet::LookupError, "Failed lookup of synset '#{key}': "\
      "No such synset" unless @data_db.key?( key )

    data = @data_db[ key ]
    offset, part_of_speech = key.split( /%/, 2 )
    synsets << WordNet::Synset.new( self, offset, part_of_speech, nil, data )
  }

  return *synsets
end
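As an illustration of the key format, the sketch below splits `offset%pos` keys and fails loudly on an unknown key. It uses a Hash in place of the data table and plain Hashes in place of WordNet::Synset objects; `SketchLookupError`, `SKETCH_DATA`, and `sketch_lookup_by_key` are hypothetical names, and the record contents are invented.

```ruby
# Hypothetical error class standing in for WordNet::LookupError.
class SketchLookupError < StandardError; end

# Hypothetical data table: "offset%pos" => serialized synset data.
SKETCH_DATA = {
  "00000001%n" => "first record",
  "00000002%n" => "second record",
}

# Fetch one record per key, splitting each key into offset and category.
def sketch_lookup_by_key( data, *keys )
  keys.collect do |key|
    unless data.key?( key )
      raise SketchLookupError, "Failed lookup of synset '#{key}': No such synset"
    end
    offset, part_of_speech = key.split( /%/, 2 )
    { :offset => offset, :part_of_speech => part_of_speech, :data => data[ key ] }
  end
end

records = sketch_lookup_by_key( SKETCH_DATA, "00000001%n", "00000002%n" )
puts records.collect {|rec| rec[:offset] }.join( ", " )
```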
Returns a form of word as a part of speech part_of_speech, as found in the WordNet morph files. The #lookup_synsets method performs morphological conversion automatically, so a call to #morph is not required.
# File lib/wordnet/lexicon.rb, line 245
def morph( word, part_of_speech )
  return @morph_db[ self.make_word_key(word, part_of_speech) ]
end
Returns true if the lexicon was opened in read-only mode.
# File lib/wordnet/lexicon.rb, line 143
def readonly?
  ( @mode & 0200 ).nonzero? ? false : true
end
Returns true if the lexicon was opened in read-write mode.
# File lib/wordnet/lexicon.rb, line 149
def readwrite?
  ! self.readonly?
end
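Both predicates reduce to a test of the owner-write bit (0200) of the normalized file mode. A tiny sketch of that test on its own; `sketch_readonly?` is a hypothetical helper, not part of the class.

```ruby
# True when the owner-write bit (0200) is clear in the given file mode.
def sketch_readonly?( mode )
  ( mode & 0200 ).zero?
end

[ 0444, 0644, 0666 ].each do |mode|
  printf( "%04o readonly? %s\n", mode, sketch_readonly?( mode ) )
end
```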
Remove the specified synset (a WordNet::Synset object) from the lexicon. Returns true once the synset has been removed, or nil if the synset was never stored.
# File lib/wordnet/lexicon.rb, line 327
def remove_synset( synset )
  # If it's not in the database (i.e., doesn't have a real offset),
  # just return.
  return nil if synset.offset == 1

  # Start a transaction on the data table
  @env.begin( BDB::TXN_COMMIT, @data_db ) do |txn,datadb|

    # First remove the index entries for this synset by iterating
    # over each of its words
    txn.begin( BDB::TXN_COMMIT, @index_db ) do |txn,indexdb|
      # NOTE: pos is never assigned in this method as listed; it must hold
      # the synset's part-of-speech indicator for the keys to be valid.
      synset.words.collect {|word| word + "%" + pos }.each {|word|

        # If the index contains an entry for this word, either
        # splice out the offset for the synset being deleted if
        # there are more than one, or just delete the whole
        # entry if it's the only one.
        if indexdb.key?( word )
          offsets = indexdb[ word ].
            split( SUB_DELIM_RE ).
            reject {|offset| offset == synset.offset}

          unless offsets.empty?
            indexdb[ word ] = offsets.join( SUB_DELIM )
          else
            indexdb.delete( word )
          end
        end
      }
    end

    # :TODO: Delete synset from pointers of related synsets

    # Delete the synset from the main db
    datadb.delete( synset.offset )
  end

  return true
end
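The index update in the middle of the method is the part most easily gotten wrong, so here it is sketched in isolation against a plain Hash. The `|` delimiter, the sample keys and offsets, and the helper name `sketch_remove_offset` are assumptions for illustration; no transactions are involved.

```ruby
# Remove one offset from a word's index entry. The entry is rewritten if
# other offsets remain and deleted outright if that was the last one.
def sketch_remove_offset( index, wordkey, offset, delim="|" )
  return unless index.key?( wordkey )

  remaining = index[ wordkey ].split( delim ).reject {|off| off == offset }
  if remaining.empty?
    index.delete( wordkey )
  else
    index[ wordkey ] = remaining.join( delim )
  end
end

sketch_index = { "bank%n" => "00000001|00000002", "shore%n" => "00000002" }
sketch_remove_offset( sketch_index, "bank%n", "00000002" )
sketch_remove_offset( sketch_index, "shore%n", "00000002" )
p sketch_index
```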
Returns the result of looking up word in the inverse of the WordNet morph files. (This is undocumented in Lingua::Wordnet.)
# File lib/wordnet/lexicon.rb, line 252
def reverse_morph( word )
  @morph_db.invert[ word ]
end
Store the specified synset (a WordNet::Synset object) in the lexicon. Returns the offset of the stored synset.
# File lib/wordnet/lexicon.rb, line 287
def store_synset( synset )
  strippedOffset = nil
  # NOTE: pos is never reassigned below, so as listed the
  # word + "%" + pos concatenation raises a TypeError.
  pos = nil

  # Start a transaction
  @env.begin( BDB::TXN_COMMIT, @data_db ) do |txn,datadb|

    # If this is a new synset, generate an offset for it
    if synset.offset == 1
      synset.offset =
        (datadb['offsetcount'] = datadb['offsetcount'].to_i + 1)
    end

    # Write the data entry
    datadb[ synset.key ] = synset.serialize

    # Write the index entries
    txn.begin( BDB::TXN_COMMIT, @index_db ) do |txn,indexdb|

      # Make word/part-of-speech pairs from the words in the synset
      synset.words.collect {|word| word + "%" + pos }.each {|word|

        # If the index already has this word, but not this
        # synset, add it
        if indexdb.key?( word )
          indexdb[ word ] << SUB_DELIM << synset.offset unless
            indexdb[ word ].include?( synset.offset )
        else
          indexdb[ word ] = synset.offset
        end
      }
    end # transaction on @index_db
  end # transaction on @data_db

  return synset.offset
end
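The index-writing step can be sketched against a plain Hash as follows. This is not the listing's exact logic: the sketch compares whole offsets instead of using a substring test, converts the offset to a String, and assigns the rebuilt entry back to the table explicitly. The `|` delimiter and the name `sketch_add_offset` are assumptions for illustration.

```ruby
# Add an offset to a word's index entry unless it is already listed,
# creating the entry if the word is new. Returns the updated entry.
def sketch_add_offset( index, wordkey, offset, delim="|" )
  offset = offset.to_s
  offsets = index.key?( wordkey ) ? index[ wordkey ].split( delim ) : []
  offsets << offset unless offsets.include?( offset )
  index[ wordkey ] = offsets.join( delim )
end

sketch_store_index = { "bank%n" => "00000001" }
sketch_add_offset( sketch_store_index, "bank%n", "00000002" )
sketch_add_offset( sketch_store_index, "bank%n", "00000002" )  # already listed
sketch_add_offset( sketch_store_index, "shore%n", 3 )
p sketch_store_index
```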
Return a list of archival logfiles that can be removed safely. (BerkeleyDB-specific).
# File lib/wordnet/lexicon.rb, line 398
def archlogs
  return @env.log_archive( BDB::ARCH_ABS )
end
Normalize the various ways of specifying a part of speech into the WordNet part-of-speech indicator. The original representation may be the name (e.g., “noun”); nil, in which case it defaults to the indicator for a noun; or the indicator character itself, in which case it is returned unmodified. Returns nil for any other value.
# File lib/wordnet/lexicon.rb, line 377
def make_pos( original )
  return WordNet::Noun if original.nil?
  osym = original.to_s.intern
  return WordNet::SYNTACTIC_CATEGORIES[ osym ] if
    WordNet::SYNTACTIC_CATEGORIES.key?( osym )
  return original if SYNTACTIC_SYMBOLS.key?( original )
  return nil
end
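To make the three accepted input forms concrete, here is a sketch with small stand-in tables. The contents of `SKETCH_CATEGORIES` and `SKETCH_SYMBOLS` (including the single-letter indicators) are assumptions chosen for the example and may not match WordNet::SYNTACTIC_CATEGORIES and SYNTACTIC_SYMBOLS.

```ruby
# Hypothetical name => indicator table and its inverse.
SKETCH_CATEGORIES = { :noun => "n", :verb => "v", :adjective => "a", :adverb => "r" }
SKETCH_SYMBOLS    = SKETCH_CATEGORIES.invert

# Map nil, a category name, or an indicator character to an indicator.
def sketch_make_pos( original )
  return SKETCH_CATEGORIES[ :noun ] if original.nil?
  osym = original.to_s.to_sym
  return SKETCH_CATEGORIES[ osym ] if SKETCH_CATEGORIES.key?( osym )
  return original if SKETCH_SYMBOLS.key?( original )
  nil
end

p sketch_make_pos( nil )      # => "n"
p sketch_make_pos( "verb" )   # => "v"
p sketch_make_pos( "a" )      # => "a"
p sketch_make_pos( "bogus" )  # => nil
```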
Output the given msg to STDERR if $DEBUG is turned on.
# File lib/wordnet/lexicon.rb, line 423
def debug_msg( *msg )
  return unless $DEBUG
  # $stderr replaces $deferr, which only exists in Ruby 1.8
  $stderr.puts msg
end
Turn the given origmode into an octal file mode such as that given to File.open.
# File lib/wordnet/lexicon.rb, line 409
def normalize_mode( origmode )
  case origmode
  when :readonly
    0444 & ~File.umask
  when :readwrite, :writable
    0666 & ~File.umask
  when Integer  # Fixnum in the original; Integer also matches on current Rubies
    origmode
  else
    raise ArgumentError, "unrecognized mode %p" % [origmode]
  end
end
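The effect of the umask on the symbolic modes is easiest to see with concrete numbers. The sketch below mirrors the logic above but takes the umask as a second argument so the result does not depend on the process environment; that parameter and the name `sketch_normalize_mode` are additions for illustration only.

```ruby
# Turn a symbolic or numeric mode into a numeric file mode. The umask
# argument is a hypothetical addition that makes the example repeatable.
def sketch_normalize_mode( origmode, umask=File.umask )
  case origmode
  when :readonly
    0444 & ~umask
  when :readwrite, :writable
    0666 & ~umask
  when Integer
    origmode
  else
    raise ArgumentError, "unrecognized mode %p" % [ origmode ]
  end
end

printf( "%04o\n", sketch_normalize_mode( :readonly, 0022 ) )   # prints 0444
printf( "%04o\n", sketch_normalize_mode( :readwrite, 0022 ) )  # prints 0644
printf( "%04o\n", sketch_normalize_mode( 0600 ) )              # prints 0600
```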