WordNet::Lexicon class
Superclass: Object
The WordNet lexicon class. It provides access to the WordNet lexical database, and provides factory methods for looking up words and synsets.
Creating a Lexicon
To create a Lexicon, point it at a database using Sequel database connection criteria:

    lex = WordNet::Lexicon.new( 'postgres://localhost/wordnet31' )
    # => #<WordNet::Lexicon:0x7fd192a76668 postgres://localhost/wordnet31>

    # Another way of doing the same thing:
    lex = WordNet::Lexicon.new( adapter: 'postgres', database: 'wordnet31', host: 'localhost' )
    # => #<WordNet::Lexicon:0x7fd192d374b0 postgres>
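To make the equivalence between the two forms concrete, here is a minimal sketch that uses only Ruby's standard URI library. The uri_to_options helper is hypothetical: it is not part of the wordnet gem or of Sequel, which does its own parsing. It only illustrates how the parts of the URI line up with the keyword options.

```ruby
require 'uri'

# Hypothetical helper (not part of the wordnet gem or Sequel): split a
# connection URI into the equivalent keyword-style options.
def uri_to_options( uri_string )
  uri = URI.parse( uri_string )
  {
    adapter:  uri.scheme,                     # 'postgres'
    host:     uri.host,                       # 'localhost'
    database: uri.path.sub( %r{\A/}, '' )     # path without the leading slash
  }
end

options = uri_to_options( 'postgres://localhost/wordnet31' )
```

Under these assumptions, options holds the same adapter, host, and database values as the hash passed to ::new in the second form.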
Alternatively, if you have the ‘wordnet-defaultdb’ gem (which includes an embedded copy of the SQLite WordNet-SQL database) installed, just call ::new without any arguments:

    lex = WordNet::Lexicon.new
    # => #<WordNet::Lexicon:0x7fdbfac1a358 sqlite:[...]/gems/wordnet-defaultdb-1.0.1
    #      /data/wordnet-defaultdb/wordnet31.sqlite>
Looking Up Synsets
Once you have a Lexicon created, the main lookup method for Synsets is [], which will return the first of any Synsets that are found:

    synset = lex[ :language ]
    # => #<WordNet::Synset:0x7fdbfaa987a0 {105650820} 'language, speech' (noun):
    #      [noun.cognition] the mental faculty or power of vocal communication>
If you want to look up all matching Synsets, use the lookup_synsets method:

    synsets = lex.lookup_synsets( :language )
    # => [#<WordNet::Synset:0x7fdbfaac46c0 {105650820} 'language, speech' (noun):
    #      [noun.cognition] the mental faculty or power of vocal
    #      communication>,
    #     #<WordNet::Synset:0x7fdbfaac45a8 {105808557} 'language, linguistic process'
    #      (noun): [noun.cognition] the cognitive processes involved
    #      in producing and understanding linguistic communication>,
    #     #<WordNet::Synset:0x7fdbfaac4490 {106282651} 'language, linguistic
    #      communication' (noun): [noun.communication] a systematic means of
    #      communicating by the use of sounds or conventional symbols>,
    #     #<WordNet::Synset:0x7fdbfaac4378 {106304059} 'language, nomenclature,
    #      terminology' (noun): [noun.communication] a system of words used to
    #      name things in a particular discipline>,
    #     #<WordNet::Synset:0x7fdbfaac4260 {107051975} 'language, lyric, words'
    #      (noun): [noun.communication] the text of a popular song or musical-comedy
    #      number>,
    #     #<WordNet::Synset:0x7fdbfaac4120 {107109196} 'language, oral communication,
    #      speech, speech communication, spoken communication, spoken language,
    #      voice communication' (noun): [noun.communication] (language)
    #      communication by word of mouth>]
Sometimes the first Synset isn’t the one you want, and you need to look up a particular one. Both [] and lookup_synsets provide several ways of filtering or selecting synsets.
The first is the ability to select one based on its offset, that is, its 1-indexed position among the word's senses:

    lex[ :language, 2 ]
    # => #<WordNet::Synset:0x7ffa78e74d78 {105808557} 'language, linguistic
    #      process' (noun): [noun.cognition] the cognitive processes involved in
    #      producing and understanding linguistic communication>
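Because sense numbers are 1-indexed while Ruby Arrays are 0-indexed, the mapping can be sketched as follows. This is an illustration only, not the library's implementation: select_senses is a hypothetical helper, and a plain Array of the synset IDs from the lookup_synsets( :language ) output above stands in for real Synset objects.

```ruby
# Synset IDs in the order they appear in the lookup_synsets( :language )
# output shown earlier.
LANGUAGE_SYNSET_IDS = [
  105650820, 105808557, 106282651, 106304059, 107051975, 107109196
].freeze

# Hypothetical helper: map a 1-indexed sense number (or a Range of them)
# onto a 0-indexed Array. Sense numbers below 1 or past the end are ignored.
def select_senses( synsets, sense )
  Array( sense ).select { |n| n >= 1 }.map { |n| synsets[n - 1] }.compact
end

second    = select_senses( LANGUAGE_SYNSET_IDS, 2 )
first_two = select_senses( LANGUAGE_SYNSET_IDS, 1..2 )
missing   = select_senses( LANGUAGE_SYNSET_IDS, 7 )
```

Here second contains only 105808557, the same synset that lex[ :language, 2 ] returns in the example above.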
You can also select one with a particular word in its definition:

    lex[ :language, 'sounds' ]
    # => #<WordNet::Synset:0x7ffa78ee01b8 {106282651} 'linguistic communication,
    #      language' (noun): [noun.communication] a systematic means of
    #      communicating by the use of sounds or conventional symbols>
If you’re using a database that supports using regular expressions (e.g., PostgreSQL), you can use that to select one with a matching definition:

    lex[ :language, /name.*discipline/ ]
    # => #<WordNet::Synset:0x7ffa78f235a8 {106304059} 'language, nomenclature,
    #      terminology' (noun): [noun.communication] a system of words used
    #      to name things in a particular discipline>
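Since regular-expression filtering depends on the database, one possible fallback is to filter definitions on the Ruby side. The sketch below is an assumption about how that could be done, not something the library provides: DEFINITIONS and matching_ids are hypothetical names, and a plain Hash of the definitions from the lookup_synsets( :language ) output above stands in for real Synset objects.

```ruby
# Definitions copied from the lookup_synsets( :language ) output shown
# earlier, keyed by synset ID.
DEFINITIONS = {
  105650820 => 'the mental faculty or power of vocal communication',
  105808557 => 'the cognitive processes involved in producing and ' \
               'understanding linguistic communication',
  106282651 => 'a systematic means of communicating by the use of ' \
               'sounds or conventional symbols',
  106304059 => 'a system of words used to name things in a particular discipline',
  107051975 => 'the text of a popular song or musical-comedy number',
  107109196 => '(language) communication by word of mouth'
}.freeze

# Hypothetical helper: return the IDs whose definition contains a String
# or matches a Regexp.
def matching_ids( definitions, filter )
  matches = definitions.select do |_id, definition|
    filter.is_a?( Regexp ) ? definition.match?( filter ) : definition.include?( filter )
  end
  matches.keys
end

by_word    = matching_ids( DEFINITIONS, 'sounds' )
by_pattern = matching_ids( DEFINITIONS, /name.*discipline/ )
```

Note that String#include? is case-sensitive, whereas the case sensitivity of a SQL LIKE comparison varies by database, so results from this sketch may differ from a database-side filter.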
You can also select certain parts of speech:

    lex[ :right, :noun ]
    # => #<WordNet::Synset:0x7ffa78f30b68 {100351000} 'right' (noun):
    #      [noun.act] a turn toward the side of the body that is on the south
    #      when the person is facing east>
    lex[ :right, :verb ]
    # => #<WordNet::Synset:0x7ffa78f09590 {200199659} 'correct, right, rectify'
    #      (verb): [verb.change] make right or correct>
    lex[ :right, :adjective ]
    # => #<WordNet::Synset:0x7ffa78ea8060 {300631391} 'correct, right'
    #      (adjective): [adj.all] free from error; especially conforming to
    #      fact or truth>
    lex[ :right, :adverb ]
    # => #<WordNet::Synset:0x7ffa78e5b2d8 {400032299} 'powerful, mightily,
    #      mighty, right' (adverb): [adv.all] (Southern regional intensive)
    #      very; to a great degree>
or by lexical domain, which is a more-specific part of speech (see WordNet::Synset.lexdomains.keys for the list of valid ones):

    lex.lookup_synsets( :right, 'verb.social' )
    # => [#<WordNet::Synset:0x7ffa78d817e0 {202519991} 'redress, compensate,
    #      correct, right' (verb): [verb.social] make reparations or amends
    #      for>]
Attributes
- db (read-only): The Sequel::Database object that model tables read from.
- uri (read-only): The database URI the lexicon will use to look up WordNet data.
Public Class Methods
Get the Sequel URI of the default database, if it’s installed.
Create a new WordNet::Lexicon object that will use the database connection specified by the given dbconfig.
Public Instance Methods
Find a Word or Synset in the WordNet database and return it. In the case of multiple matching Synsets, only the first will be returned. If you want them all, you can use lookup_synsets instead.
The word can be one of:
Integer
    Looks up the corresponding Word or Synset by ID. This assumes that all Synset IDs are 9 digits or longer, which is true as of WordNet 3.1. Any additional args are ignored.
Symbol, String
    Looks up the word using lookup_synsets, passing along any additional args, and returns the first Synset that is found.
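The Integer case relies on the length of the ID to tell Synsets from Words. A minimal sketch of such a check follows. It is a guess at the shape of the logic based on the description above, not the library's code, and id_kind is a hypothetical name.

```ruby
# Hypothetical helper: classify an Integer ID by its number of digits,
# assuming (as stated above) that Synset IDs have 9 or more digits.
def id_kind( id )
  id.to_s.length >= 9 ? :synset : :word
end

synset_kind = id_kind( 105650820 )  # a Synset ID from the examples above
word_kind   = id_kind( 12345 )      # an arbitrary short ID, for illustration only
```

The consequence of this design is that the heuristic would misclassify IDs if a future database used shorter Synset IDs or longer Word IDs.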
Connect to the WordNet DB and return a Sequel::Database object.
Connect to the WordNet DB using an optional options hash.
Connect to the WordNet DB using a connection options hash.
Connect to the WordNet DB using a URI and an optional options hash.
Return a human-readable string representation of the Lexicon, suitable for debugging.
Look up synsets (WordNet::Synset objects) associated with word, optionally filtered by additional args.
The args can contain:
Integer, Range
    The sense or senses of the Word (1-indexed) to use when searching for Synsets. If not specified, all senses of the word are used.
Regexp
    The Word’s Synsets are filtered by definition using an RLIKE filter. Note that some databases, including the default one (sqlite3), do not support RLIKE.
Symbol, String
    If it matches either a lexical domain (e.g., “verb.motion”) or a part of speech (e.g., “adjective”, :noun, :v), the resulting Synsets are filtered by that criterion. If it doesn’t match a lexical domain or part of speech, it’s used to filter by definition using a LIKE query.
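The rules above amount to a dispatch on the type and value of each argument, which can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: classify_filter and both constants are hypothetical, and the two lists are small subsets taken from the examples in this document (the real lists come from the database, see WordNet::Synset.lexdomains).

```ruby
# Illustrative subsets drawn from the examples in this document only.
EXAMPLE_LEXDOMAINS = %w[
  noun.act noun.cognition noun.communication
  verb.change verb.motion verb.social adj.all adv.all
].freeze
EXAMPLE_PARTS_OF_SPEECH = %w[noun verb adjective adverb v].freeze

# Hypothetical helper: report which kind of filter an argument would select.
def classify_filter( arg )
  case arg
  when Integer, Range then :sense
  when Regexp         then :definition_regexp
  when Symbol, String
    name = arg.to_s
    if EXAMPLE_LEXDOMAINS.include?( name )
      :lexical_domain
    elsif EXAMPLE_PARTS_OF_SPEECH.include?( name )
      :part_of_speech
    else
      :definition_like   # anything else is treated as definition text
    end
  else
    raise ArgumentError, "unsupported filter: #{arg.inspect}"
  end
end

kinds = [ 2, 1..3, /name.*discipline/, 'verb.social', :noun, 'sounds' ].map do |arg|
  classify_filter( arg )
end
```

One consequence of falling back to a definition search is that a misspelled part of speech or lexical domain would not raise an error; it would be treated as text to search for in definitions.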