treequel

Searching With Branchsets

If you know exactly which entries you need, it’s pretty easy to fetch the corresponding Branch objects, but what if you need to search for entries matching one or more criteria?

Searching is implemented in Treequel via Treequel::Branchsets. Much like Datasets from the Sequel library which inspired Treequel, a Branchset is an object which represents an abstract set of records returned by a search. The results of the search are returned on demand, so a Branchset can be kept around and reused indefinitely.

You can construct a new Branchset via the usual constructor; it takes the Branch for the base DN of the search:

irb> Treequel::Branchset.new( dir.ou(:people) )
# => #<Treequel::Branchset:0x1a418ec base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=0, timeout=0.000>
Creating a new Branchset.

There are also several convenience methods on Branch and Directory that can create a new Branchset relative to themselves, as well:

irb> dir.branchset
# => #<Treequel::Branchset:0x1a3fc54 base_dn='dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=0, timeout=0.000>
irb> dir.ou(:people).branchset
# => #<Treequel::Branchset:0x1998314 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=0, timeout=0.000>
Creating new Branchsets relative to the base DN and ou=People.

Like Sequel Datasets, Branchsets are meant to be chainable, so you can refine which entries a Branchset will find by calling one of its mutators. Each mutator method returns a new Branchset with the new criteria set. This allows you to build up a query gradually, in a concise and flexible manner.
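
The chaining pattern can be illustrated without a directory server. The following is a minimal sketch in plain Ruby, not Treequel's implementation: TinyQuery and its options Hash are hypothetical names, used only to show how a mutator can return a modified copy and leave the receiver untouched.

```ruby
# A hypothetical, stripped-down query object (not part of Treequel).
class TinyQuery
  attr_reader :options

  def initialize(options = {})
    # Freeze the options so a query can never be changed in place.
    @options = { scope: :subtree, limit: 0 }.merge(options).freeze
  end

  # Each "mutator" builds a new object instead of modifying the receiver.
  def scope(new_scope)
    self.class.new(options.merge(scope: new_scope))
  end

  def limit(count)
    self.class.new(options.merge(limit: count))
  end
end

everything = TinyQuery.new
top_five   = everything.scope(:onelevel).limit(5)

p everything.options   # still the defaults
p top_five.options     # carries both refinements
```

Because the original object is never modified, a base query can be kept around and refined in several different directions.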

Filter

The first of these mutators is #filter.

With no modifications, a Branchset will find every entry below its base using a filter of (objectClass=*), which matches every entry.

You can narrow the results of the search by adding one or more filter statements. Each call to #filter adds a clause to the LDAP filter string that is eventually sent to the server.

The #filter method expects one or more expressions, which are transformed into an LDAP filter. Each expression can be a literal filter String, a Hash or an Array of criteria, or a Ruby expression.

The simplest of these, of course, is a literal LDAP filter in a String:

irb> dir.ou( :people ).filter( '(objectClass=room)' )
# => #<Treequel::Branchset:0x12b7c48 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=room), scope=subtree, select=*, limit=0, timeout=0.000>
Literal string filter expression

You can see what the equivalent filter of a Branchset is at any time using its #filter_string method:

irb> dir.ou( :people ).filter( '(objectClass=room)' ).filter_string
# => "(objectClass=room)"
Getting the filter string of a Branchset

You can also use a Hash to do simple attribute=value matching:

irb> dir.ou( :people ).filter( :givenName => 'Michael' ).filter_string
# => "(givenName=Michael)"
Hash filter expression

Multiple criteria in a Hash will be ANDed together:

irb> dir.ou( :people ).filter( :givenName => 'Michael', :sn => 'Granger' ).filter_string
# => "(&(givenName=Michael)(sn=Granger))"
Multi-value Hash filter expression

You can include an OR in a filter by passing :or as the first element:

irb> dir.ou( :people ).filter( :or, [:sn, 'Granger'], [:sn, 'Smith'] ).filter_string
# => "(|(sn=Granger)(sn=Smith))"
An ORed filter

or by specifying more than one value for a single attribute:

irb> dir.ou( :people ).filter( :uid => [:mahlon, :mgranger, :jtran] ).filter_string
# => "(|(uid=mahlon)(uid=mgranger)(uid=jtran))"
ORing with a Hash

You can do the same with :and and :not, and combine them, too:

irb> dir.ou( :people ).filter( :and, [:sn, 'Granger'], [:sn, 'Smith'] ).filter_string
# => "(&(sn=Granger)(sn=Smith))"
irb> dir.ou( :people ).filter( :not, [:and, [:sn, 'Granger'], [:sn, 'Smith']] ).filter_string
# => "(!(&(sn=Granger)(sn=Smith)))"
Negation (NOT) of an explicit AND
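
To make the mapping from Ruby structures to filter strings concrete, here is a small sketch of a converter in plain Ruby. It is not Treequel's code: to_filter is a hypothetical helper that handles only equality matches and the :and, :or, and :not prefixes, and it does no escaping of special characters in values.

```ruby
# Hypothetical helper (not Treequel's code): turn a nested expression into an
# LDAP filter string. Values are interpolated as-is, with no escaping.
def to_filter(expr)
  case expr
  when String
    expr
  when Hash
    # Each key/value pair becomes a clause; several pairs are ANDed.
    clauses = expr.map { |attr, value| to_filter([attr, value]) }
    clauses.length == 1 ? clauses.first : "(&#{clauses.join})"
  when Array
    head, *rest = expr
    case head
    when :and then "(&#{rest.map { |e| to_filter(e) }.join})"
    when :or  then "(|#{rest.map { |e| to_filter(e) }.join})"
    when :not then "(!#{to_filter(rest.first)})"
    else
      # An [attribute, value] pair; several values for one attribute are ORed.
      clauses = Array(rest.first).map { |v| "(#{head}=#{v})" }
      clauses.length == 1 ? clauses.first : "(|#{clauses.join})"
    end
  else
    raise ArgumentError, "can't convert #{expr.inspect} to a filter"
  end
end

puts to_filter({ givenName: 'Michael', sn: 'Granger' })
# prints (&(givenName=Michael)(sn=Granger))
puts to_filter([:or, [:sn, 'Granger'], [:sn, 'Smith']])
# prints (|(sn=Granger)(sn=Smith))
puts to_filter([:not, [:and, [:sn, 'Granger'], [:sn, 'Smith']]])
# prints (!(&(sn=Granger)(sn=Smith)))
puts to_filter({ uid: [:mahlon, :mgranger] })
# prints (|(uid=mahlon)(uid=mgranger))
```

The recursion mirrors the prefix notation of LDAP filters: the operator comes first and applies to every clause that follows it.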

Because #filter returns a new Branchset, you can also chain calls together instead of using an explicit :and:

irb> dir.ou( :people ).filter( :objectClass => 'inetOrgPerson' ).filter( :sn => 'Smith' ).filter_string
# => "(&(objectClass=inetOrgPerson)(sn=Smith))"
Chaining filter expressions ANDs them, too.
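
One way to picture why chained calls end up ANDed is a chain that simply accumulates clauses. The sketch below is plain Ruby under that assumption; ClauseChain is a hypothetical class, it accepts only literal filter strings, and it is not how Treequel is implemented.

```ruby
# Hypothetical class (not part of Treequel): collects literal filter clauses.
class ClauseChain
  def initialize(clauses = [])
    @clauses = clauses.freeze
  end

  # Return a new chain with one more clause; the receiver is left as it was.
  def filter(clause)
    self.class.new(@clauses + [clause])
  end

  # No clauses matches everything, one clause stands alone, several are ANDed.
  def filter_string
    case @clauses.length
    when 0 then '(objectClass=*)'
    when 1 then @clauses.first
    else "(&#{@clauses.join})"
    end
  end
end

people = ClauseChain.new
smiths = people.filter('(objectClass=inetOrgPerson)').filter('(sn=Smith)')

puts people.filter_string   # prints (objectClass=*)
puts smiths.filter_string   # prints (&(objectClass=inetOrgPerson)(sn=Smith))
```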

We’re experimenting with support for Sequel expressions for more-complex filter expressions, too:

# Negative 
irb> dir.ou( :people ).filter( ~:photo ).filter_string
# => "(!(photo=*))"
irb> dir.ou( :people ).filter( :employeeNumber <= 1000 ).filter_string
# => "(employeeNumber<=1000)"
irb> dir.ou( :people ).filter( :sn.like('smith') ).filter_string
# => "(sn~=smith)"
irb> dir.ou( :people ).filter( :sn.like('sm*') ).filter_string
# => "(sn=sm*)"
irb> dir.ou( :people ).filter( :sn => ['smith', 'tran'] ).filter_string
# => "(|(sn=smith)(sn=tran))"
Advanced expressions

Scope

You can also create a Branchset that searches with a different scope by passing :onelevel, :base, or :subtree (the default) to the #scope method of the original Branchset.

Setting the scope to :onelevel (as you might expect) means that it will only descend one level when searching:

irb> dir.filter( :objectClass => :organizationalUnit ).scope( :one ).collect {|branch| branch[:ou].first }
# => ["Hosts", "Groups", "Lists", "Resources", "People", "Departments", "Netgroups"]
Find all the top-level OUs

Setting it to :subtree (which is the default) means that the search will descend through the entire tree below the base. Setting it to :base means that it will only consider the base entry, either returning it if it matches, or returning nil if it does not.
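
The three scopes can be summarized as a depth check relative to the base DN. The following sketch shows that idea in plain Ruby; in_scope? is a hypothetical helper, not part of Treequel, and it splits DNs naively on commas, so escaped commas inside a value are not handled.

```ruby
# Hypothetical helper (not part of Treequel). DNs are split naively on commas,
# so escaped commas inside a value are not handled.
def in_scope?(dn, base, scope)
  dn_parts   = dn.downcase.split(',').map(&:strip)
  base_parts = base.downcase.split(',').map(&:strip)

  # The entry must be the base itself or sit somewhere below it.
  return false unless dn_parts.last(base_parts.length) == base_parts

  depth = dn_parts.length - base_parts.length
  case scope
  when :base     then depth.zero?   # only the base entry
  when :onelevel then depth == 1    # only immediate children
  when :subtree  then true          # the base and everything below it
  else raise ArgumentError, "unknown scope: #{scope.inspect}"
  end
end

base = 'dc=acme,dc=com'
dns  = [
  'dc=acme,dc=com',
  'ou=People,dc=acme,dc=com',
  'uid=jtran,ou=People,dc=acme,dc=com'
]

[:base, :onelevel, :subtree].each do |scope|
  matches = dns.select { |dn| in_scope?(dn, base, scope) }
  puts "#{scope}: #{matches.length} of #{dns.length} entries"
end
```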

Limit

Setting a Branchset’s #limit will limit the number of results the search will return.

irb> dir.ou( :groups ).limit( 5 ).collect {|b| b.dn }
# => ["ou=Groups,dc=acme,dc=com", "cn=anim,ou=Groups,dc=acme,dc=com", "cn=acct,ou=Groups,dc=acme,dc=com", "cn=mailuser,ou=Groups,dc=acme,dc=com", "cn=producer,ou=Groups,dc=acme,dc=com"]
Return the first 5 groups in the directory

Note that the results will be returned in directory order (at least in OpenLDAP). Until Treequel supports server-side ordering, this means that #limit is of limited usefulness; to do real paged results you need both server-side ordering and the paged results control.

We’re planning on adding a convenient way to use controls in a future release.
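
The effect of truncating before ordering can be seen with plain arrays. This is only an illustrative sketch with made-up sample data, not Treequel code: directory_order is a hypothetical list standing in for results in the order a server might return them.

```ruby
# Made-up sample DNs standing in for results in server ("directory") order.
directory_order = [
  'cn=producer,ou=Groups,dc=acme,dc=com',
  'cn=anim,ou=Groups,dc=acme,dc=com',
  'cn=acct,ou=Groups,dc=acme,dc=com',
  'cn=mailuser,ou=Groups,dc=acme,dc=com'
]

# Limiting first keeps whichever entries happen to come back first ...
limited_then_sorted = directory_order.first(2).sort

# ... while a true "first two by name" needs every result before truncating.
sorted_then_limited = directory_order.sort.first(2)

puts 'limit, then sort:', limited_then_sorted
puts 'sort, then limit:', sorted_then_limited
```

The two lists differ, which is why a limit on its own cannot provide stable pages of results.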

If you already have a Branchset with a limit, and want a new one that won’t have any limits imposed on it, you can get one via the #without_limit method.

irb> fivegroups = dir.ou( :groups ).limit( 5 )
# => #<Treequel::Branchset:0x1264908 base_dn='ou=groups,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=5, timeout=0.000>
irb> fivegroups.all.length
# => 5
irb> fivegroups.without_limit.all.length
# => 99
Making a Branchset without the limits of the original

Select

If you want to limit the attributes that are returned in the entries fetched by the query, you can do so by specifying which ones should be returned with the #select method:

irb> dir.ou( :people ).select( :sn, :givenName ).limit( 5 ).collect {|b| b.entry }
# => [{"dn"=>["ou=People,dc=acme,dc=com"]}, {"givenName"=>["Reed"], "sn"=>["Slimlocke"], "dn"=>["uid=rslim,ou=People,dc=acme,dc=com"]}, {"givenName"=>["Jim"], "sn"=>["Tran"], "dn"=>["uid=jtran,ou=People,dc=acme,dc=com"]}, {"givenName"=>["Michael"], "sn"=>["Granger"], "dn"=>["uid=mgranger,ou=People,dc=acme,dc=com"]}, {"givenName"=>["Harken"], "sn"=>["Farkselstein"], "dn"=>["uid=hfarkselstein,ou=People,dc=acme,dc=com"]}]
Fetch only employee first and last names

You can get a copy of a Branchset with additional attributes by passing the additional attributes to #select_more:

irb> people_uids = dir.ou( :people ).select( :uid )
# => #<Treequel::Branchset:0x1181644 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=uid, limit=0, timeout=0.000>
irb> people_uids_and_names = people_uids.select_more( :gecos )
# => #<Treequel::Branchset:0x1178b20 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=uid,gecos, limit=0, timeout=0.000>
irb> people_uids_names_and_addresses = people_uids.select_more( :gecos, :homePostalAddress )
# => #<Treequel::Branchset:0x10dcb08 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=uid,gecos,homePostalAddress, limit=0, timeout=0.000>
Selecting additional attributes

You can also get a copy with the select-list removed:

irb> people_uids.select_all
# => #<Treequel::Branchset:0x10da308 base_dn='ou=people,dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=0, timeout=0.000>
Removing the selection from a branchset
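
Conceptually, a select-list is a projection over each entry's attributes. The sketch below shows that idea on a plain Hash; project is a hypothetical helper, not Treequel's code, and it assumes (as in the #select example output earlier in this section) that the dn is always kept.

```ruby
# Hypothetical helper (not Treequel's code): keep only the requested
# attributes of an entry Hash. The "dn" key is always kept.
def project(entry, *attributes)
  return entry.dup if attributes.empty?   # an empty list means "everything"

  wanted = attributes.map(&:to_s) | ['dn']
  entry.select { |name, _values| wanted.include?(name) }
end

entry = {
  'givenName' => ['Jim'],
  'sn'        => ['Tran'],
  'uid'       => ['jtran'],
  'dn'        => ['uid=jtran,ou=People,dc=acme,dc=com']
}

p project(entry, :sn, :givenName).keys   # "uid" is dropped, "dn" is kept
p project(entry).keys                    # nothing is dropped
```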

Timeout

To avoid unintentional resource consumption on the server, you can specify an explicit timeout for queries. This is useful when searching with user-submitted input or input from other untrusted sources. Note that this can only be reliably used to decrease the timeout, as the server might have a maximum timeout configured that can’t be exceeded.

irb> dir.filter('objectClass=*').timeout( 1 ).all
LDAP::ResultError: Timed out
	from ./treequel/directory.rb:328:in `search_ext2'
	from ./treequel/directory.rb:328:in `search'
	from ./treequel/branchset.rb:195:in `each'
	from (irb):8:in `all'
	from (irb):8
	from :0
A long running query

If you have a canned query that includes a timeout, you can get a copy without the restriction via the #without_timeout method.

irb> slow_query = dir.filter('objectClass=*').timeout( 1 )
# => #<Treequel::Branchset:0x1d5c554 base_dn='dc=acme,dc=com', filter=(objectClass=*), scope=subtree, select=*, limit=0, timeout=1.000>
irb> slow_query.all
LDAP::ResultError: Timed out
	from ./treequel/directory.rb:328:in `search_ext2'
	from ./treequel/directory.rb:328:in `search'
	from ./treequel/branchset.rb:195:in `each'
	from (irb):13:in `all'
	from (irb):13
	from :0
irb> slow_query.without_timeout.all.length
# => 4982
irb> slow_query.without_timeout.all.first
# => #<Treequel::Branch:0x1d4f2f0 dc=acme,dc=com @ localhost:389 (dc=acme,dc=com, tls, anonymous) entry={"o"=>["ACME"], "description"=>["http://www.example.com/"], "objectClass"=>["dcObject", "organization"], "dc"=>["acme"], "dn"=>["dc=acme,dc=com"]}>
A long running query with timeout disabled

Branchset Enumeration

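Branchsets are also Enumerable, so you can slice and dice results with that interface. Note that a Branch itself is not Enumerable: calling an Enumerable method directly on one raises a NoMethodError, so create a Branchset first (with #filter, for example):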

irb> people = dir.ou( :people )
# => #<Treequel::Branch:0x11857d0 ou=people,dc=acme,dc=com @ localhost:389 (dc=acme,dc=com, tls, anonymous) entry=nil>
irb> people.all? {|person| File.directory?(person[:homeDirectory]) }
NoMethodError: undefined method `all?' for #<Treequel::Branch:0x11857d0>
	from /Users/mgranger/source/ruby/Treequel/lib/treequel/branch.rb:538:in `method_missing'
	from (irb):3
irb> people.filter( :homeDirectory ).all? {|person| File.directory?(person[:homeDirectory]) }
# => false
irb> people.filter( :homeDirectory ).find_all {|person| File.exist?(person[:homeDirectory]) && File.stat(person[:homeDirectory]).uid != person[:uidNumber] }
# => [#<Treequel::Branch:0x18287b8 uid=wwwspider,ou=People,dc=acme,dc=com @ localhost:389 (dc=acme,dc=com, tls, anonymous) entry={"cn"=>["Auth account for web spider"], "gidNumber"=>["200"], "givenName"=>["WebSpider"], "gecos"=>["WebSpider Account"], "homeDirectory"=>["/dev/null"], "sn"=>["WebSpider Account"], "uid"=>["wwwspider"], "uidNumber"=>["1500"], "objectClass"=>["top", "person", "inetOrgPerson", "posixAccount", "shadowAccount"], "dn"=>["uid=wwwspider,ou=People,dc=acme,dc=com"]}>]
Enumerating resulting Branches

For convenience, the Branchset#map method is overridden to facilitate fetching single attributes from the resulting branches:

irb> dir.ou( :hosts ).filter( :ipHostNumber ).map( :ipHostNumber ).flatten
# => ["192.168.1.253", "192.168.1.14", "192.168.1.21", "192.168.1.22", "192.168.1.23"]
Mapping branch attributes
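
For readers curious how an Enumerable class can offer this kind of shortcut, here is a minimal sketch in plain Ruby. EntrySet is a hypothetical stand-in for a result set, with entries modeled as plain Hashes; it is not Treequel's implementation.

```ruby
# Hypothetical stand-in for a result set (not Treequel's implementation).
class EntrySet
  include Enumerable

  def initialize(entries)
    @entries = entries
  end

  # Enumerable builds all?, find_all, and friends on top of #each.
  def each(&block)
    @entries.each(&block)
  end

  # With an attribute name, fetch that attribute from every entry;
  # otherwise fall back to the ordinary block form.
  def map(attribute = nil, &block)
    return super(&block) unless attribute

    super() { |entry| entry[attribute] }
  end
end

hosts = EntrySet.new([
  { ipHostNumber: ['192.168.1.253'] },
  { ipHostNumber: ['192.168.1.14', '192.168.1.21'] }
])

p hosts.map(:ipHostNumber).flatten
# prints ["192.168.1.253", "192.168.1.14", "192.168.1.21"]
p hosts.all? { |host| host.key?(:ipHostNumber) }
# prints true
```

Keeping the block form intact means the override adds the shortcut without breaking code that uses map in the usual way.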