Deveiate

making stuff up since 1994

The Power of String#[]

There are some very powerful tools in the toolbox of Ruby’s String, and one which I don’t often see used is the index operator with a Regexp argument.

Everyone knows how to do string-extraction using a regex in the tried-and-true Perl way:

# Regexp to match an RFC2616 HTTP status line:
HTTP_STATUS = %r|
	^
	HTTP/(\d+\.\d+)  # $1: HTTP version
	\x20
	(\d{3})          # $2: Status code
	\x20
	([^[:cntrl:]]+)  # $3: Reason
	$
|x

# An example status line
status = "HTTP/1.1 408 Request Time-out"

if status =~ HTTP_STATUS
	code = $2
else
	raise "Invalid status line!"
end

That works, of course, but it doesn’t look very Ruby-ish, relies on the global variable $2, and takes up five lines. We can do better.

Enter String#[].

In addition to the usual numeric-index arguments for extracting parts of a String, you can also extract parts using a Regexp argument:

code = status[ /\d{3}/ ] or raise "no status code!"
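To make the return values concrete, here is a small sketch of the one-argument form. The malformed status line is made up for illustration; it isn't from any real server.

```ruby
# One-argument form: returns the first substring that matches, or nil.
status = "HTTP/1.1 408 Request Time-out"

code    = status[ /\d{3}/ ]            # first run of three digits
missing = "no digits here"[ /\d{3}/ ]  # nothing matches, so nil

# Hypothetical malformed line: the bare pattern happily grabs
# the first three digits of a four-digit "code".
sloppy = "HTTP/1.1 4080 Oops"[ /\d{3}/ ]

p code     # => "408"
p missing  # => nil
p sloppy   # => "408"
```

Because a failed match yields nil, the trailing `or raise` fires only when nothing matched at all.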

However, this isn’t ideal for our purpose: the one-argument form returns everything the pattern matched, so any context you add to anchor the match (short of lookaround assertions) ends up in the result too. The two-argument form avoids that by returning just one capture group. Using the original regexp, we can be sure the status code is what’s actually extracted:

code = status[ HTTP_STATUS, 2 ] or raise "no status code!"
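As a rough sketch of what the second argument selects, the snippet below pulls each numbered group out of the same status line. The malformed line is again a made-up example, and the variable names are only for illustration.

```ruby
# Same pattern as above, repeated so the snippet runs on its own.
HTTP_STATUS = %r|
	^
	HTTP/(\d+\.\d+)  # $1: HTTP version
	\x20
	(\d{3})          # $2: Status code
	\x20
	([^[:cntrl:]]+)  # $3: Reason
	$
|x

status = "HTTP/1.1 408 Request Time-out"

whole   = status[ HTTP_STATUS, 0 ]  # index 0 is the entire match
version = status[ HTTP_STATUS, 1 ]
code    = status[ HTTP_STATUS, 2 ]
reason  = status[ HTTP_STATUS, 3 ]

# Hypothetical malformed line: the full pattern fails, so we get nil
# instead of a bogus three-digit code.
bad = "HTTP/1.1 4080 Oops"[ HTTP_STATUS, 2 ]

p version  # => "1.1"
p code     # => "408"
p reason   # => "Request Time-out"
p bad      # => nil
```

The whole pattern has to match before any group is returned, so the anchoring context does its job without leaking into the result.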

Much better! If you’re on Ruby 1.9 or later, whose regular expressions support named captures, you can do even better:

HTTP_STATUS = %r|
	^
	HTTP/(?<http_version>\d+\.\d+)
	\x20
	(?<status_code>\d{3})
	\x20
	(?<reason>[^[:cntrl:]]+)
	$
|x

code = status[ HTTP_STATUS, :status_code ] or raise "no status code!"
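The other named groups come out the same way. Here is a minimal sketch, assuming the named-capture pattern above; the group name can be given as a Symbol or a String.

```ruby
# Named-capture version of the pattern, repeated so the snippet runs on its own.
HTTP_STATUS = %r|
	^
	HTTP/(?<http_version>\d+\.\d+)
	\x20
	(?<status_code>\d{3})
	\x20
	(?<reason>[^[:cntrl:]]+)
	$
|x

status = "HTTP/1.1 408 Request Time-out"

version = status[ HTTP_STATUS, :http_version ]
code    = status[ HTTP_STATUS, 'status_code' ]  # a String name works too
reason  = status[ HTTP_STATUS, :reason ]

# A line that doesn't match the pattern still yields nil.
nothing = "not a status line"[ HTTP_STATUS, :status_code ]

p version  # => "1.1"
p code     # => "408"
p reason   # => "Request Time-out"
p nothing  # => nil
```

Naming the groups means the call site no longer depends on the order of the parentheses in the pattern.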

I love it. You should too!