I thought it'd be fun to follow yesterday's article about regex conditionals by looking at some other neat tricks you can do with regular expressions in Ruby.
Splitting strings via regular expression
You're probably quite familiar with splitting strings using a text delimiter:
"one,two".split(",")
# => ["one", "two"]
But did you know that split will also accept a regular expression?
# use `,` and `-` as delimiters
"one,two-three".split(/,|-/)
# => ["one", "two", "three"]
# Use comma as thousands separator for US currency,
# but use a space for Euros
"1,000USD".split /(?=.*(USD))(?(1),| )/
Capturing delimiters
Here's a neat party trick. Normally, when you split a string, the delimiters go away:
# The commas vanish!
"one,two".split(",")
# => ["one", "two"]
But if you use a regular expression and you put the delimiter inside of a group, split will capture the delimiter as well.
"one,two-three".split(/(,|-)/)
# => ["one", ",", "two", "-", "three"]
The reason this happens is that when the pattern contains capture groups, split includes the text each group matched in the resulting array.
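To see what the group is doing, here's a minimal sketch comparing a capturing group with a non-capturing one, written `(?:...)`. The variable names are just mine for illustration. A nice side effect of keeping the delimiters is that joining the pieces gives you back the original string.

```ruby
source = "one,two-three"

# Capturing group: the delimiters show up in the result
with_delimiters = source.split(/(,|-)/)
# => ["one", ",", "two", "-", "three"]

# Non-capturing group: same alternation, but the delimiters are dropped
without_delimiters = source.split(/(?:,|-)/)
# => ["one", "two", "three"]

# Nothing was thrown away, so join rebuilds the original string
rebuilt = with_delimiters.join
# => "one,two-three"
```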
Abusing split
You can abuse split to make it behave almost like match. In the code below, I'm using four groups in the regular expression to split the string into four pieces.
"1-800-555-1212".split(/(1)-(\d{3})-(\d{3})-(\d{4})/)
# => ["", "1", "800", "555", "1212"]
The first item in the resulting array is an empty string because the regular expression matches the entire source string, so there is nothing in front of the match.
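For comparison, here's roughly what the same extraction looks like with match. This is only a sketch, and the variable names are made up for the example. MatchData#captures returns just the group contents, so there's no leading empty string to deal with.

```ruby
phone = "1-800-555-1212"
pattern = /(1)-(\d{3})-(\d{3})-(\d{4})/

# The split trick: captures come back, preceded by an empty string
via_split = phone.split(pattern)
# => ["", "1", "800", "555", "1212"]

# The conventional route: ask the MatchData for its captures
via_match = phone.match(pattern).captures
# => ["1", "800", "555", "1212"]
```

One difference to keep in mind: match returns nil when the pattern doesn't match, so calling captures on the result would raise, while split simply hands back the whole string in a one-element array.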
Global matching
By default, regular expressions will only match a pattern once. In the code below, we only get one match even though there are five possible matches.
"12345".match /\d/
# => #<MatchData "1">
In other languages like Perl, the solution would be to flag the regular expression as "global". Ruby doesn't have that option, but it does have the String#scan method.
The scan method returns an array containing all matches:
"12345".scan /\d/
# => ["1", "2", "3", "4", "5"]
It even has a handy block syntax:
"12345".scan /\d/ do |i|
  puts i
end
Unfortunately, there doesn't seem to be any way to scan a string lazily. So this technique probably isn't suitable for, say, processing a 500 MB file.
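The closest thing I'm aware of is a partial workaround. When you give scan a block, it yields each match as it finds it, so wrapping the call in an Enumerator with enum_for lets you stop early. Treat this as a sketch of that idea rather than a fix for the big-file case, since the whole string still has to be in memory. The variable names are mine.

```ruby
# Wrap the block form of scan in an Enumerator
digits = "12345".enum_for(:scan, /\d/)

# Stops scanning as soon as it has two matches
first_two = digits.first(2)
# => ["1", "2"]

# The same enumerator can feed a lazy chain
first_even = digits.lazy.map(&:to_i).select(&:even?).first(1)
# => [2]
```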
Scanning with groups
Now at this point I hope you're wondering what kind of weird tricks we can do by using groups in our scan.
Unfortunately, the behavior here is completely predictable and BORING. Groups result in a multi-dimensional array:
"hiho hiho".scan /(hi)(ho)/
# => [["hi", "ho"], ["hi", "ho"]]
There is one weird edge case. If you use groups, anything NOT in a group will not be returned.
"hiho hiho".scan /(hi)ho/
# => [["hi"], ["hi"]]
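If you need a group purely for grouping but still want the whole match back, a non-capturing group seems to do the trick. A small sketch, with variable names chosen just for this example:

```ruby
text = "hiho hiho"

# Capturing group: only the group's contents come back
captured = text.scan(/(hi)ho/)
# => [["hi"], ["hi"]]

# Non-capturing group: scan returns the full matches again
whole = text.scan(/(?:hi)ho/)
# => ["hiho", "hiho"]
```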
Shorthand
I bet you know about =~ as a way to check whether or not a regular expression matches a string. It returns the index of the character where the match begins.
"hiho" =~ /hi/
# 0
"hiho" =~ /ho/
# 2
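One detail worth spelling out: when there's no match, =~ returns nil instead of an index. Since 0 is truthy in Ruby, the result is safe to use directly in a conditional, even when the match starts at the first character. A quick sketch, again with names of my own choosing:

```ruby
at_start = "hiho" =~ /hi/   # => 0
missing  = "hiho" =~ /xyz/  # => nil

# 0 is truthy in Ruby, so a match at index 0 still counts as a match
found_at_start = at_start ? true : false
# => true

# nil is falsy, so a failed match takes the else branch
found_missing = missing ? true : false
# => false
```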
There's another quick way to check for a match though. I'm talking about the === operator.
/hi/ === "hiho"
# true
When we write a === b in Ruby, we're asking "does b belong in the set defined by a". Or in this case, "does 'hiho' belong to the set of strings matched by the regex /hi/".
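The "set" reading isn't specific to regular expressions. Here's a short sketch showing the same operator on a Range and a Class, plus what happens if you flip the operands. The variable names are only for illustration.

```ruby
in_range  = (1..5) === 3       # => true,  3 is in the range
is_string = String === "hiho"  # => true,  "hiho" is a String
matches   = /hi/ === "hiho"    # => true,  the regex matches
no_match  = /hi/ === "bye"     # => false

# Order matters: String#=== is plain equality, so this is false
reversed = "hiho" === /hi/     # => false
```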
The === operator is used internally in Ruby's case statements. That means that you can use regular expressions in case statements as well.
case "hiho"
when /hi/
  puts "match"
end
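Here's a slightly bigger sketch of the same idea. Because each when clause runs the regex match, you can read capture groups from Regexp.last_match inside the matching branch. The classify method and the log line formats are hypothetical, invented for this example.

```ruby
# Hypothetical helper: sorts a log line into a category
def classify(line)
  case line
  when /\Aerror: (.+)\z/
    # The when clause just ran the match, so the capture is available
    [:error, Regexp.last_match(1)]
  when /\Awarn/
    [:warning, nil]
  else
    [:other, nil]
  end
end

classify("error: disk full")   # => [:error, "disk full"]
classify("warn: low battery")  # => [:warning, nil]
classify("hello")              # => [:other, nil]
```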
The triple-equals operator can also be useful for letting your own methods accept either regular expressions or strings.
Imagine that you want to execute some code when an error happens, unless it's on a pre-configured list of classes to ignore. In the example below, we use the === operator to allow the user to specify either a string, or a regular expression.
def ignored?(error)
  @ignored_patterns.any? { |x| x === error.class.name }
end
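To try this out, the method needs somewhere to live. Below is a minimal sketch that wraps it in a class. ErrorFilter and FakeTimeoutError are names I've made up for the example, and the list of patterns is just sample data.

```ruby
# Hypothetical wrapper class, only here so the method can run
class ErrorFilter
  def initialize(ignored_patterns)
    @ignored_patterns = ignored_patterns
  end

  def ignored?(error)
    # String#=== checks equality, Regexp#=== checks for a match
    @ignored_patterns.any? { |x| x === error.class.name }
  end
end

# Made-up error class whose name contains "Timeout"
class FakeTimeoutError < StandardError; end

filter = ErrorFilter.new(["ArgumentError", /Timeout/])

filter.ignored?(ArgumentError.new)     # => true, exact string match
filter.ignored?(FakeTimeoutError.new)  # => true, regex match
filter.ignored?(RuntimeError.new)      # => false
```

Strings only match the exact class name here, while a regular expression matches anywhere in the name unless you anchor it.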
That's it!
To be sure, there are dozens of little tricks like this scattered throughout Ruby and Rails. If you have one you particularly like, let me know!