Cool Ruby regex tricks

In Ruby, regular expressions can be challenging to learn, but they're a powerful tool to parse and manipulate text in your Ruby code. There are plenty of gotchas, but Ruby regex skills will pay dividends if you take the time to practice them.

We thought it'd be fun to complement our article about regex conditionals in Ruby by looking at some other neat but uncommon tricks you can do with regexes in Ruby. Some of these tricks are Ruby-specific, but many apply to regular expressions in any language. Let's dive in.

A quick refresher on regular expressions in Ruby

In case you're not familiar with regular expressions already, we'll go over a few basics before getting into the less common tricks.

Like in many other languages, regular expressions in Ruby are usually written between two forward slashes. Ruby represents them with the Regexp class, so anything defined between forward slashes is a Regexp object.

my_regex = /^h/
my_regex.class
# => Regexp

The most common use of a regular expression in Ruby is probably the .match method, which checks the regular expression against a string and returns a MatchData object for the first match, or nil if there isn't one.

if "This is my string".match(/my/)
  puts "We found a match"
end

We'll get into more complex uses of the match method in a bit. It's also important to understand the most common modifiers used in our expressions.

You can use the + modifier to match one or more of a character. You can use the * modifier to match zero or more of the character. You can also use the ? modifier to match either zero or one of a single character. And lastly, you can use the {} modifier to define a quantity range for a character.
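
To make those four modifiers concrete, here's a small sketch. The sample strings and the QUANTIFIER_RESULTS constant are invented for illustration. It relies on String#[] with a regex, which returns the matched text or nil when nothing matches.

```ruby
# Illustrative only: the strings and the constant name are made up for this sketch.
QUANTIFIER_RESULTS = {
  one_or_more:  "goooal"[/go+al/],      # + needs at least one "o"
  zero_or_more: "gal"[/go*al/],         # * is fine with no "o" at all
  optional:     "color"[/colou?r/],     # ? makes the "u" optional
  in_range:     "goooal"[/go{2,3}al/],  # {2,3} allows two or three "o"s
  out_of_range: "goal"[/go{2,3}al/]     # only one "o", so this is nil
}

QUANTIFIER_RESULTS.each { |name, result| puts "#{name}: #{result.inspect}" }
```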

Now that you've got the basics down, let's get into some less common tricks!

Splitting strings with a regex

You may already be quite familiar with splitting strings using a text delimiter. This is a common enough problem in Ruby that you're likely to encounter it in an interview at some point. You can call the split method with a string as the argument like this:

"one,two".split(",")
# => ["one", "two"]

But did you know that split will also accept a regular expression instead of a string? You call split just like you would with a string, but instead of quotes you add your Ruby regular expression (defined in between forward slashes):

# use `,` and `-` as delimiters
"one,two-three".split(/,|-/)
# => ["one", "two", "three"]

# Use comma as thousands separator for US currency,
# but use a space for Euros
"1,000USD".split /(?=.*(USD))(?(1),| )/

Greedy vs. lazy Ruby regex matching

We already went over the + and * modifiers, which let us match "1 or more" or "zero or more" respectively. By default, regex quantifiers like * and + are greedy, a term that means they match as much as possible. Take a look at this example:

"<b>this</b> <b>matcher</b>".match(/<b>.*<\/b>/)
# => #<MatchData "<b>this</b> <b>matcher</b>">

Contrast this with lazy matching, which matches as little as possible. To use lazy matching in a regular expression, use *? or +? like in this example:

"<b>this</b> <b>matcher</b>".match(/<b>.*?<\/b>/)
# => #<MatchData "<b>this</b>">
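
The difference becomes more obvious when you try to pull out the contents of every tag. This is only a sketch: the constant names are made up, and it uses String#scan to collect all of the matches.

```ruby
HTML_SNIPPET = "<b>this</b> <b>matcher</b>"

# Greedy: .* runs all the way to the last </b>, so we get one oversized capture.
GREEDY_CONTENTS = HTML_SNIPPET.scan(/<b>(.*)<\/b>/).flatten

# Lazy: .*? stops at the first </b> it can, so each tag is captured on its own.
LAZY_CONTENTS = HTML_SNIPPET.scan(/<b>(.*?)<\/b>/).flatten

p GREEDY_CONTENTS  # => ["this</b> <b>matcher"]
p LAZY_CONTENTS    # => ["this", "matcher"]
```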

Capturing delimiters with a Ruby regex

Here's a neat party trick. Normally, when you split a string and pass a string delimiter, the delimiters go away and aren't returned in the result:

# The commas vanish!
"one,two".split(",")
# => ["one", "two"]

But if you use a regular expression and you put the delimiter inside of a group, the Ruby split method will capture the delimiter as well.

"one,two-three".split(/(,|-)/)
# => ["one", ",", "two", "-", "three"]

This happens because when the pattern contains capture groups, split includes the text captured by each group in the returned array. This is super useful if you want to keep the delimiters along with the pieces between them!
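
One place this might come in handy is a tiny tokenizer. The tokenize helper below is hypothetical and only handles + and -, but it shows the delimiters surviving the split. It also shows a non-capturing group, (?:...), for when you want grouping without keeping the delimiters.

```ruby
# Hypothetical helper: split an arithmetic string into numbers and operators.
def tokenize(expression)
  # The parentheses make split keep each operator in the result.
  expression.split(/([+-])/)
end

tokens = tokenize("3+4-5")
p tokens       # => ["3", "+", "4", "-", "5"]
p tokens.join  # => "3+4-5"

# A non-capturing group groups the alternatives without keeping the delimiter.
p "one,two-three".split(/(?:,|-)/)  # => ["one", "two", "three"]
```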

Lookahead and lookbehind assertions

Lookaheads and lookbehinds are powerful tools when you need pattern matching without including that pattern in the match result. These two techniques allow you to assert conditions on what comes before or after your match, an uncommon but useful tool to have in your tool belt.

Positive lookaheads

A positive lookahead asserts that a specific pattern must follow the match, but it doesn't consume that pattern. Take a look at this example that uses ?=:

"ruby123".match(/\w+(?=\d{3})/)
# => #<MatchData "ruby">

This matches "ruby" only if it's followed by three digits, and the digits aren't included in the match result. Because \w also matches digits, the greedy \w+ first grabs the entire string, but then the lookahead fails because nothing follows it. The regex engine backtracks one character at a time until the lookahead is satisfied: "ruby" is followed by exactly three digits, so that's the match.

Negative lookaheads

You can also go the other direction and validate that a pattern does not follow with a negative lookahead. Check out this example that uses ?!:

"rubyxyz123".match(/ruby(?!\d+)/)
# => #<MatchData "ruby">

This Ruby regex matches the string "ruby" only where it is not followed by one or more digits. A counterexample using the same regular expression will show why this is potentially useful:

"ruby123xyz".match(/ruby(?!\d+)/)
# => nil

In this example, the "ruby" match is followed by some number of digits, so we don't match on "ruby".

Let's take a quick look at lookbehinds next!

Positive and negative lookbehinds

While lookaheads validate that a pattern does or doesn't follow, lookbehinds check that a pattern does or doesn't precede the match. As you can probably guess, a positive lookbehind ensures that a pattern precedes the match:

"price: $50".match(/(?<=\$)\d+/)
# => #<MatchData "50">

At this point, you probably know what to expect from negative lookbehinds, which check that a pattern does not precede. We can swap our example above to instead check for digits not preceded by a dollar sign:

"price: $50".match(/(?<!\$)\d+/)
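# => #<MatchData "0">

The "5" is preceded by a dollar sign, so the match can't start there. The "0" is preceded by "5" instead, which satisfies the lookbehind, so the match is just "0".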
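
Lookaheads and lookbehinds can also be combined. As a rough sketch, here's a well-known trick for inserting thousands separators. The add_thousands_separators helper is a made-up name, and it assumes the input is a plain string of digits with no sign or decimal part.

```ruby
# Hypothetical helper: assumes the input contains only digits.
def add_thousands_separators(digits)
  # The pattern matches an empty position that has a digit behind it (lookbehind)
  # and a multiple of three digits between it and the end of the string (lookahead).
  digits.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",")
end

puts add_thousands_separators("1234567")  # prints 1,234,567
puts add_thousands_separators("999")      # prints 999
```

Since both assertions are zero-width, gsub only inserts commas and never replaces any digits.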

Using `split` like it's `match`

You can abuse split to make it behave almost like the match method. In the code below, I'm using four groups in the regular expression to split the string into 4 pieces.

"1-800-555-1212".split(/(1)-(\d{3})-(\d{3})-(\d{4})/)
# => ["", "1", "800", "555", "1212"]

The first item in the resulting array is an empty string because the regular expression matches the entire source string. If you wanted the array without that string, you could just drop the first element.
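
For comparison, here's a quick sketch of the same result via split and via match. The constant names are just for illustration. MatchData#captures returns only the groups, so there's no leading empty string to drop.

```ruby
PHONE_PATTERN = /(1)-(\d{3})-(\d{3})-(\d{4})/
PHONE_NUMBER = "1-800-555-1212"

# split gives us a leading empty string, so we drop it.
PARTS_VIA_SPLIT = PHONE_NUMBER.split(PHONE_PATTERN).drop(1)

# match + captures returns the groups directly.
PARTS_VIA_MATCH = PHONE_NUMBER.match(PHONE_PATTERN).captures

p PARTS_VIA_SPLIT  # => ["1", "800", "555", "1212"]
p PARTS_VIA_MATCH  # => ["1", "800", "555", "1212"]
```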

Global matching with a Ruby regex

By default, match stops at the first match it finds. In the code below, we only get one match even though there are five possible matches.

"12345".match(/\d/)
# => #<MatchData "1">

Sometimes you actually want all of the possible matches from a regular expression. In other languages like Perl, the solution would be to flag the regular expression as "global". Ruby doesn't have that option, but it does have the String#scan method.

The scan method, when called on a string, returns an array containing all matches:

"12345".scan(/\d/)
# => ["1", "2", "3", "4", "5"]

It even has a handy block syntax:

"12345".scan(/\d/) do |i|
  puts i
end

Unfortunately, there doesn't seem to be any way to scan a string lazily. So this technique doesn't scale super well and probably isn't suitable for processing a 500 MB file.
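
One possible workaround, assuming your pattern never needs to match across a line break, is to read the input one line at a time and scan each line as you go. This is only a sketch: each_match_by_line is a hypothetical helper, and StringIO stands in for a real file.

```ruby
require "stringio"

# Hypothetical helper: yields every match while reading the input line by line,
# so the whole input never has to sit in a single string.
# Assumes a match never spans a line break.
def each_match_by_line(io, pattern)
  io.each_line do |line|
    line.scan(pattern) { |match| yield match }
  end
end

# StringIO stands in for an open file in this sketch.
log = StringIO.new("a1b2\nc3\n")
each_match_by_line(log, /\d/) { |digit| puts digit }
```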

Scanning with groups

Now at this point, I hope you're wondering why this is useful! What kind of weird tricks can we do by using groups in our scan?

Unfortunately, the behavior here is completely predictable and boring. Groups result in a multi-dimensional array:

"hiho hiho".scan(/(hi)(ho)/)
# => [["hi", "ho"], ["hi", "ho"]]

There is one weird edge case. If you use groups, anything NOT in a group will not be returned.

"hiho hiho".scan(/(hi)ho/)
# => [["hi"], ["hi"]]
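
If you need grouping but still want the whole match back, there are a couple of ways around this edge case. The sketch below (constant names are made up) uses a non-capturing group, and then an extra outer group.

```ruby
TEXT = "hiho hiho"

# A non-capturing group (?:...) doesn't count as a group, so scan returns full matches.
FULL_MATCHES = TEXT.scan(/(?:hi)ho/)

# Wrapping the whole pattern in an outer group returns the full match and the inner group.
FULL_AND_INNER = TEXT.scan(/((hi)ho)/)

p FULL_MATCHES    # => ["hiho", "hiho"]
p FULL_AND_INNER  # => [["hiho", "hi"], ["hiho", "hi"]]
```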

Ruby regular expression shorthand

I bet you probably already know about =~ as a way to check whether or not a regular expression matches a string. In case you don't, it returns the index of the character where the match begins, or nil if there's no match. Take a look at this example:

"hiho" =~ /hi/
# => 0
"hiho" =~ /ho/
# => 2

This is useful to get a character index, but it isn't a clean solution if what you actually want is a boolean result indicating whether or not there's a match. There's another quick way to check for a match that returns true or false. Take a look at this example that uses the === operator.

/hi/ === "hiho"
# => true

When we write a === b in Ruby, we're asking "does b belong in the set defined by a?" Or in this case, "does 'hiho' belong to the set of strings matched by the regex /hi/?"

The === operator is used internally in Ruby's case statements. That means that you can use regular expressions in case statements as well.

case "hiho"
when /hi/
  puts "match"
end

The triple-equals operator can also be useful for letting your own methods accept either regular expressions or strings.

Imagine that you want to execute some code when an error happens unless that error is on a pre-configured list of classes to ignore. In the example below, we use the === operator to allow the user to specify either a string or a regular expression.

def ignored?(error)
  @ignored_patterns.any? { |x| x === error.class.name }
end
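
To see that method in context, here's a self-contained sketch. The ErrorFilter class and the patterns passed to it are hypothetical. The only point is that String#=== checks for equality while Regexp#=== checks for a match, so both kinds of pattern work.

```ruby
# Hypothetical wrapper class, invented for this example.
class ErrorFilter
  def initialize(ignored_patterns)
    @ignored_patterns = ignored_patterns
  end

  def ignored?(error)
    # String#=== compares for equality; Regexp#=== tests for a match.
    @ignored_patterns.any? { |pattern| pattern === error.class.name }
  end
end

filter = ErrorFilter.new(["ZeroDivisionError", /\AArgument/])

p filter.ignored?(ZeroDivisionError.new)  # => true  (string pattern)
p filter.ignored?(ArgumentError.new)      # => true  (regex pattern)
p filter.ignored?(RuntimeError.new)       # => false
```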

Using named capture groups for readability

Regular expressions can get really messy really fast, especially with multiple capture groups. Ruby supports named capture groups, which make regexes easier to read and work with. They can give you (and the next person who comes along to your code) an understanding of each group.

Instead of writing something like this:

/(\d{4})-(\d{2})-(\d{2})/

You can write something like this and reference each captured value by name:

today = "2025-02-07"
matches = today.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)

puts matches[:year]  # "2025"
puts matches[:month] # "02"
puts matches[:day]   # "07"
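
If you want all of the named groups at once, MatchData#named_captures returns them as a hash keyed by group name. Here's a sketch with a hypothetical parse_date_parts helper that also converts each value to an integer.

```ruby
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# Hypothetical helper: returns a hash of integer date parts, or nil if there's no match.
def parse_date_parts(text)
  match = text.match(DATE_PATTERN)
  return nil unless match

  # named_captures uses the group names (as strings) for the hash keys.
  match.named_captures.transform_values(&:to_i)
end

parts = parse_date_parts("2025-02-07")
puts parts["year"]   # prints 2025
puts parts["month"]  # prints 2
puts parts["day"]    # prints 7
```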

Replacing text with regular expressions

The Ruby gsub method is a common choice for transforming text using regular expressions. You've probably seen it used for simple replacements like this:

"hello world".gsub(/world/, "Ruby")
# => "hello Ruby"

If you need more control, you can use a code block:

"hello world".gsub(/\w+/) { |word| word.upcase }
# => "HELLO WORLD"
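
Two more replacement styles are worth a quick sketch: backreferences in the replacement string, and a hash of replacements. The helper names and the WORD_SWAPS constant are made up for illustration.

```ruby
# Hypothetical helper: reorder captured groups using \1, \2, \3 backreferences.
# Single quotes keep the backslashes literal.
def iso_to_day_first(date)
  date.sub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
end

WORD_SWAPS = { "cat" => "dog", "hat" => "cap" }

# Hypothetical helper: when the second argument is a hash,
# gsub looks up each match as a key and substitutes the value.
def swap_words(text)
  text.gsub(/\b(?:cat|hat)\b/, WORD_SWAPS)
end

puts iso_to_day_first("2025-02-07")    # prints 07/02/2025
puts swap_words("the cat in the hat")  # prints the dog in the cap
```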

Using anchors in your regular expressions

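You can use anchors in your regular expressions to match only at the beginning or end of a line. Keep in mind that in Ruby, ^ and $ anchor to line boundaries, so they can also match right after or right before a newline in the middle of a multi-line string. To anchor to the very start or end of the whole string, use \A and \z instead.

To match only at the beginning of a line, use ^ like in this example: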

"hello".match(/^h/) # => #<MatchData "h">
"ehello".match(/^h/) # => nil

To match only at the end of a line, use $ like in this example:

"world".match(/d$/) # => #<MatchData "d">
"wordle".match(/d$/) # => nil

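One gotcha shows up with multi-line input: in Ruby, ^ and $ match at the start and end of each line, while \A and \z match only at the start and end of the whole string. A short sketch, with made-up constant names:

```ruby
MULTILINE_INPUT = "first line\nsecond line"

# ^ matches right after the newline, so this is true.
LINE_START = MULTILINE_INPUT.match?(/^second/)

# \A only matches at the very start of the string, so this is false.
STRING_START = MULTILINE_INPUT.match?(/\Asecond/)

# $ matches right before the newline, so this is true.
LINE_END = MULTILINE_INPUT.match?(/first line$/)

# \z only matches at the very end of the string, so this is false.
STRING_END = MULTILINE_INPUT.match?(/first line\z/)

p [LINE_START, STRING_START, LINE_END, STRING_END]  # => [true, false, true, false]
```

This matters most when validating user input, where a stray newline could otherwise sneak past a pattern anchored with ^ and $.
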
Matching only on word boundaries

Much like start and end anchors, you can use word boundaries to only match full words inside of a string. Check out this example that uses \b:

"the cat jumped over the moon".match(/\bcat\b/) # => #<MatchData "cat">
"thecat jumped over the moon".match(/\bcat\b/) # => nil

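Word boundaries pair nicely with scan when you want to count whole-word occurrences. The count_word helper below is hypothetical; it uses Regexp.escape so the word is treated as literal text.

```ruby
# Hypothetical helper: counts occurrences of a word, ignoring it inside longer words.
def count_word(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/).length
end

# "concatenated" contains "cat", but not as a whole word.
puts count_word("the cat sat on the concatenated cat", "cat")  # prints 2
```
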
What about regex performance?

Just because you can use a Ruby regex doesn't mean you should. You should always consider performance tradeoffs. Regular expressions are powerful, but they're often slower than plain string methods, and the difference adds up on large datasets. If you just need to check for inclusion, lean on the String#include? method instead. It's also a bit easier to read. Check out this example:

"hello world".include?("world")

Another thing to keep in mind is to avoid patterns with nested quantifiers such as (.*)*, which can force the regex engine to backtrack over the same characters again and again. You can also build a regular expression with Ruby's Regexp.new and store it in a variable. This is especially helpful if you're going to use a regular expression multiple times:

pattern = Regexp.new("world")

"hello world".match(pattern) # => #<MatchData "world">

"minecraft world".match(pattern) # => #<MatchData "world">

"I cant spell wrold".match(pattern) # => nil

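If you build a pattern from a string with Regexp.new, it's usually worth escaping any text that should be matched literally. The sketch below uses a hypothetical literal_pattern helper together with Regexp#match?, which returns true or false without building a MatchData object.

```ruby
# Hypothetical helper: builds a regex that matches the given text literally.
def literal_pattern(text)
  # Regexp.escape turns "1.5" into "1\\.5" so the dot isn't a wildcard.
  Regexp.new(Regexp.escape(text))
end

VERSION_PATTERN = literal_pattern("1.5")

p VERSION_PATTERN.match?("version 1.5")  # => true
p VERSION_PATTERN.match?("version 105")  # => false, because the dot is literal
```
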
What about regexes in other languages?

If you found this article useful and you write code in other languages, you might like our guides for regular expressions in Go or even in JavaScript!

What are your favorite Ruby regex tips?

Regular expressions are hard to master but incredibly powerful. To be sure, there are dozens of little tricks like this scattered throughout Ruby and Rails. If you have one you particularly like, let me know! Sign up for the Honeybadger newsletter to get more Ruby tips and tricks!