Understanding Ruby's insertion sort algorithm

As we explore different methodologies for sorting data, we turn to the insertion sort algorithm. There are a number of reasons to like insertion sort! First, insertion sort is stable, which means that it does not change the relative order of elements with equal keys. It's also an in-place algorithm, meaning that it does not create a new array to store the sorted elements. Finally, insertion sort is a pretty simple algorithm to implement, as you'll soon see!

Why sorting is worth caring about

Ruby's default sort method is not stable, so it does not preserve the relative order of elements with the same value. Both MRI's sort and sort_by methods can act at times as if they're stable, but it's a bad idea to rely on these methods if your code requires stable sorting. Otherwise, if all you need to do is sort singular elements with no duplicate entries that need to have their order preserved, you won't often need to reach further into your sorting toolbox. While it depends on the Ruby implementation, Ruby's default sort is often a form of quicksort. It has good performance as it stands, and you can read more about it at igvita.com

But it's still important to have an understanding of various mechanisms to sort data and what the trade-offs are with each of the various methods. For example, while insertion sort isn't very useful for large datasets, it can be just fine and quite efficient for small datasets and those that are already close to being sorted. You'll learn why once we walk through the implementation. Sure, you'll often use built-in methods that your programming language of choice provides for sorting, but it just might pop up as an interview question as either a pair-program exercise or perhaps related to time complexity. Luckily, by the time we're done with this post, you will be able to both code an insertion sort and understand the time complexity with ease.

A visual representation

Before we get started coding, I highly recommend checking out the following video. It explains the insertion sort algorithm utilizing a dance, and personally, I can't get enough of it!

Insertion sort splits the input array into two lists - a sorted list and an unsorted list. The sorted list starts with the first value in the array, and the unsorted list contains everything else. The algorithm then goes through and grabs each value in the unsorted list and compares it one-by-one to each value in the sorted list until it finds the proper spot, where it then inserts the value in the sorted list.

Geeks for Geeks' insertion sorting algorithm explanation

_ A visual explanation of the insertion sort algorithm. Image credit: Geeks for Geeks_

In this example, the first value, 23, is selected and sorted to form the first sorted and unsorted lists. The first pass checks to see if the next value, 1, is less than 23. It is, so the algorithm puts it in the correct position, left of 23.

The second pass checks to see if the next value in the unsorted list, 10, is less than 23. It is, so the algorithm then checks to see if it is less than 1. It is not - so the sorting stops here, and 10 is sorted to the left of 23.

The third pass checks to see if the next value, 5, is less than 23. It is, so the algorithm then checks to see if it is less than 10. It is again, so it then must check to see if 5 is less than 1. It is not, so the correct position is now reached and the sort stops. With this example, it is easy to see how a list sorted in reverse order would take the longest time to sort - each succeeding value would have to be checked against every preceding value that had already been sorted.

Step-by-step walk-through of the code

Let's look at the code!

def insertion_sort(array)
    for i in 1...(array.length)  # Step 1
        j = i # Step 2
        while j > 0 # Step 3
            if array[j-1] > array[j] # Step 4
                temp = array[j]
                array[j] = array[j-1]
                array[j-1] = temp
            else
                break
            end
            j = j - 1 # Step 5
        end
    end
    return array
end

Step 1:

We start with a for loop that sets variable i to 1 and continues to increment until i equals the length of our array.

Step 2:

We create another variable j and initialize it with a value of 1 (since that is what i is).

Step 3:

Next, we have a nested while loop that will continue as long as j is greater than zero. Since we start with j equal to 1, we know this will execute at least once.

Step 4:

The if... else block probably looks scary at first, but don't worry. It will make sense once we walk through it (and you can always revisit the dance YouTube example!).

For the if condition, we check whether [j-1] is greater than array[j]. Since j is currently 1, we would essentially be comparing array[0] with array[1]. This makes sense because we're comparing the first two elements of the array.

If the first element (array[0]) is larger than the next one (array[1]), then of course we need to swap, which is what happens within the if block. However, if the value in array[0] is less than the value in array[1], then great! We don't need to do anything because it's already sorted, so we simply hit the break in the else block.

Step 5:

From here, we decrement j. Now, we're back to the for loop, and i is now going to be 2. You can imagine how we'll be comparing array[1] with array[2] for the first iteration within the while loop, and then we'll actually go through the while loop again because our j started at 2 vs. 1.

An insertion sort example with real data

Let's walk through this code using the following example array: [5,7,2,10,9,12]

In the first iteration, we'll be comparing 5 and 7. Since 5 < 7, we quickly break out of the if/else and move on.

In the next iteration, we compare 7 and 2. Now, these values will need to be swapped, so we'll have [5, 2, 7, 10, 9, 12]. Then, we'll swap the 2 again with the 5 to end up with [2, 5, 7, 10, 9, 12].

In the next iteration within the for loop, we'll compare 10 and 7 -- Yay! They're already in order.

Moving on, we compare 10 and 9 and find that we need to swap. Then, 7 is less than 9, so we won't have to perform any other swaps. We're now left with [2, 5, 7, 9, 10, 12].

The last iteration finds 12, which is greater than 10, so voila! We're done and sorted.

Performance analysis

While some of the other sorting algorithms we've heard of, namely bubble sort, are very rarely practiced in real life, the insertion sort algorithm can be a reasonable solution. Imagine if our array was already sorted -- the insertion sort would run very quickly and efficiently. On the flip side, what happens if we need to sort an array that was in reverse order? That'd be a nightmare situation for insertion sort - the worst-case complexity.

If the array is already sorted, the insertion sort code will run at O(n) since it will only have to loop through n times. If you want to bear this out, add a puts i at the top of the method and run the program passing in an already-sorted array.

If the array is reverse-sorted, the insertion sort code will run at O(n^2) You might be able to visualize this in your head. Since it will have to make consecutive swaps, it will hit the if condition for every single element. Yikes! Again, feel free to try this out by passing in a reverse-sorted array and creating a counter variable that is printed out.

Although the worst-case complexity is O(n^2) which, as you may recall, is the same for bubble sort and selection sort, insertion sort is usually preferable. This is because, as we've seen, the best case could be O(n), whereas the best case for selection sort is O(n^2). Insertion sort also has fewer swaps than bubble sort, so it wins this battle.

The best case for an insertion sort algorithm — and beyond

The place where the insertion sort algorithm truly shines is for data sets that are already mostly sorted. When correcting human errors in data entry, insertion sort will only need to shift around a few points of data - and it will do so with minimal computational requirement. However, the larger the dataset, the less effective insertion sort becomes, as every value does need to be checked at least once.

I hope that this post has been helpful and that you feel confident about understanding the pros and cons of insertion sort, as well as how the algorithm functions. If you're still itching for more, I recommend checking out the wikipedia page for more on the insertion sort algorithm.

If you want to read more about other methods of sorting in Ruby, check out our articles on bubble sort, selection sort, and merge sort.

Understanding the insertion sort algorithm in Ruby