Your guide to reducing Python memory usage

Many developers focus on building core application functionality and pay little or no attention to memory management until they run out of memory and their apps start crashing, freezing, or slowing down unpredictably.

Computers have limited RAM, so it's always best to use the resources you are allocated effectively. Running a memory-hungry app on a computer or server without enough memory can cause it to crash at inconvenient times, hurting the user experience. A high memory footprint can also degrade the performance of other apps and background services on the same machine. And in the cloud, where resources are metered and billed by use, a memory-hungry app will likely lead to an expensive bill.

A high memory footprint can lead to undesirable consequences. Read on to learn what memory management entails and discover tips on lowering your application's Python memory usage.

What is memory management, and why is it important?

Memory management is the process of allocating memory to programs and freeing it when they no longer need it, so that the system operates efficiently. It matters because a Python program that runs out of memory can slow down, crash, or be terminated outright.

For example, when you launch a program, the operating system has to allocate enough memory for it, and when the program is closed, the system reclaims that memory so it can be given to other programs.

Memory management has numerous benefits. First, it ensures that applications have the resources they need to operate. The system allocates memory to active processes and reclaims it from processes that no longer need it, which keeps memory utilization efficient.

Second, proper memory management contributes to system stability. Because memory is allocated and reclaimed automatically, applications are more likely to have the memory they need, which reduces issues such as random crashes and shutdowns. Techniques such as automatic garbage collection also help prevent memory leaks by freeing objects that a program can no longer reach.
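In CPython, most objects are freed as soon as nothing refers to them any more; the cyclic garbage collector exposed through the gc module cleans up groups of objects that only refer to each other. The following sketch illustrates this with a hypothetical Node class: it builds a reference cycle, drops every outside reference, and uses a weak reference to check that gc.collect() reclaims the cycle. Automatic collection is paused during the demo only so that the result is deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()  # pause automatic collection so the demo is deterministic

a = Node("a")
b = Node("b")
a.partner = b
b.partner = a  # a and b now form a reference cycle

probe = weakref.ref(a)  # a weak reference does not keep the object alive
del a, b  # no outside references remain, but the cycle keeps both objects alive

print("alive before collect:", probe() is not None)  # True
unreachable = gc.collect()  # the cyclic collector finds and frees the cycle
print("alive after collect:", probe() is not None)  # False
print("unreachable objects found:", unreachable)

gc.enable()  # restore automatic collection
```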

Third, memory management leads to better performance in your apps. By reclaiming memory that is no longer in use, the system can keep enough free memory available for running applications, so they can launch and execute quickly instead of slowing down when the system runs short of memory and starts swapping to disk.

Each application has a memory footprint, which informally refers to the amount of memory it consumes. A high memory footprint means an app uses a lot of memory, while a low footprint means it uses relatively little.

Although computer systems can manage memory automatically, as a developer, you still have to keep your app's memory footprint in check. Memory is cheaper and more plentiful than it used to be, but that's not a license to be reckless with the resources we're given. Using memory-intensive functions and inefficient data structures could cause your software to run out of memory, freeze, and even crash.

In the following section, we’ll explain how to measure your app's memory consumption. Later, we’ll discuss tips for lowering your memory footprint.

How to measure memory usage in Python

You can use any of the following methods to measure the amount of memory your Python application is using. Each has its own merits, so we'll explore them one by one.

The psutil library

psutil is a third-party Python library (installable with pip install psutil) for retrieving information about system utilization and running processes. Among other uses, it allows you to monitor memory, CPU, disk, and network usage.

To demonstrate how the psutil library works, take a look at the following Python program that checks whether an integer is a prime number.

number = int(input("Please enter a number: "))

if number > 1:
    # Check for factors between 2 and number - 1
    for i in range(2, number):
        if number % i == 0:
            print(number, "is not a prime number")
            print(i, "times", number // i, "is", number)
            break
    else:
        # The loop finished without finding a factor
        print(number, "is a prime number")
else:
    # Numbers less than or equal to 1 are not prime
    print(number, "is not a prime number")

We can check the above program's memory footprint by importing the psutil module and adding the following function to the code.

import psutil  # Third-party: pip install psutil

def memory_usage():
    # psutil.Process() with no argument refers to the current process
    process = psutil.Process()
    # rss (resident set size) is the physical memory the process is using, in bytes
    usage = process.memory_info().rss
    return usage

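When we called this function at the end of the program, it reported about 25,886 KB of memory. The function returns bytes, so divide its result by 1,024 to express it in kilobytes; your number will vary with your platform and Python version. You could use the same function to have Python print the program's memory usage after it finishes executing.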

The resource module

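We can also use the resource module, specifically its getrusage() function, to check how much memory a program is using. Note that resource is only available on Unix-like systems, and that ru_maxrss reports the peak (maximum) resident set size rather than current usage, in kilobytes on Linux and in bytes on macOS:

import resource

def memory_usage():
    # Peak resident set size of the current process so far
    # (kilobytes on Linux, bytes on macOS)
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage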
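Because ru_maxrss is a high-water mark, it never goes down, even after you free memory. The sketch below uses a hypothetical peak_rss_mib() helper (Unix-only) that converts the value to MiB, then shows that the peak stays put after a large allocation is deleted.

```python
import resource
import sys


def peak_rss_mib():
    """Hypothetical helper: peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


before = peak_rss_mib()
data = b"x" * (50 * 1024 * 1024)  # allocate and write roughly 50 MiB
during = peak_rss_mib()
del data  # the memory is released, but the recorded peak is not reset
after = peak_rss_mib()

print(f"before: {before:.1f} MiB, during: {during:.1f} MiB, after del: {after:.1f} MiB")
```

To track current rather than peak usage, a per-process measurement such as psutil's rss is the better fit.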

The sys module

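The sys module offers two related functions: sys.getsizeof() returns the size of a single object in bytes, and sys.getallocatedblocks() returns the number of memory blocks currently allocated by the interpreter. Neither reports the whole program's memory footprint (getsizeof() measures only the object you pass it, not the objects it references), but both can provide valuable insight for debugging purposes.

Here is how you can use the sys module in your Python code:

import sys

def object_size(obj):
    # Size of obj itself in bytes; objects it refers to are not included
    return sys.getsizeof(obj)

def allocated_blocks():
    # Number of memory blocks the interpreter currently has allocated
    return sys.getallocatedblocks()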
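Because sys.getsizeof() is shallow, the size it reports for a container leaves out the objects inside it. The hypothetical deep_getsizeof() helper below is a simplified sketch that adds up the sizes of the contents of common built-in containers, counting shared objects only once; for real projects, a library such as pympler handles many more cases.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate size of obj plus the objects it contains, in bytes.

    Simplified sketch: it only follows lists, tuples, sets, frozensets,
    and dicts, and counts each distinct object once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted (shared object)
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


words = ["memory"] * 3 + ["footprint"]
print(sys.getsizeof(words))   # shallow: the list object and its references only
print(deep_getsizeof(words))  # also counts each distinct string once
```

Here the list holds the same "memory" string three times, so the deep size includes that string only once.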

Third-party libraries

Apart from psutil and the standard-library modules above, you can also use third-party libraries such as memory_profiler, pympler, or objgraph to measure an app's memory footprint. These dedicated tools offer more helpful features, such as line-by-line memory reports (memory_profiler), object size analysis (pympler), and graphs of object references (objgraph).

Common causes of high Python memory usage

A large memory footprint in any app, including a Python app, can lead to undesirable consequences like random freezes, crashes, and, ultimately, a bad user experience. The following sections cover some of the most common causes of high memory usage.

Memory leaks

The term "memory leak" refers to memory that is allocated for a task but never released once it is no longer needed. In Python, this usually happens when references to objects are kept alive unintentionally, for example in an ever-growing global list or cache, so the garbage collector can never reclaim them. A leak makes your application less efficient and steadily reduces the amount of available memory. Leaks are hard to find, and their effect compounds over time. They're especially harmful in code that runs often, because every execution that allocates memory without releasing it adds to the total amount leaked!

Memory leaks can lead to performance degradation. Apart from freezing or crashing your application, they can starve other background services of memory. Furthermore, as more apps demand memory, the operating system may be forced to terminate some processes (Linux's out-of-memory killer, for example).

External dependencies

Although third-party libraries let us add functionality to our applications without building everything from scratch, they can also drive up an app's memory consumption.

For example, some libraries hold on to memory after a task is completed or keep running unnecessary background work, which strains the available resources. If you know you have a memory leak but can't find it in your code, look through your dependencies: they run in the same process and use the same memory as your app!
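One way to narrow this down is the standard library's tracemalloc module, which records where Python memory is allocated. In the sketch below, hypothetical_dependency_call() stands in for library code that keeps data alive; comparing two snapshots grouped by "filename" shows which source files account for the growth, so allocations made inside a third-party package would appear under that package's path.

```python
import tracemalloc


def hypothetical_dependency_call(store):
    # Stand-in for library code that keeps about 1 MB alive after it returns
    store.extend(bytes(1000) for _ in range(1000))


tracemalloc.start()
before = tracemalloc.take_snapshot()

kept = []
hypothetical_dependency_call(kept)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the growth by source file, largest differences first
growth = after.compare_to(before, "filename")
for stat in growth[:5]:
    print(stat)
```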

Large datasets

Python is a popular programming language for data analysis, machine learning, and artificial intelligence. Training AI models requires a considerable amount of data and memory. If you train a model on a server without enough memory, the training program may freeze or crash.

Unoptimized code

Not using the garbage collector effectively, creating and keeping too many objects in memory, and using the wrong data types can all increase your app's memory footprint.
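As one illustration of how a data-type choice affects memory: by default, every instance of a Python class stores its attributes in its own dictionary, while a class that declares __slots__ stores them in fixed slots instead. The class names below are hypothetical, the exact byte counts depend on your Python version, and the saving only matters when you create many instances. The trade-off is that slotted instances cannot gain attributes that aren't listed in __slots__.

```python
import sys


class PointWithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    __slots__ = ("x", "y")  # fixed attributes, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = PointWithDict(1, 2)
slotted = PointWithSlots(1, 2)

# A regular instance also owns a __dict__; a slotted instance does not
regular_total = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_total = sys.getsizeof(slotted)
print(regular_total, slotted_total)
print(hasattr(slotted, "__dict__"))  # False
```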

Tips to lower your app’s memory footprint

Now that we know the common causes of high memory usage and how to measure memory consumption, let's look at how to lower your app's memory footprint.

Use generators instead of lists

Although extremely useful, lists can consume a lot of memory, especially when they store many values, because every value is created and kept in memory as soon as the list is built. Generators also produce a sequence of values, but with one key distinction: they are lazy. A generator computes each value only when it is requested and doesn't keep earlier values around. The trade-off is that a generator can only be iterated once and doesn't support indexing.
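To see what lazy evaluation looks like, here is a small generator function (count_up_to() is a hypothetical example). Each request for a value runs the function only until its next yield, and once the values are exhausted the generator stays empty.

```python
def count_up_to(limit):
    """Yield 0, 1, ..., limit - 1 one value at a time."""
    n = 0
    while n < limit:
        yield n  # execution pauses here until the next value is requested
        n += 1


numbers = count_up_to(5)
print(next(numbers))  # 0: only the first value has been produced so far
print(list(numbers))  # [1, 2, 3, 4]: the remaining values
print(list(numbers))  # []: a generator can only be consumed once
```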

Let's compare the memory consumption of lists and generators.

Here is a list that stores values between 0 and 999:

import sys

numbers = [i for i in range(1000)]  # Stores values from 0 to 999
print(numbers)  # Print the values in the list
print(sum(numbers))  # Calculate the sum of the values in the list
# Check the size of the list object (its references, not the integers themselves)
print(f"The list consumes {sys.getsizeof(numbers)} bytes")

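When you run the above code on a 64-bit build of CPython, it shows that the list object alone consumes roughly 8 to 9 KB, since it stores an 8-byte reference for each of its 1,000 values; the exact figure varies slightly between Python versions.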

In the following code sample, we use a generator instead of a list.

import sys

generatorlist = (i for i in range(1000))  # Produces values from 0 to 999 on demand
print(generatorlist)  # Prints the generator object itself, not its values
print(sum(generatorlist))  # Consumes the generator to calculate the sum
# Check the size of the generator object
print(f"The generator consumes {sys.getsizeof(generatorlist)} bytes")

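When we ran the above code, the generator object consumed only 104 bytes, and its size stays the same no matter how many values it produces. That's a huge difference in memory consumption! Keep in mind that sys.getsizeof() only measures the list or generator object itself, not the integers involved, so this is an approximate comparison. Still, generators are clearly more memory-efficient when you only need to pass over the values once.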
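Since sys.getsizeof() doesn't count the integers a list holds, a fuller comparison measures the peak memory allocated while each version computes its sum. The sketch below uses the standard library's tracemalloc module; peak_bytes() is a hypothetical helper, and N is an arbitrary size chosen for illustration.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak traced memory, in bytes, during the call."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return peak


N = 100_000
list_peak = peak_bytes(lambda: sum([i for i in range(N)]))  # builds the whole list first
gen_peak = peak_bytes(lambda: sum(i for i in range(N)))     # produces one value at a time

print(f"list comprehension peak: {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```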

Read data in smaller chunks

As discussed, dealing with large datasets can be memory-intensive. If you load an entire file at once, the computer has to allocate enough memory to store all of its contents, so your application may slow down, freeze, or even crash completely.

You can lower an app's memory footprint by reading data in smaller chunks instead of loading the entire dataset into memory. Because only one chunk is held in memory at a time, this technique lets you analyze large files without major performance issues.

For example, the following code is not memory-efficient, since it loads the entire dataset ('employees.csv') into memory.

import pandas as pd

def readEmployeeData():
    df = pd.read_csv('employees.csv')['FIRST_NAME']
    print(df.value_counts())

We can save memory by defining a chunksize, or the number of rows our program should read from the dataset in one go, as demonstrated below.

import pandas as pd

def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=200): #Setting the chunksize to 200 rows
        employees = chunk["FIRST_NAME"]
        chunk_result =  employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)

    result.sort_values(ascending=False, inplace=True)
    print(result)

readDataInChunks()

In the above code, we read and compute information from one smaller DataFrame, or chunk, at a time, which is more memory-efficient. We add each chunk's counts to a running result with Series.add() and then move on to the next chunk until we've analyzed the entire dataset.
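If pandas isn't available, or you only need a simple tally, the same streaming idea can be sketched with the standard library alone. The hypothetical count_first_names() below reads one CSV row at a time with csv.DictReader and keeps only a running collections.Counter, assuming the file has a FIRST_NAME column as in the pandas example. Pure-Python parsing is typically slower than pandas' parser on large files, but memory use stays small because only the current row and the counts are held.

```python
import csv
from collections import Counter


def count_first_names(lines):
    """Count FIRST_NAME values one CSV row at a time.

    `lines` can be an open file object or any other iterable of CSV lines.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["FIRST_NAME"]] += 1
    return counts


# Usage, assuming employees.csv exists and has a FIRST_NAME column:
# with open("employees.csv", newline="") as f:
#     for name, count in count_first_names(f).most_common():
#         print(name, count)
```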

Use memory-efficient dependencies

Before importing and using a third-party library in your project, research its key features and reliability. Find out how much memory the library uses and whether it has known memory leaks. Being involved in online tech communities, such as Stack Overflow, can help you find this information much faster.

Use memory-profiling tools

It's a good idea to use memory profiling tools, such as memory_profiler, valgrind, and pympler, to measure an app's memory footprint before pushing your application to production. This step ensures you aren't caught off guard and helps you avoid hurting the user experience.

For example, let's see how we can use memory_profiler to analyze memory consumption.

Investigating Python memory usage with memory_profiler

We can simply install memory_profiler with the following command.

pip install -U memory_profiler

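Once the dependency is installed, add the @profile decorator above the function you wish to analyze. Importing profile from memory_profiler, as below, is optional when you run the script with python -m memory_profiler, but it also lets the script run on its own.

import pandas as pd
from memory_profiler import profile  # Optional when run via python -m memory_profiler

@profile  # Adding the @profile decorator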
def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=20):
        employees = chunk["FIRST_NAME"]
        chunk_result =  employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)

    result.sort_values(ascending=False, inplace=True)
    print(result)

readDataInChunks()

We can then execute the program with the following command.

python -m memory_profiler example.py

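Alongside the program's own output, you should see a report like the following. For each line, Mem usage is the interpreter's memory usage after that line ran, Increment is the change relative to the previous measurement, and Occurrences is how many times the line was executed. You can use this information to find the portions of your code worth optimizing.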

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     9   56.156 MiB   56.156 MiB           1   @profile
    10                                         def readDataInChunks():
    11   56.160 MiB    0.004 MiB           1       result = None
    12   57.566 MiB    1.133 MiB           4       for chunk in pd.read_csv("employees.csv", chunksize=20):
    13   57.555 MiB    0.062 MiB           3           employees = chunk["FIRST_NAME"]
    14   57.555 MiB    0.090 MiB           3           chunk_result =  employees.value_counts()
    15   57.555 MiB    0.000 MiB           3           if result is None:
    16   57.152 MiB    0.000 MiB           1               result = chunk_result
    17                                                 else:
    18   57.562 MiB    0.121 MiB           2               result = result.add(chunk_result, fill_value=0)
    19
    20   57.570 MiB    0.004 MiB           1       result.sort_values(ascending=False, inplace=True)
    21   57.621 MiB    0.051 MiB           1       print(result)

Becoming a good steward of memory

As you work on a software project, having a low memory footprint should be at the top of your list (and not just an afterthought). Applications with low memory consumption can experience fewer crashes and freezes and, thus, improve the overall user experience.

Using generators instead of lists, avoiding memory-intensive libraries, and reading data in smaller chunks are some helpful tips for lowering your app's memory footprint. Understanding Python memory usage is critical to building applications that delight, so I hope that you take what you learned in this article into your apps!

Want to see what else we have to offer for your Django apps? Check out how Honeybadger supports Django.