Analyse memory in Python

Main metrics

Identify the Python files/functions consuming the most memory: this is very reliable for detecting memory rises and peaks.
It is generally computed by comparing the RSS of the process before and after the execution of a piece of Python code or a function.

Number of objects by type: this is very reliable for knowing the exact number of built-in objects or of custom objects, but not necessarily both at once, depending on the functions/library used.

Memory consumed by objects: this is an approximation, and not a very reliable one, because of the way Python computes the size of objects by default:
sys.getsizeof(object[, default])
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

There are some tricks and even libraries to "try to" compute recursively the memory occupied by an object, but it is not always simple or reliable.
That being said, evaluating the memory consumed by objects, even broadly, remains helpful: it gives an order of magnitude and thus some pointers to investigate.
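To see why the default computation is only an approximation, here is a minimal sketch showing that sys.getsizeof() is shallow for containers: it counts the container structure, not the objects the container references.

```python
import sys

# sys.getsizeof() is shallow: for a list it counts the list header and
# its array of pointers, not the referenced objects themselves.
words = ["memory" * 100 for _ in range(1000)]

shallow = sys.getsizeof(words)                              # the list object alone
with_elements = shallow + sum(sys.getsizeof(w) for w in words)

print(f"shallow={shallow} bytes, with elements={with_elements} bytes")
```

The shallow size stays in the kilobyte range even though the strings themselves occupy hundreds of kilobytes.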

Main strategies

– Add the metrics computations inside the application code: this is the approach to favor.
These metrics can be scheduled to be printed at a specific interval, or when the memory of the process reaches a threshold.

– Inject the metrics computations into the running process at runtime.
The Pyrasite library allows to do that.
It may be appealing, but beware: it is a hack, it requires modifying kernel properties, and it may have side effects on your application.
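The first strategy can be sketched with only the standard library, assuming a Unix platform (the helpers later in this article use psutil instead); THRESHOLD_MB is a hypothetical value:

```python
import resource
import sys

THRESHOLD_MB = 100  # hypothetical threshold for this sketch

def peak_rss_mb() -> float:
    # ru_maxrss is expressed in kilobytes on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def maybe_print_metrics() -> bool:
    """Print a metrics line only when the peak RSS crosses the threshold."""
    rss = peak_rss_mb()
    if rss >= THRESHOLD_MB:
        print(f"peak rss={rss:.1f} MB: dumping metrics", file=sys.stderr)
        return True
    return False

maybe_print_metrics()
```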

Common Python code used in the next parts

Model:

class BopLine:
 
    def __init__(self, *args) -> None:
        # print(f'args={args}')
        self.series_reference = args[0]
        self.period = args[1]
        self.data_value = args[2]
        self.suppressed = args[3]
        self.status = args[4]
        self.units = args[5]
        self.magnitude = args[6]
        self.subject = args[7]
        self.group = args[8]
        self.series_title_1 = args[9]

Service:

import csv
from pathlib import Path
from time import sleep
from typing import List
 
from foo_app.core.BopLine import BopLine
 
 
class PaymentHighMemoryConsumptionService:
 
    def countSubjectWithCustomObjects(self, subject: str):
        bop_lines: list[BopLine] = self.readLinesWithCustomObject()
        matching_lines = [b for b in bop_lines if b.subject == subject]
        return len(matching_lines)
 
    def countSubjectWithStringObjects(self, subject: str):
        rows: List[List[str]] = self.readLinesWithoutCustomObject()
        matching_lines = [r for r in rows if r[-3] == subject]
        return len(matching_lines)
 
    def readLinesWithCustomObject(self) -> List[BopLine]:
        with open(Path(__file__).parent.parent / 'balance-of-payments-september-2022.csv') as f:
            reader = csv.reader(f, delimiter=",", quotechar='"')
            next(reader, None)  # skip the headers
            bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
            # We simulate a long processing to be able to catch the memory consumption
            sleep(35)
            print(f'readLinesWithCustomObject() will return')
            return bop_lines
 
    def readLinesWithoutCustomObject(self) -> List[List[str]]:
        with open(Path(__file__).parent.parent / 'balance-of-payments-september-2022.csv') as f:
            reader = csv.reader(f, delimiter=",", quotechar='"')
            next(reader, None)  # skip the headers
            rows: List[List[str]] = [row for row in reader]
            # We simulate a long processing to be able to catch the memory consumption
            sleep(35)
            print(f'readLinesWithoutCustomObject() will return')
            return rows

Tools to compute the number of objects and memory consumed by them

There are multiple functions and libraries to compute the memory used by an object in Python.
If you have custom objects that hold built-in or custom objects, don't rely on sys.getsizeof().
Concretely, the memory consumed will appear to be lower than that of the process.
Warning: some memory analyzer libraries use the sys.getsizeof() function for some processing (e.g. Pympler).
The single metric we can rely on exactly is the count of objects.
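Counting objects by type can even be done with the standard library alone; here is a sketch using gc.get_objects() (pympler's muppy, used below, gives a more complete view). The Probe class is a hypothetical example introduced just for this snippet.

```python
import gc
from collections import Counter

class Probe:
    """Hypothetical custom class, just to have something to count."""
    def __init__(self, value):
        self.value = value

probes = [Probe(i) for i in range(500)]

# gc.get_objects() returns the objects tracked by the garbage collector;
# grouping them by type name gives a per-type object count.
counts = Counter(type(o).__name__ for o in gc.get_objects())
print(counts["Probe"])
```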

Pympler: Number of objects and memory consumed by them

TLDR:
Reliable metric for the number of objects, but tricky for the actual memory consumed by custom objects.

Here is an example to demonstrate that.
We will read lines from a large CSV file (198787 lines) in two ways:
1) with only built-in types:
– app with string object summary: the most consistent and reliable.
We will see that it works well.

2) with a custom type to represent a line.
We will see that it is more complicated to get reliable metrics about the memory used by the objects.
– app with custom nested object summary: very fast, but helpful only for the number of objects.
– app with custom nested object asizeof and summary: potentially slow, helpful for the number of objects and a little helpful for the memory allocated by objects.
– app with custom nested object asizeof and stats: unclear results.

Each program will output the metrics according to a specific feature provided by the library.

app with string object summary

The most consistent and reliable

Application:

from threading import Thread
 
from foo_app.core.PaymentHighMemoryConsumptionService import PaymentHighMemoryConsumptionService
from foo_app.memory_helper.memory_helper_pympler_summary import memory_watcher_loop_summary
 
 
def main():
    t_memory_watcher = Thread(daemon=True, target=memory_watcher_loop_summary)
    t_memory_watcher.start()
    payment_service = PaymentHighMemoryConsumptionService()
    bop_subject = 'Balance of Payments - BOP'
 
    # 213872 str with only 1 subject column stored in list of string variable
    count = payment_service.countSubjectWithStringObjects(bop_subject)
    print(f'count for "{bop_subject}" = {count}')
 
 
if __name__ == '__main__':
    main()

Memory helper

import os
from threading import Thread
from time import sleep
 
from psutil import Process
from pympler import muppy
from pympler import summary
 
 
def memory_watcher_loop_summary():
    process: Process = Process(os.getpid())
    t = Thread(target=lambda p: display_rss_always(p), args=[process])
    t.start()
    while True:
        rss_mb = compute_rss_mb(process)
        if rss_mb >= 0:
            all_objects = muppy.get_objects()
            sum1 = summary.summarize(all_objects)
            summary.print_(sum1,limit=5)
        sleep(5)
 
 
def display_rss_always(process):
    while True:
        rss_mb = compute_rss_mb(process)
        print(f'rss_mb={rss_mb}')
        sleep(5)
 
 
def compute_rss_mb(process):
    return process.memory_info().rss / 1024 / 1024

Output:

/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/bin/python /home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_string_object_summary.py
rss_mb=13.84765625
  types |   # objects |   total size
======= | =========== | ============
   dict |        2366 |    996.28 KB
    str |        8658 |    955.05 KB
   type |         674 |    520.55 KB
   code |        2688 |    464.56 KB
  tuple |        2300 |    129.13 KB
	rss_mb=153.83203125
rss_mb=171.578125
  types |   # objects |   total size
======= | =========== | ============
    str |     1300179 |     87.57 MB
   list |      200886 |     36.86 MB
   dict |        2367 |    996.38 KB
   type |         674 |    520.55 KB
   code |        2689 |    464.80 KB
rss_mb=171.578125
rss_mb=213.19921875
rss_mb=214.06640625
  types |   # objects |   total size
======= | =========== | ============
    str |     1302062 |     87.69 MB
   list |      202771 |     49.93 MB
   dict |        2367 |    996.38 KB
   type |         674 |    520.55 KB
   code |        2689 |    464.87 KB
rss_mb=214.06640625
rss_mb=347.90625
readLinesWithoutCustomObject() will return
count for "Balance of Payments - BOP" = 121327

Analysis
Pympler detects 1300179 str objects (after the execution).
With 5 to 8 string columns for 198787 lines, this is within the expected range:

    198787 * 5 =  993935
    198787 * 8 = 1590296

The RSS memory computed for the process is quite close to the total size of the detected objects:
~140 MB for pympler versus 171-213 MB for the process RSS.
Some memory is certainly reserved by the Python interpreter and native objects, which may explain this small difference.

app with custom nested object summary

Very fast but helpful only for the number of objects

Application code:

from threading import Thread
 
from foo_app.core.PaymentHighMemoryConsumptionService import PaymentHighMemoryConsumptionService
from foo_app.memory_helper.memory_helper_pympler_summary import memory_watcher_loop_summary
 
 
def main():
    t_memory_watcher = Thread(daemon=False, target=memory_watcher_loop_summary)
    t_memory_watcher.start()
    payment_service = PaymentHighMemoryConsumptionService()
    bop_subject = 'Balance of Payments - BOP'
 
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
    print(f'count for "{bop_subject}" = {count}')
 
 
if __name__ == '__main__':
    main()

Memory helper
It is the same one we used in the previous case.

Output:

    /home/david/python-workspace/python_blog_examples/memory_analyze/.venv/bin/python /home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_summary.py
    rss_mb=14.5703125
      types |   # objects |   total size
    ======= | =========== | ============
       dict |        3176 |      1.08 MB
        str |        8668 |    955.79 KB
       type |         674 |    520.68 KB
       code |        2687 |    464.39 KB
      tuple |        2293 |    128.67 KB
    	rss_mb=153.74609375
                             types |   # objects |   total size
    ============================== | =========== | ============
                              dict |      201150 |     28.27 MB
      foo_app.core.BopLine.BopLine |      198787 |      9.10 MB
                              list |        2098 |      1.98 MB
                               str |       10546 |      1.06 MB
                              type |         674 |    520.68 KB
    rss_mb=175.3515625
    rss_mb=165.2734375
                             types |   # objects |   total size
    ============================== | =========== | ============
                              dict |      201150 |     28.27 MB
      foo_app.core.BopLine.BopLine |      198787 |      9.10 MB
                              list |        3978 |      5.67 MB
                               str |       12424 |      1.18 MB
                              type |         674 |    520.68 KB
    rss_mb=165.2734375
                             types |   # objects |   total size
    ============================== | =========== | ============
                              dict |      201150 |     28.27 MB
                              list |        5858 |      9.36 MB
      foo_app.core.BopLine.BopLine |      198787 |      9.10 MB
                               str |       14302 |      1.31 MB
                              type |         674 |    520.68 KB
    rss_mb=206.70703125
    rss_mb=210.859375
                             types |   # objects |   total size
    ============================== | =========== | ============
                              dict |      201150 |     28.27 MB
                              list |        7738 |     13.06 MB
      foo_app.core.BopLine.BopLine |      198787 |      9.10 MB
                               str |       16180 |      1.44 MB
                              type |         674 |    520.68 KB
    rss_mb=212.53125
    readLinesWithCustomObject() will return
    count for "Balance of Payments - BOP" = 121327

Analysis
Pympler detects 198787 foo_app.core.BopLine.BopLine objects, which is consistent with the number of lines of the CSV file.
But we can notice one big problem:
– the memory reported for these objects is very small: 9.10 MB.
The discrepancy is caused by the sys.getsizeof() call that pympler uses in its summarize() function.
We can also notice that the number of strings is smaller in this case (14302).
It is expected, because we no longer store the strings directly in a list: the custom objects hold them.

Here, the difference between the memory of the objects detected by pympler and the memory reported by RSS is problematic:
~50 MB for pympler versus 171-213 MB for the process RSS.

app with custom nested object asizeof and summary

Potentially slow, helpful for the number of objects and a little helpful for the memory allocated by objects

Here, I rewrote the summary.summarize() function to use asizeof.asizeof(), which performs a recursive computation, instead of sys.getsizeof().

Application code:

from threading import Thread
 
from foo_app.core.PaymentHighMemoryConsumptionService import PaymentHighMemoryConsumptionService
from foo_app.memory_helper.memory_helper_pympler_asizeof_and_summary import \
    memory_watcher_loop_asizeof_and_summary
 
 
def main():
    t_memory_watcher = Thread(daemon=False, target=memory_watcher_loop_asizeof_and_summary)
    t_memory_watcher.start()
    payment_service = PaymentHighMemoryConsumptionService()
    bop_subject = 'Balance of Payments - BOP'
 
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
    print(f'count for "{bop_subject}" = {count}')
 
 
if __name__ == '__main__':
    main()

Memory helper

import os
from threading import Thread
from time import sleep
 
from psutil import Process
from pympler import asizeof
from pympler import muppy
from pympler import summary
 
 
def summarize_with_asizeof(objects):
    count = {}
    total_size = {}
    for o in objects:
        otype = summary._repr(o)
        if otype in count:
            count[otype] += 1
            total_size[otype] += asizeof.asizeof(o)
        else:
            count[otype] = 1
            total_size[otype] = asizeof.asizeof(o)
    rows = []
    for otype in count:
        rows.append([otype, count[otype], total_size[otype]])
    return rows
 
 
def memory_watcher_loop_asizeof_and_summary():
    process: Process = Process(os.getpid())
    t = Thread(target=lambda p: display_rss_always(p),args=[process])
    t.start()
    while True:
        rss_mb = compute_rss_mb(process)
        if rss_mb >= 0:
            all_objects = muppy.get_objects()
            print(f'compute the summarise')
            sum1 = summarize_with_asizeof(all_objects)
            summary.print_(sum1,limit=5)
        sleep(5)
 
 
def display_rss_always(process):
    while True:
        rss_mb = compute_rss_mb(process)
        print(f'rss_mb={rss_mb}')
        sleep(5)
 
 
def compute_rss_mb(process):
    return process.memory_info().rss / 1024 / 1024

Output:

/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/bin/python /home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_asizeof_and_summary.py
rss_mb=14.52734375
compute the summarise
                         types |   # objects |   total size
============================== | =========== | ============
                          list |         215 |     27.95 MB
                          dict |        3020 |      6.44 MB
                           str |        8668 |    977.35 KB
  foo_app.core.BopLine.BopLine |         657 |    922.99 KB
                         tuple |        2293 |    334.05 KB
rss_mb=155.09375
compute the summarise
rss_mb=189.9296875
rss_mb=189.9296875
rss_mb=189.9296875
rss_mb=189.9296875
rss_mb=272.42578125
rss_mb=211.16796875
readLinesWithCustomObject() will return
count for "Balance of Payments - BOP" = 121327
rss_mb=281.03515625
rss_mb=200.07421875
rss_mb=200.07421875
rss_mb=200.07421875
                         types |   # objects |   total size
============================== | =========== | ============
  foo_app.core.BopLine.BopLine |      198787 |    276.94 MB
                          dict |      201198 |    273.41 MB
                          list |        2100 |    259.01 MB
                           str |       10548 |      1.09 MB
                         tuple |        2296 |    334.97 KB
rss_mb=200.07421875

Analysis
Pympler detects 198787 foo_app.core.BopLine.BopLine objects, which is consistent with the number of lines of the CSV file.
Besides, the large size reported for the BopLine objects reflects the real memory consumption.
But we can notice two problems:
– the memory reported for these objects is too big: 276.94 MB;
– the sizes of objects may be counted again in their container/contained objects, so we should not sum all these lines to get the actual memory used; in our case we should just take the biggest one:
foo_app.core.BopLine.BopLine | 198787 | 276.94 MB
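The double counting between containers and their contents can be reproduced with a naive recursive sizer (a sketch only; pympler's asizeof handles far more cases):

```python
import sys

def deep_size(obj, seen=None):
    """Naive recursive sizer. The `seen` set avoids counting an object
    twice within a single call, but separate calls still double-count
    shared objects."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, (list, tuple, set)):
        size += sum(deep_size(i, seen) for i in obj)
    elif isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    return size

shared = ["x" * 1000]      # one big string held by a list
outer = [shared, shared]   # the same list referenced twice

# Summing per-element sizes counts the shared list twice ...
summed = sum(deep_size(e) for e in outer)
# ... while sizing the container once does not.
whole = deep_size(outer)
print(f"summed={summed}, whole={whole}")
```

This is why summing the per-type lines of the summary overestimates the actual memory used.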

app with custom nested object asizeof and stats

Unclear results

Application code:

It is the same as in the previous case, except that it starts the memory_watcher_loop_asizeof_and_stats function from the memory helper below.

Memory helper:

import inspect
import os
from threading import Thread
from time import sleep
from typing import List
 
import psutil
from psutil import Process
from pympler import asizeof
from pympler import muppy
 
 
def memory_watcher_loop_asizeof_and_stats():
    process: Process = Process(os.getpid())
    t = Thread(target=lambda p: display_rss_always(p), args=[process])
    t.start()
    while True:
        rss_mb = compute_rss_mb(process)
        if rss_mb >= 0:
            all_objects: List = muppy.get_objects()
            all_objects = [a for a in all_objects if
                           inspect.getmodule(a) not in [psutil._pslinux, psutil]]
            asizeof_asizeof = asizeof.asizeof(all_objects, stats=1.10, clip=290)
            print(f'asizeof_asizeof={asizeof_asizeof}')
        sleep(5)
 
 
def display_rss_always(process):
    while True:
        rss_mb = compute_rss_mb(process)
        print(f'rss_mb={rss_mb}')
        sleep(5)
 
 
def compute_rss_mb(process):
    return process.memory_info().rss / 1024 / 1024

Output:

/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/bin/python /home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_asizeof_and_stats.py
rss_mb=14.37890625
	rss_mb=190.4921875
rss_mb=237.06640625
 
asizeof(([{'series_reference': 'BOPQ.S06AC0000000C11', 'period': '2001.12', 'data_value': '992', 'suppressed': '', 'status': 'F', 'units': 'Dollars', '....t-in method acquire of _thread.lock object at 0x7f4233566c90>, <built-in method release of _thread.lock object at 0x7f4233566c90>, deque([])],), clip=290, stats=1.1) ...
 136779648 bytes or 130.4 MiB
         8 byte aligned
         8 byte sizeof(void*)
         1 object given
   1721624 objects sized
   4663630 objects seen
       458 objects ranked
         0 objects missed
     21772 duplicates
        10 deepest recursion
 
        10 largest objects (of 458 over 1024 bytes or 1.0 KiB)
 136779648 bytes or 130.4 MiB: class list: [{'series_reference': 'BOPQ.S06AC0000000C11', 'period': '2001.12', 'data_value': '992', 'suppressed': '', 'status': 'F', 'units': 'Dollars', 'm....ilt-in method acquire of _thread.lock object at 0x7f4233566c90>, <built-in method release of _thread.lock object at 0x7f4233566c90>, deque([])] leng 33878!, ix 0
 133044464 bytes or 126.9 MiB: class list: [<foo_app.core.BopLine.BopLine object at 0x7f4233545c10>, <foo_app.core.BopLine.BopLine object at 0x7f4233545280>, <foo_app.core.BopLine.BopLin....ne object at 0x7f4221aed370>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3a0>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3d0>] leng 223641!, ix 1 (at 1), pix 0
     72856 bytes or 71.1 KiB: class dict: {'__name__': 'sys', '__doc__': "This module provides access to some objects used or maintained by the\ninterpreter and to functions that intera....e='w' encoding='utf-8'>, '_home': '/usr/bin', '__interactivehook__': <function enablerlcompleter.<locals>.register_readline at 0x7f423397c3a0>} leng 128!, ix 2 (at 1), pix 0
     42552 bytes or 41.6 KiB: class dict: {'/home/david/python-workspace/python_blog_examples/memory_analyze': FileFinder('/home/david/python-workspace/python_blog_examples/mem..../util': FileFinder('/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/lib/python3.9/site-packages/pympler/util')} leng 32!, ix 3 (at 2), pix 2
     38648 bytes or 37.7 KiB: class dict: {9749792: <class 'complex' def>, 9800128: <class 'float' def>, 9781056: <class 'list' def>, 9761664: <class 'tuple' def>, 9746912: <class 'prop....eakref.WeakMethod' def>, 43551120: <class '_testcapi.HeapCTypeSubclass' def>, 43567200: <class '_testcapi.HeapCTypeSubclassWithFinalizer' def>} leng 1024!, ix 4 (at 1), pix 0
     34936 bytes or 34.1 KiB: class dict: {'646': 'ascii', 'ansi_x3.4_1968': 'ascii', 'ansi_x3_4_1968': 'ascii', 'ansi_x3.4_1986': 'ascii', 'cp367': 'ascii', 'csascii': 'ascii', 'ibm367....', 'zlib': 'zlib_codec', 'x_mac_japanese': 'shift_jis', 'x_mac_korean': 'euc_kr', 'x_mac_simp_chinese': 'gb2312', 'x_mac_trad_chinese': 'big5'} leng 512!, ix 5 (at 1), pix 0
     31616 bytes or 30.9 KiB: class dict: {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions.\nOn Unix, it supports IP (Internet Protoc....2336d3820>, 'has_dualstack_ipv6': <function has_dualstack_ipv6 at 0x7f42336d38b0>, 'create_server': <function create_server at 0x7f42336d3940>} leng 512!, ix 6 (at 1), pix 0
     30040 bytes or 29.3 KiB: class dict: {9791744: <weakref at 0x7f4233ad8db0; to 'type' at 0x956900 (type)>, 9759872: <weakref at 0x7f4233adf270; to 'type' at 0x94ec80 (weakref)>, 975....0; to 'type' at 0x9763c0 (_lru_list_elem)>, 139922306026240: <weakref at 0x7f4223841360; to 'type' at 0x7f423358cb00 (test_structmembersType)>} leng 512!, ix 7 (at 1), pix 0
     27240 bytes or 26.6 KiB: class dict: {'__name__': '_testcapi', '__doc__': None, '__package__': '', '__loader__': <_frozen_importlib_external.ExtensionFileLoader object at 0x7f42335....r'>, 'ContainerNoGC': <class '_testcapi.ContainerNoGC'>, '__file__': '/usr/lib/python3.9/lib-dynload/_testcapi.cpython-39-x86_64-linux-gnu.so'} leng 512!, ix 8 (at 1), pix 0
     25304 bytes or 24.7 KiB: class dict: {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThis exports:\n  - all functions from posix or....rap_close'>, 'fdopen': <function fdopen at 0x7f4233978b80>, '_fspath': <function _fspath at 0x7f4233978f70>, 'PathLike': <class 'os.PathLike'>} leng 512!, ix 9 (at 1), pix 0
asizeof_asizeof=136779648
rss_mb=192.734375
rss_mb=202.6640625
rss_mb=271.75390625
 
asizeof((['The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance ....dc10>, <function PaymentHighMemoryConsumptionService.readLinesWithCustomObject.<locals>.<listcomp> at 0x7f423354cb80>, [(...), [...], [...]]],), clip=290, stats=1.1) ...rss_mb=329.57421875
 140718664 bytes or 134.2 MiB
         8 byte aligned
         8 byte sizeof(void*)
         1 object given
 
   1723324 objects sized
   5093143 objects seen
       466 objects ranked
         0 objects missed
    429424 duplicates
        10 deepest recursion
 
        10 largest objects (of 466 over 1024 bytes or 1.0 KiB)
 140718664 bytes or 134.2 MiB: class list: ['The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance a....x7f4221b1dc10>, <function PaymentHighMemoryConsumptionService.readLinesWithCustomObject.<locals>.<listcomp> at 0x7f423354cb80>, [[...], [...]]] leng 482807!, ix 0
  11165832 bytes or 10.6 MiB: class list: [<foo_app.core.BopLine.BopLine object at 0x7f4233545c10>, <foo_app.core.BopLine.BopLine object at 0x7f4233545280>, <foo_app.core.BopLine.BopLin....ne object at 0x7f4221aed370>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3a0>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3d0>] leng 223641!, ix 1 (at 1), pix 0
    247688 bytes or 241.9 KiB: class list: [{'series_reference': 'BOPQ.S06AC0000000C11', 'period': '2001.12', 'data_value': '992', 'suppressed': '', 'status': 'F', 'units': 'Dollars', 'm....ilt-in method acquire of _thread.lock object at 0x7f4233566c90>, <built-in method release of _thread.lock object at 0x7f4233566c90>, deque([])] leng 33878!, ix 2 (at 1), pix 0
     72616 bytes or 70.9 KiB: class dict: {'__name__': 'sys', '__doc__': "This module provides access to some objects used or maintained by the\ninterpreter and to functions that intera....e='w' encoding='utf-8'>, '_home': '/usr/bin', '__interactivehook__': <function enablerlcompleter.<locals>.register_readline at 0x7f423397c3a0>} leng 128!, ix 3 (at 1), pix 0
     42552 bytes or 41.6 KiB: class dict: {'/home/david/python-workspace/python_blog_examples/memory_analyze': FileFinder('/home/david/python-workspace/python_blog_examples/mem..../util': FileFinder('/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/lib/python3.9/site-packages/pympler/util')} leng 32!, ix 4 (at 2), pix 3
     34768 bytes or 34.0 KiB: class dict: {'646': 'ascii', 'ansi_x3.4_1968': 'ascii', 'ansi_x3_4_1968': 'ascii', 'ansi_x3.4_1986': 'ascii', 'cp367': 'ascii', 'csascii': 'ascii', 'ibm367....', 'zlib': 'zlib_codec', 'x_mac_japanese': 'shift_jis', 'x_mac_korean': 'euc_kr', 'x_mac_simp_chinese': 'gb2312', 'x_mac_trad_chinese': 'big5'} leng 512!, ix 5 (at 1), pix 0
     31616 bytes or 30.9 KiB: class dict: {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions.\nOn Unix, it supports IP (Internet Protoc....2336d3820>, 'has_dualstack_ipv6': <function has_dualstack_ipv6 at 0x7f42336d38b0>, 'create_server': <function create_server at 0x7f42336d3940>} leng 512!, ix 6 (at 1), pix 0
     29976 bytes or 29.3 KiB: class dict: {9791744: <weakref at 0x7f4233ad8db0; to 'type' at 0x956900 (type)>, 9759872: <weakref at 0x7f4233adf270; to 'type' at 0x94ec80 (weakref)>, 975....0; to 'type' at 0x9763c0 (_lru_list_elem)>, 139922306026240: <weakref at 0x7f4223841360; to 'type' at 0x7f423358cb00 (test_structmembersType)>} leng 512!, ix 7 (at 1), pix 0
     27240 bytes or 26.6 KiB: class dict: {'__name__': '_testcapi', '__doc__': None, '__package__': '', '__loader__': <_frozen_importlib_external.ExtensionFileLoader object at 0x7f42335....r'>, 'ContainerNoGC': <class '_testcapi.ContainerNoGC'>, '__file__': '/usr/lib/python3.9/lib-dynload/_testcapi.cpython-39-x86_64-linux-gnu.so'} leng 512!, ix 8 (at 1), pix 0
     25304 bytes or 24.7 KiB: class dict: {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThis exports:\n  - all functions from posix or....rap_close'>, 'fdopen': <function fdopen at 0x7f4233978b80>, '_fspath': <function _fspath at 0x7f4233978f70>, 'PathLike': <class 'os.PathLike'>} leng 512!, ix 9 (at 1), pix 0
asizeof_asizeof=140718664
rss_mb=209.8203125
readLinesWithCustomObject() will return
count for "Balance of Payments - BOP" = 121327

Analysis
Here, pympler detects large objects.
The output is quite hard to interpret because we don't have a breakdown by type.
We find some hints in one sample, but in another sample the hints are different.
For example, here we get the list of BopLine objects with a consistent size, 126.9 MiB:

136779648 bytes or 130.4 MiB: class list: [{'series_reference': 'BOPQ.S06AC0000000C11', 'period': '2001.12', 'data_value': '992', 'suppressed': '', 'status': 'F', 'units': 'Dollars', 'm....ilt-in method acquire of _thread.lock object at 0x7f4233566c90>, <built-in method release of _thread.lock object at 0x7f4233566c90>, deque([])] leng 33878!, ix 0
133044464 bytes or 126.9 MiB: class list: [<foo_app.core.BopLine.BopLine object at 0x7f4233545c10>, <foo_app.core.BopLine.BopLine object at 0x7f4233545280>, <foo_app.core.BopLine.BopLin....ne object at 0x7f4221aed370>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3a0>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3d0>] leng 223641!, ix 1 (at 1), pix 0

But in the next sample the result is much harder to understand: this size has dropped to 10.6 MiB:

140718664 bytes or 134.2 MiB: class list: ['The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance a....x7f4221b1dc10>, <function PaymentHighMemoryConsumptionService.readLinesWithCustomObject.<locals>.<listcomp> at 0x7f423354cb80>, [[...], [...]]] leng 482807!, ix 0
11165832 bytes or 10.6 MiB: class list: [<foo_app.core.BopLine.BopLine object at 0x7f4233545c10>, <foo_app.core.BopLine.BopLine object at 0x7f4233545280>, <foo_app.core.BopLine.BopLin....ne object at 0x7f4221aed370>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3a0>, <foo_app.core.BopLine.BopLine object at 0x7f4221aed3d0>] leng 223641!, ix 1 (at 1), pix 0

Tools to identify the Python files/functions consuming the most memory

tracemalloc: identify the Python files/functions consuming the most memory

Print the tracebacks of the top memory-consuming functions

Application code

from threading import Thread
 
from foo_app.core.PaymentHighMemoryConsumptionService import PaymentHighMemoryConsumptionService
from foo_app.memory_helper.memory_helper_tracemalloc import memory_helper_tracemalloc
 
 
def main():
    t_memory_watcher = Thread(daemon=False, target=memory_helper_tracemalloc)
    t_memory_watcher.start()
    payment_service = PaymentHighMemoryConsumptionService()
    bop_subject = 'Balance of Payments - BOP'
 
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
    print(f'count for "{bop_subject}" = {count}')
 
 
if __name__ == '__main__':
    main()

Memory helper

import os
import tracemalloc
from threading import Thread
from time import sleep
 
from psutil import Process
 
 
def memory_helper_tracemalloc():
    tracemalloc.start(25)
    process: Process = Process(os.getpid())
    t = Thread(target=lambda p: display_rss_always(p), args=[process])
    t.start()
    while True:
        rss_mb = compute_rss_mb(process)
        if rss_mb >= 0:
            snapshot = tracemalloc.take_snapshot()
            top_stats: list[tracemalloc.Statistic] = snapshot.statistics('traceback')
            # pick the biggest memory block
            print_traceback(top_stats, 0)
            print_traceback(top_stats, 1)
        sleep(5)
 
 
def print_traceback(top_stats, rank: int):
    stat = top_stats[rank]
    print("rank %d, %s memory blocks: %.1f mb" % (rank, stat.count, stat.size / 1024 / 1024))
    for line in stat.traceback.format():
        print(line)
 
 
def display_rss_always(process):
    while True:
        rss_mb = compute_rss_mb(process)
        print(f'rss_mb={rss_mb}')
        sleep(5)
 
 
def compute_rss_mb(process):
    return process.memory_info().rss / 1024 / 1024

Output:

/home/david/python-workspace/python_blog_examples/memory_analyze/.venv/bin/python /home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py
rss_mb=13.3828125
rank 0, 2717 memory blocks: 0.2 mb
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 18
    main()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 13
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 12
    bop_lines: list[BopLine] = self.readLinesWithCustomObject()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
rank 1, 713 memory blocks: 0.0 mb
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 18
    main()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 13
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 12
    bop_lines: list[BopLine] = self.readLinesWithCustomObject()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/BopLine.py", line 5
    self.series_reference = kargs[0]
rss_mb=327.18359375
rss_mb=632.8515625
rank 0, 1488432 memory blocks: 97.2 mb
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 18
    main()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 13
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 12
    bop_lines: list[BopLine] = self.readLinesWithCustomObject()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
rank 1, 397573 memory blocks: 27.3 mb
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 18
    main()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/app/app_with_custom_nested_object_tracemalloc.py", line 13
    count = payment_service.countSubjectWithCustomObjects(bop_subject)
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 12
    bop_lines: list[BopLine] = self.readLinesWithCustomObject()
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/PaymentHighMemoryConsumptionService.py", line 25
    bop_lines: List[BopLine] = [BopLine(*row) for row in reader]
  File "/home/david/python-workspace/python_blog_examples/memory_analyze/foo_app/core/BopLine.py", line 5
    self.series_reference = kargs[0]
rss_mb=632.8515625

Analysis
The tracebacks point very clearly to the lines responsible for the memory rise, and the reported allocation sizes are consistent with the observed RSS growth.
The drawback: tracemalloc itself consumes a lot of memory, much more than Pympler does.
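
When that overhead matters, a lighter pattern (my sketch, not part of the article's helper) is to diff two snapshots grouped by source line instead of computing full `'traceback'` statistics on every iteration: `Snapshot.compare_to()` returns the allocation deltas sorted by size, so the biggest source of new memory comes first.

```python
import tracemalloc

tracemalloc.start()

snapshot_before = tracemalloc.take_snapshot()

# Simulate a memory rise: keep roughly 5 MiB of payload alive.
leaked = [bytes(1024) for _ in range(5_000)]

snapshot_after = tracemalloc.take_snapshot()

# compare_to() yields StatisticDiff objects sorted by size delta,
# so the line that allocated the most new memory is printed first.
for stat in snapshot_after.compare_to(snapshot_before, 'lineno')[:3]:
    print(stat)
```

Grouping by `'lineno'` keeps only one frame per allocation, which is cheaper than the 25-frame tracebacks above, at the cost of losing the full call chain.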
