Degraded performance on large data set manipulation application #800

Open
titouanfreville opened this issue May 25, 2024 · 0 comments

Hello,
I have been using dependency-injector as a basis for my Python projects for some time now, and I recently ran into an unexpected issue.

I am currently building data analysis software meant to analyse large datasets (~3 GB of data, about 20 million rows), and the process takes an unexpectedly long time to run and consumes far more resources than expected.

As a baseline, just fetching the data takes ~3 minutes without dependency injection, while with it the same step is still not done after 20 minutes.

I mainly use Singleton providers and set up the base project using the wiring system.
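For reference, the setup looks roughly like this (a minimal sketch; `AnalyticsService` and the module layout are placeholders, not my real code):

```python
# Minimal sketch of the container setup (names are placeholders,
# not the real project structure).
from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject


class AnalyticsService:
    def load(self) -> None:
        ...  # the heavy pandas/SQLAlchemy work happens here


class Container(containers.DeclarativeContainer):
    # Singleton provider: one shared instance per container.
    analytics = providers.Singleton(AnalyticsService)


@inject
def main(service: AnalyticsService = Provide[Container.analytics]) -> None:
    service.load()


if __name__ == "__main__":
    container = Container()
    container.wire(modules=[__name__])  # the wiring system mentioned above
    main()
```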

The test ran on Python 3.12 inside a Microsoft dev container (mcr.microsoft.com/vscode/devcontainers/python:1-3.12) and on a Windows server running Python 3.12 (I don't have the exact version, but I can look it up if needed).

I load the data using SQLAlchemy with the pyodbc driver plus pandas' read_sql method.
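The loading step is essentially the following (connection string and table name are placeholders):

```python
# Roughly how the data is fetched (connection string is a placeholder).
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@dsn_name")  # pyodbc driver


def fetch_data() -> pd.DataFrame:
    # ~20 million rows / ~3 GB: this is the step that goes from
    # ~3 minutes without DI to 20+ minutes with it.
    return pd.read_sql("SELECT * FROM big_table", engine)
```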

I cannot share the dataset I'm using, as it is private to the company I work for.

The application is wrapped behind a Typer CLI that calls async methods (though the parallelization is not done correctly yet, as I'm new to it 😇).
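The entry point looks roughly like this (command and function names are placeholders; since Typer commands are sync, the async pipeline is driven with `asyncio.run`):

```python
# Sketch of the Typer entry point (names are placeholders).
import asyncio

import typer

app = typer.Typer()


async def run_analysis() -> None:
    ...  # the async pipeline lives here


@app.command()
def analyse() -> None:
    # Typer commands are synchronous, so the coroutine is run explicitly.
    asyncio.run(run_analysis())


if __name__ == "__main__":
    app()
```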

Any feedback or ideas are welcome, as I don't really see why using DI could impact the code so much in this case.

Thanks for your work and time. <3
