Degraded performance on large data set manipulation application #800

Open
titouanfreville opened this issue May 25, 2024 · 0 comments

Hello,
I have been using dependency-injector as a basis for my Python projects for some time now, and I recently ran into an unexpected issue.

I am currently building data analysis software meant to analyse large datasets (~3 GB of data, about 20 million rows), and the process takes an unexpectedly long time to run and consumes far more resources than expected.

As a baseline, just fetching the data takes ~3 minutes without dependency injection, while with it the same step is still not done after 20 minutes.

I mainly use Singleton providers and set up the base project using the wiring system.
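For reference, the setup looks roughly like this (a minimal sketch; `AnalyticsService` and the module layout are placeholders, not my real code):

```python
# Minimal sketch of the container setup (names are placeholders,
# not the real project structure).
from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject


class AnalyticsService:
    def load(self) -> None:
        ...  # the heavy pandas/SQLAlchemy work happens here


class Container(containers.DeclarativeContainer):
    # Singleton provider: one shared instance per container.
    analytics = providers.Singleton(AnalyticsService)


@inject
def main(service: AnalyticsService = Provide[Container.analytics]) -> None:
    service.load()


if __name__ == "__main__":
    container = Container()
    container.wire(modules=[__name__])  # the wiring system mentioned above
    main()
```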

The test ran on Python 3.12 inside a Microsoft dev container (mcr.microsoft.com/vscode/devcontainers/python:1-3.12) and on a Windows server running Python 3.12 (I don't have the exact version, but I can look it up if needed).

I load the data using SQLAlchemy with the pyodbc driver plus pandas' read_sql method.
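The loading step is essentially the following (connection string and table name are placeholders):

```python
# Roughly how the data is fetched (connection string is a placeholder).
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@dsn_name")  # pyodbc driver


def fetch_data() -> pd.DataFrame:
    # ~20 million rows / ~3 GB: this is the step that goes from
    # ~3 minutes without DI to 20+ minutes with it.
    return pd.read_sql("SELECT * FROM big_table", engine)
```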

I cannot share the dataset I'm using, as it is private to the company I work for.

The application is wrapped behind a Typer CLI that calls async methods (though the parallelization is not done correctly yet, as I'm new to it 😇).
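The entry point looks roughly like this (command and function names are placeholders; since Typer commands are sync, the async pipeline is driven with `asyncio.run`):

```python
# Sketch of the Typer entry point (names are placeholders).
import asyncio

import typer

app = typer.Typer()


async def run_analysis() -> None:
    ...  # the async pipeline lives here


@app.command()
def analyse() -> None:
    # Typer commands are synchronous, so the coroutine is run explicitly.
    asyncio.run(run_analysis())


if __name__ == "__main__":
    app()
```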

Any feedback or ideas are welcome, as I don't really see why using DI could impact the code so much in this case.

Thanks for your work and time. <3
