
New Rule: Prefer list comprehension over generator comprehensions to create tuples #11839

Open
Avasam opened this issue Jun 11, 2024 · 2 comments
Labels
rule Implementing or modifying a lint rule

Comments


Avasam commented Jun 11, 2024

I was recently working on some code where most of my data had to be "readonly" (so I'm using immutable types like frozen dataclasses, frozensets, tuples, etc.) but which also used plenty of comprehensions. Since there's no "tuple comprehension" in Python, that made me wonder how I should be writing this code. I did a bit of performance testing, and here are the results:

```python
import sys
from timeit import timeit

print(sys.version)
big_list = ["*"] * 99


def foo(value: str):
    return value


def test_list_comprehension():
    return [foo(value) for value in big_list]


def test_tuple_from_list_comprehension():
    return tuple([foo(value) for value in big_list])


def test_tuple_from_generator_comprehension():
    return tuple(foo(value) for value in big_list)


def test_unpack_generator_comprehension():
    return (*(foo(value) for value in big_list),)


# timeit() calls each function 1,000,000 times by default, so every number
# printed below is the total time in seconds for one million calls.
print(
    "test_list_comprehension",
    timeit(test_list_comprehension),
)
print(
    "test_tuple_from_list_comprehension",
    timeit(test_tuple_from_list_comprehension),
)
print(
    "test_tuple_from_generator_comprehension",
    timeit(test_tuple_from_generator_comprehension),
)
print(
    "test_unpack_generator_comprehension",
    timeit(test_unpack_generator_comprehension),
)
```
```
3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
test_list_comprehension 6.4194597
test_tuple_from_list_comprehension 6.9672235
test_tuple_from_generator_comprehension 8.996260200000002
test_unpack_generator_comprehension 11.207814599999999
3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]
test_list_comprehension 5.656617900000128
test_tuple_from_list_comprehension 6.026029500000277
test_tuple_from_generator_comprehension 9.207803900000272
test_unpack_generator_comprehension 10.375420500000018
```

Unsurprisingly, the difference is even greater in 3.12, which inlines list comprehensions (PEP 709).
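A rough way to observe the 3.12 change is to compile the same function body once with a list comprehension and once with a generator expression, then look for nested code objects. This is only a sketch: the helper `nested_code_names` is hypothetical and relies solely on `compile()` and the `co_consts`/`co_name` attributes of code objects. On 3.12+ the list comprehension should leave no nested code object behind, while the generator expression still compiles to a separate `<genexpr>` code object on every version.

```python
import sys
import types


def nested_code_names(expr):
    """Compile `def f(big_list): return <expr>` and return the names of the
    code objects nested inside f (e.g. '<listcomp>' or '<genexpr>')."""
    source = f"def f(big_list):\n    return {expr}\n"
    module_code = compile(source, "<sketch>", "exec")
    # The module's constants contain exactly one code object: the body of f.
    (func_code,) = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
    return [c.co_name for c in func_code.co_consts if isinstance(c, types.CodeType)]


print(sys.version_info[:2])
print("list comprehension:", nested_code_names("[foo(v) for v in big_list]"))
print("generator expression:", nested_code_names("(foo(v) for v in big_list)"))
```

The code is only compiled, never executed, so `foo` and `big_list` do not need to exist.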

Because the generator is passed straight to tuple(), it is iterated immediately, so you get no benefit from its laziness. This is probably also true for other stdlib collections that don't have a comprehension syntax; tuple is just the only one I can think of at the moment.
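A small sketch of that point, using a stand-in for `foo()` that is instrumented with a hypothetical `computed` list so we can see when each element is actually produced:

```python
# Instrumented stand-in for foo(): records each value at the moment it is computed.
computed = []


def foo(value):
    computed.append(value)
    return value


gen = (foo(value) for value in ["a", "b", "c"])
print(computed)  # [] : creating the generator computes nothing yet

result = tuple(gen)
print(computed)  # ['a', 'b', 'c'] : tuple() drained the generator before returning
print(result)    # ('a', 'b', 'c')
```

Since tuple() needs every element before it can return, the laziness only adds per-element generator overhead here.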

For this reason, I'm asking for a performance rule with an autofix that transforms code like this:

```python
tuple(a for a in b)
```

into

```python
tuple([a for a in b])
```

Unless I'm missing something, this is free performance whilst staying readable and Pythonic.
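To make the requested autofix concrete, here is a hypothetical sketch of the transformation written with Python's own `ast` module. It is not how Ruff (or any existing linter) implements rules: `ast.unparse` (Python 3.9+) discards comments and original formatting, and this sketch does not check whether the name `tuple` has been shadowed. The class name `TupleGenexpToListcomp` and the `rewrite` helper are illustrative only.

```python
import ast


class TupleGenexpToListcomp(ast.NodeTransformer):
    """Hypothetical sketch: rewrite tuple(<genexp>) into tuple([<listcomp>])."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "tuple"  # assumes `tuple` is the builtin
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.GeneratorExp)
        ):
            gen = node.args[0]
            # A list comprehension has the same element and `for`/`if` clauses.
            listcomp = ast.ListComp(elt=gen.elt, generators=gen.generators)
            node.args[0] = ast.copy_location(listcomp, gen)
        return node


def rewrite(source: str) -> str:
    tree = TupleGenexpToListcomp().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rewrite("x = tuple(a for a in b)"))  # x = tuple([a for a in b])
```

Calls to other constructors, such as `list(a for a in b)`, are left untouched.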

It seems this would fit well in the flake8-comprehensions or refurb family of rules.

Avasam changed the title from "New Rule: Prefer list comprehension over generators to create tuples" to "New Rule: Prefer list comprehension over generator comprehensions to create tuples" Jun 11, 2024
charliermarsh added the "rule" (Implementing or modifying a lint rule) label Jun 12, 2024

tdulcet commented Jun 12, 2024

Using your script, I see less of a difference on Linux with CPython:

```
3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]
test_list_comprehension 3.4234498779999853
test_tuple_from_list_comprehension 3.7473174160000156
test_tuple_from_generator_comprehension 4.684340659999975
test_unpack_generator_comprehension 4.938177772000017
3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
test_list_comprehension 5.881427110000001
test_tuple_from_list_comprehension 6.184221321999999
test_tuple_from_generator_comprehension 6.949574859000002
test_unpack_generator_comprehension 7.213964431000001
3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
test_list_comprehension 5.159386299999994
test_tuple_from_list_comprehension 5.68906659999999
test_tuple_from_generator_comprehension 6.2835374
test_unpack_generator_comprehension 6.521585700000003
3.6.9 (default, Dec  8 2021, 21:08:43)
[GCC 8.4.0]
test_list_comprehension 5.325352799999997
test_tuple_from_list_comprehension 5.670514699999998
test_tuple_from_generator_comprehension 6.860152300000003
test_unpack_generator_comprehension 7.0944126999999995
2.7.17 (default, Feb 27 2021, 15:10:58)
[GCC 7.5.0]
('test_list_comprehension', 5.888335943222046)
('test_tuple_from_list_comprehension', 6.135804891586304)
('test_tuple_from_generator_comprehension', 6.965441942214966)
```

But much more of a difference with PyPy:

```
3.9.18 (7.3.15+dfsg-1build3, Apr 01 2024, 03:12:48)
[PyPy 7.3.15 with GCC 13.2.0]
test_list_comprehension 0.2822986920000403
test_tuple_from_list_comprehension 0.40187594900010026
test_tuple_from_generator_comprehension 0.9802658359999441
test_unpack_generator_comprehension 1.0730282659999375
```

@ivanychev

I think the tuple-from-list-comprehension approach will lead to 2x higher peak memory consumption, won't it?
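One way to check this empirically in CPython is `tracemalloc`, which reports the peak size of traced allocations. The sketch below reuses the benchmark's shape of input (one shared `str` repeated), so the elements themselves are not allocated during the build. In CPython, `tuple(some_list)` copies references to the same element objects, so any extra peak from the list version comes from the intermediate list's own pointer array, which is alive at the same time as the new tuple. The function names are hypothetical, and the numbers it prints will vary by version and platform.

```python
import tracemalloc

# One shared str object repeated, as in the original benchmark, so only the
# containers (not the elements) are allocated while the tuple is being built.
big_list = ["*"] * 100_000


def from_list_comprehension():
    return tuple([value for value in big_list])


def from_generator_comprehension():
    return tuple(value for value in big_list)


def peak_traced_bytes(build):
    """Return the peak traced allocation size (bytes) while build() runs."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


for build in (from_list_comprehension, from_generator_comprehension):
    print(build.__name__, peak_traced_bytes(build))
```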
