Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables the narrowing of variable types when checking a variable is "in" a collection. #17344

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Jordandev678
Copy link

Enables the narrowing of variable types when checking a variable is "in" a collection, and the collection type is a subtype of the variable type.

Fixes #3229

(Explain how this PR changes mypy.)
This PR updates the type narrowing for the "in" operator and allows it to narrow the type of a variable to the type of the collection's items - if the collection item type is a subtype of the variable (as defined by is_subtype).

Examples

def foobar(foo: Union[str, float]):
    if foo in ['a', 'b']:
        reveal_type(foo)  # N: Revealed type is "builtins.str"
    else:
        reveal_type(foo)  # N: Revealed type is "Union[builtins.str, builtins.float]"
typ: List[Literal['a', 'b']] = ['a', 'b']
x: str = "hi!"
if x in typ:
    reveal_type(x)  # N: Revealed type is "Union[Literal['a'], Literal['b']]"
else:
    reveal_type(x)  # N: Revealed type is "builtins.str"

One existing test was updated, which compared Optional[A] with "in" to (None,). Piror to this change that resulted in Union[__main__.A, None], which now narrows to None. Test cases have been added for "in", "not in", Sets, Lists, and Tuples.

I did add to the existing narrowing.pyi fixture for the test cases. A search of the *.test files shows it was only used in the narrowing tests, so there shouldn't be any speed impact in other areas.

Jordandev678 and others added 4 commits June 7, 2024 19:32
Enables the narrowing of variable types when checking a variable is "in"
a collection, and the collection type is a subtype of the variable type.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Jun 8, 2024

Diff from mypy_primer, showing the effect of this PR on open source code:

mitmproxy (https://github.com/mitmproxy/mitmproxy)
+ mitmproxy/addons/core.py:172: error: Unused "type: ignore" comment  [unused-ignore]

psycopg (https://github.com/psycopg/psycopg)
+ tests/types/test_array.py:212: error: Incompatible types in assignment (expression has type "type[int] | type[float]", variable has type "type[Decimal]")  [assignment]

steam.py (https://github.com/Gobot1234/steam.py)
- steam/ext/csgo/state.py:206: error: Argument 1 to "__delitem__" of "dict" has incompatible type "int"; expected "AssetID"  [arg-type]
- steam/ext/csgo/state.py:207: error: Argument 1 to "__delitem__" of "dict" has incompatible type "int"; expected "AssetID"  [arg-type]

schema_salad (https://github.com/common-workflow-language/schema_salad)
+ schema_salad/avro/schema.py: note: In function "make_avsc_object":
+ schema_salad/avro/schema.py:703:30: error: Redundant cast to "str"  [redundant-cast]
+ schema_salad/schema.py: note: In function "make_valid_avro":
+ schema_salad/schema.py:561:28: error: Redundant cast to "str"  [redundant-cast]

pandas (https://github.com/pandas-dev/pandas)
+ pandas/io/stata.py:1478: error: Redundant cast to "str"  [redundant-cast]
+ pandas/io/stata.py:1798: error: Redundant cast to "str"  [redundant-cast]
+ pandas/core/internals/concat.py:98: error: Argument 2 to "_is_homogeneous_mgr" has incompatible type "type[object]"; expected "dtype[Any] | ExtensionDtype"  [arg-type]
+ pandas/core/internals/concat.py:104: error: Argument 3 to "_concat_homogeneous_fastpath" has incompatible type "type[object]"; expected "dtype[Any]"  [arg-type]
+ pandas/tests/extension/test_masked.py:309: error: "str" has no attribute "name"  [attr-defined]

pycryptodome (https://github.com/Legrandin/pycryptodome)
+ lib/Crypto/SelfTest/Signature/test_dss.py:334: error: Incompatible types in assignment (expression has type "None", variable has type "str")  [assignment]

Tanjun (https://github.com/FasterSpeeding/Tanjun)
- tanjun/context/menu.py:141: error: Incompatible return value type (got "CommandType", expected "Literal[CommandType.USER, CommandType.MESSAGE]")  [return-value]
+ tanjun/commands/menu.py:534: error: Incompatible types in assignment (expression has type "CommandType", variable has type "Literal[CommandType.USER]")  [assignment]
+ tanjun/commands/menu.py:534: error: Incompatible types in assignment (expression has type "CommandType", variable has type "Literal[CommandType.MESSAGE]")  [assignment]

pytest (https://github.com/pytest-dev/pytest)
+ src/_pytest/python_api.py:320: error: "Never" not callable  [misc]

mongo-python-driver (https://github.com/mongodb/mongo-python-driver)
+ pymongo/synchronous/common.py:378: error: Incompatible return value type (got "str", expected "_ServerMode")  [return-value]
+ pymongo/asynchronous/common.py:378: error: Incompatible return value type (got "str", expected "_ServerMode")  [return-value]

jax (https://github.com/google/jax)
+ jax/_src/interpreters/partial_eval.py:2656: error: Unused "type: ignore" comment  [unused-ignore]

operator (https://github.com/canonical/operator)
- ops/lib/__init__.py:241: error: Argument 1 to "_Missing" has incompatible type "dict[str | Any, object]"; expected "bool"  [arg-type]
+ ops/lib/__init__.py:241: error: Argument 1 to "_Missing" has incompatible type "dict[str, object]"; expected "bool"  [arg-type]

spark (https://github.com/apache/spark)
+ python/pyspark/pandas/utils.py:746: error: Redundant cast to "int"  [redundant-cast]

pydantic (https://github.com/samuelcolvin/pydantic)
- pydantic/v1/generics.py:124: error: No overload variant of "create_model" matches argument types "str", "str", "tuple[Any, ...]", "None", "dict[str, classmethod[Any, Any, Any]]", "None", "dict[Any, tuple[DeferredType, FieldInfo]]"  [call-overload]
+ pydantic/v1/generics.py:124: error: No overload variant of "create_model" matches argument types "str", "str", "tuple[Any, ...]", "None", "dict[str, classmethod[Any, Any, Any]]", "None", "dict[str, tuple[DeferredType, FieldInfo]]"  [call-overload]

discord.py (https://github.com/Rapptz/discord.py)
+ discord/gateway.py:968: error: Argument 3 to "select_protocol" of "DiscordVoiceWebSocket" has incompatible type "Literal['xsalsa20_poly1305_lite', 'xsalsa20_poly1305_suffix', 'xsalsa20_poly1305']"; expected "int"  [arg-type]
+ discord/state.py:662: error: Unused "type: ignore" comment  [unused-ignore]

streamlit (https://github.com/streamlit/streamlit)
+ lib/streamlit/elements/heading.py: note: In member "_handle_divider_color" of class "HeadingMixin":
+ lib/streamlit/elements/heading.py:272:20: error: Redundant cast to "str"  [redundant-cast]

aiortc (https://github.com/aiortc/aiortc)
- src/aiortc/rtcpeerconnection.py:1143: error: Invalid index type "int | str" for "dict[int, RTCRtpDecodingParameters]"; expected type "int"  [index]

@Jordandev678
Copy link
Author

Jordandev678 commented Jun 8, 2024

I'm working my way through the primer results. What I've got so far if someone want's to comment:

The redundant cast warnings are expected behavior from this change as mypy will be able to pick up the type narrowing on it's own that previously they'll have had to manually check and then cast to make work. So the focus is on the others.

psycopg (https://github.com/psycopg/psycopg)
+ tests/types/test_array.py:212: error: Incompatible types in assignment (expression has type "type[int] | type[float]", variable has type "type[Decimal]")  [assignment]
@pytest.mark.parametrize("wrapper", "Int2 Int4 Int8 Float4 Float8 Decimal".split())
@pytest.mark.parametrize("fmt_in", PyFormat)
@pytest.mark.parametrize("fmt_out", pq.Format)
def test_list_number_wrapper(conn, wrapper, fmt_in, fmt_out):
    wrapper = getattr(psycopg.types.numeric, wrapper)
    if wrapper is Decimal:
        want_cls = Decimal
    else:
        assert wrapper.__mro__[1] in (int, float)
        want_cls = wrapper.__mro__[1]   #line 212

Here, want_cls has been picked up by mypy as Decimal but the "assert in" has allowed mypy to narrow wrapper.__mro__[1] to (int | float). I don't have a problem with that - the core issue seems to be that want_cls picks up as Decimal in the else when it hasn't been assigned to yet. But I don't believe that's related to this change?

steam.py (https://github.com/Gobot1234/steam.py)
- steam/ext/csgo/state.py:206: error: Argument 1 to "__delitem__" of "dict" has incompatible type "int"; expected "AssetID"  [arg-type]
- steam/ext/csgo/state.py:207: error: Argument 1 to "__delitem__" of "dict" has incompatible type "int"; expected "AssetID"  [arg-type]
                item_count = utils.get(gc_item.attribute, def_index=270)
                self.set("contained_item_count", READ_U32(item_count.value_bytes) if item_count is not None else 0)

            elif not isinstance(gc_item, CasketItem) and gc_item.id in self.casket_items:
                del self.casket_items[gc_item.id]   #line 206
                del self.waiting_for_casket_items[gc_item.id]

I'm happy with this.
gc_item.id is a AssetID defined as AssetID = _NewType("AssetID", int) therefore gets picked up as builtins.int.
The iterable type of casket_items is builtins.dict[steam.types.id.AssetID, steam.ext.csgo.backpack.CasketItem] making the collection item type steam.types.id.AssetID.
That makes the comparison is_subtype(steam.types.id.AssetID, int), which is true, so it narrows gc_item.id to steam.types.id.AssetID. That all makes sense.

pandas (https://github.com/pandas-dev/pandas)
+ pandas/core/internals/concat.py:98: error: Argument 2 to "_is_homogeneous_mgr" has incompatible type "type[object]"; expected "dtype[Any] | ExtensionDtype"  [arg-type]
+ pandas/core/internals/concat.py:104: error: Argument 3 to "_concat_homogeneous_fastpath" has incompatible type "type[object]"; expected "dtype[Any]"  [arg-type]
+ pandas/tests/extension/test_masked.py:309: error: "str" has no attribute "name"  [attr-defined]
	DtypeObj = Union[np.dtype, "ExtensionDtype"]
        ...
	def _is_homogeneous_mgr(mgr: BlockManager, first_dtype: DtypeObj) -> bool:
        ...
        first_dtype = mgrs_indexers[0][0].blocks[0].dtype
        if first_dtype in [np.float64, np.float32]:
            # TODO: support more dtypes here.  This will be simpler once
            #  JoinUnit.is_na behavior is deprecated.
            #  (update 2024-04-13 that deprecation has been enforced)
            if (
                all(_is_homogeneous_mgr(mgr, first_dtype) for mgr, _ in mgrs_indexers) #line 98
                and len(mgrs_indexers) > 1
            ):
                # Fastpath!
                # Length restriction is just to avoid having to worry about 'copy'
                shape = tuple(len(x) for x in axes)
                nb = _concat_homogeneous_fastpath(mgrs_indexers, shape, first_dtype)
                return BlockManager((nb,), axes)

Well, that hurt my head. first_dtype picks up as Any but [np.float64, np.float32] picks up as builtins.list[def (Union[None, Union[builtins.str, builtins.bytes], typing.SupportsFloat, typing.SupportsIndex] =) -> builtins.object]. As basically anything is a subtype of Any that narrows to builtins.object.
I don't know enough about numpy's type setup off-hand for it to be obvious how mypy wound up assigning builtins.list[def (Union[None, Union[builtins.str, builtins.bytes], typing.SupportsFloat, typing.SupportsIndex] =) -> builtins.object to [np.float64, np.float32]. But... assuming that's correct.. it looks fine and that it only wasn't an error before because first_dtype picked up as Any and was never narrowed.

But I'd like a second opinion.

pycryptodome (https://github.com/Legrandin/pycryptodome)
+ lib/Crypto/SelfTest/Signature/test_dss.py:334: error: Incompatible types in assignment (expression has type "None", variable has type "str")  [assignment]
        if hash_name in ("SHA512224", "SHA512256"):
            truncate = hash_name[-3:]
            hash_name = hash_name[:-3]
        else:
            truncate = None

That error looks sane to me, now mypy knows hash_name narrows to Literal["SHA512224", "SHA512256"] it's complaining truncate can be a string or None but isn't defined as Optional[str]. There's an argument it should intuit that, but that seems a separate discussion from the narrowing.

@Jordandev678
Copy link
Author

Tanjun (https://github.com/FasterSpeeding/Tanjun)
- tanjun/context/menu.py:141: error: Incompatible return value type (got "CommandType", expected "Literal[CommandType.USER, CommandType.MESSAGE]")  [return-value]
+ tanjun/commands/menu.py:534: error: Incompatible types in assignment (expression has type "CommandType", variable has type "Literal[CommandType.USER]")  [assignment]
+ tanjun/commands/menu.py:534: error: Incompatible types in assignment (expression has type "CommandType", variable has type "Literal[CommandType.MESSAGE]")  [assignment]
    _VALID_TYPES: frozenset[typing.Literal[hikari.CommandType.USER, hikari.CommandType.MESSAGE]] = frozenset(
        [hikari.CommandType.USER, hikari.CommandType.MESSAGE]
    )
    ...
    def type(self) -> typing.Literal[hikari.CommandType.USER, hikari.CommandType.MESSAGE]:
        # <<inherited docstring from tanjun.abc.MenuContext>>.
        command_type = hikari.CommandType(self._interaction.command_type)
        assert command_type in _VALID_TYPES
        return command_type #line 141

Removed error after successful narrowing, I'm happy with that.

        _VALID_TYPES = frozenset((hikari.CommandType.MESSAGE, hikari.CommandType.USER))
        _MenuTypeT = typing.TypeVar(
            "_MenuTypeT", typing.Literal[hikari.CommandType.USER], typing.Literal[hikari.CommandType.MESSAGE]
        )
        ...
        if type_ not in _VALID_TYPES:
            raise ValueError("Command type must be message or user")
        ...
        self._tracked_command: typing.Optional[hikari.ContextMenuCommand] = None
        self._type: _MenuTypeT = type_  # MyPy bug causes this to need an explicit annotation.
        self._wrapped_command = _wrapped_command #line 535

Added error. type_ is Any (which I assume is why they have the mypy bug comment). _VALID_TYPES gets picked up as uiltins.frozenset[hikari.commands.CommandType] which the in check then narrows type_ to hikari.commands.CommandType. The assignment then fails because CommandType from _VALID_TYPES isn't a subset of Literal[CommandType.X] from _MenuTypeT. I think the narrowing is doing the right thing here and it's a conflict of two similar but different types.

mongo-python-driver (https://github.com/mongodb/mongo-python-driver)

  • pymongo/synchronous/common.py:378: error: Incompatible return value type (got "str", expected "_ServerMode") [return-value]
  • pymongo/asynchronous/common.py:378: error: Incompatible return value type (got "str", expected "_ServerMode") [return-value]
_MONGOS_MODES = (
    "primary",
    "primaryPreferred",
    "secondary",
    "secondaryPreferred",
    "nearest",
)
...
def validate_read_preference_mode(dummy: Any, value: Any) -> _ServerMode:
    """Validate read preference mode for a MongoClient.

    .. versionchanged:: 3.5
       Returns the original ``value`` instead of the validated read preference
       mode.
    """
    if value not in _MONGOS_MODES:
        raise ValueError(f"{value} is not a valid read preference")
    return value

Another case of an error caused by the original type being 'Any'. _MONGOS_MODES is Tuple[str, str, str, str, str], Any narrows to builtins.str, error is because builtins.str != _ServerMode. Looks good to me.

operator (https://github.com/canonical/operator)
- ops/lib/__init__.py:241: error: Argument 1 to "_Missing" has incompatible type "dict[str | Any, object]"; expected "bool"  [arg-type]
+ ops/lib/__init__.py:241: error: Argument 1 to "_Missing" has incompatible type "dict[str, object]"; expected "bool"  [arg-type]

This was already an error. The narrowing has just let it narrow away | Any. I'm happy.

discord.py (https://github.com/Rapptz/discord.py)

  • discord/gateway.py:968: error: Argument 3 to "select_protocol" of "DiscordVoiceWebSocket" has incompatible type "Literal['xsalsa20_poly1305_lite', 'xsalsa20_poly1305_suffix', 'xsalsa20_poly1305']"; expected "int" [arg-type]
  • discord/state.py:662: error: Unused "type: ignore" comment [unused-ignore]
    SupportedModes = Literal['xsalsa20_poly1305_lite', 'xsalsa20_poly1305_suffix', 'xsalsa20_poly1305']
    ...
    supported_modes: Tuple[SupportedModes, ...] = (
        'xsalsa20_poly1305_lite',
        'xsalsa20_poly1305_suffix',
        'xsalsa20_poly1305',
    )
    ...
    async def select_protocol(self, ip: str, port: int, mode: int) -> None:
    ...
    async def initial_connection(self, data: Dict[str, Any]) -> None:
        modes = [mode for mode in data['modes'] if mode in self._connection.supported_modes]
        _log.debug('received supported encryption modes: %s', ', '.join(modes))
        mode = modes[0]
        await self.select_protocol(state.ip, state.port, mode)

Looks fine, mode is defined as an int in the function signature but mode narrows to Literal['xsalsa20_poly1305_lite', 'xsalsa20_poly1305_suffix', 'xsalsa20_poly1305']. Only got away with it before because it was Any due to data: Dict[str, Any].

pydantic (https://github.com/samuelcolvin/pydantic)
- pydantic/v1/generics.py:124: error: No overload variant of "create_model" matches argument types "str", "str", "tuple[Any, ...]", "None", "dict[str, classmethod[Any, Any, Any]]", "None", "dict[Any, tuple[DeferredType, FieldInfo]]"  [call-overload]
+ pydantic/v1/generics.py:124: error: No overload variant of "create_model" matches argument types "str", "str", "tuple[Any, ...]", "None", "dict[str, classmethod[Any, Any, Any]]", "None", "dict[str, tuple[DeferredType, FieldInfo]]"  [call-overload]

It was already an error, this just narrowed one of the types. Fine.

aiortc (https://github.com/aiortc/aiortc)
- src/aiortc/rtcpeerconnection.py:1143: error: Invalid index type "int | str" for "dict[int, RTCRtpDecodingParameters]"; expected type "int"  [index]

I didn't dig into the code for this but I assume the narrowing let it narrow away the | str as we've seen in some others thus removing the error.

I think that's them all, none of those have gave me cause for worry. Quite a few are just because the narrowing has shown up some usages of Any which let it ignore errors. But those errors have then tripped now once the Any was narrowed. But I think that's fine. By virtue of it being in a collection with X that doesn't overlap with Any we know it can't be Any anymore and must be X - we should check the following code with that in mind.

Probably a good time for someone to put a second pair of eyes on the PR see what they think.

@Jordandev678 Jordandev678 marked this pull request as ready for review June 9, 2024 13:06
@jakkdl
Copy link

jakkdl commented Jun 20, 2024

I could see a case for not narrowing Any, since that is often used when you want to silence errors because whatever you're doing is not possible to type check.

@Jordandev678
Copy link
Author

I could see a case for not narrowing Any, since that is often used when you want to silence errors because whatever you're doing is not possible to type check.

Thanks for taking a look. I hope this is a kind of best-of-both. If you check X is in (Y,Z) and it is, it narrows it as we do know it must be X | Z not Any. But if it isn't it'll leave it as Any which goes back to being ignored. This let's you have:

from typing import Any
def foobar(foo: Any):
    if foo in ['a', 'b']:
        reveal_type(foo)  # N: Revealed type is "builtins.str"
    else:
        reveal_type(foo)  # N: Revealed type is "Any"

That allows it to fit well with the gradual-typing approach by letting it get type checked in one case and still go forward ignoring errors in the other.

But I can see the argument for just ignoring it completely and leaving Any as a "always ignore". I don't have a particularly strong opinion either way.

@jakkdl
Copy link

jakkdl commented Jun 20, 2024

I could see a case for not narrowing Any, since that is often used when you want to silence errors because whatever you're doing is not possible to type check.

Thanks for taking a look. I hope this is a kind of best-of-both. If you check X is in (Y,Z) and it is, it narrows it as we do know it must be X | Z not Any. But if it isn't it'll leave it as Any which goes back to being ignored. This let's you have:

from typing import Any
def foobar(foo: Any):
    if foo in ['a', 'b']:
        reveal_type(foo)  # N: Revealed type is "builtins.str"
    else:
        reveal_type(foo)  # N: Revealed type is "Any"

That allows it to fit well with the gradual-typing approach by letting it get type checked in one case and still go forward ignoring errors in the other.

But I can see the argument for just ignoring it completely and leaving Any as a "always ignore". I don't have a particularly strong opinion either way.

At a glance it seems like 3 of the introduced errors in the primer are cases of narrowing away from Any, and I'd say these are the only ones that could be controversial. I don't have a terribly strong opinion on the matter myself either tbf, but the rest of the changes seem great.
Maybe @ilevkivskyi or @JukkaL have an opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Type not narrowed correctly for if X in collection
3 participants