Hacker News

Shape typing in Python

2024-04-1312:1213175jameshfisher.com

While I was looking the other way, Python got advanced static types! Here’s matrix multiplication, describing the input shapes and its output shape:
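A minimal sketch of what such a shape-typed signature can look like. This is not the article's own code: `Matrix`, `matmul`, and the type variable names are hypothetical, and the shape parameters are only checked statically, not against the actual data.

```python
from typing import Generic, Literal, TypeVar

# Type variables standing in for dimensions (rows, inner, columns).
M = TypeVar("M")
N = TypeVar("N")
P = TypeVar("P")


class Matrix(Generic[M, N]):
    """Hypothetical wrapper whose type parameters record its shape."""

    def __init__(self, rows: list[list[float]]) -> None:
        self.rows = rows


def matmul(a: Matrix[M, N], b: Matrix[N, P]) -> Matrix[M, P]:
    # The shared N forces the inner dimensions to agree at type-check time.
    return Matrix(
        [[sum(x * y for x, y in zip(row, col)) for col in zip(*b.rows)]
         for row in a.rows]
    )


a: Matrix[Literal[2], Literal[3]] = Matrix([[1, 2, 3], [4, 5, 6]])
b: Matrix[Literal[3], Literal[2]] = Matrix([[1, 0], [0, 1], [1, 1]])
c = matmul(a, b)  # a checker should infer Matrix[Literal[2], Literal[2]]
```

Calling `matmul(a, a)` still runs, but a type checker should flag it because the inner dimensions `Literal[3]` and `Literal[2]` do not match.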



Comments

By jmduke 2024-04-1314:3510 reply

Having migrated my application's Python and JS codebases to their typed siblings respectively last year, my 2c is that Python typing feels good and worthwhile when you're in the standard lib, but _awful_ (and net-negative) once you leave "normal Python" for the shores of third-party packages, particularly ones that lean heavily on duck typing (Django and BeautifulSoup both come to mind.)

This is where some of the stuff in the TypeScript ecosystem really shines, IMHO — being able to have a completely typesafe ORM such as Drizzle (https://orm.drizzle.team/) feels like a Rubicon moment, and touching anything else feels like a significant step backwards.

By networked 2024-04-1316:271 reply

My experience has been different: last year I started writing Python again after a long break, and I am yet to regret using types pervasively. If some library has no type definitions, I prefer to have my typed code interact with its untyped code. It is still better than having no types at all. You can sometimes get some useful type safety by annotating your functions with the untyped library's classes.

Since then, I have used established libraries like Beautiful Soup, Jinja, Pillow, platformdirs, psutil, python-dateutil, redis-py, and xmltodict with either official or third-party types. I remember their types being useful to varying degrees and not a problem. I have replaced Requests with the very similar but typed and optionally async HTTPX. My most objectionable experience with types in Python so far has been having to write

    root = cast(
        lxml.etree._Element,  # noqa: SLF001
        html5.parse(html, return_root=True),
    )

when I used types-lxml with https://github.com/kovidgoyal/html5-parser. In return I have been able to catch some bugs early and to "fearlessly" refactor code with few or no unit tests, only integration tests. The style I have arrived at is close to https://kobzol.github.io/rust/python/2023/05/20/writing-pyth....

Admittedly, I don't use Django. Maybe I won't like typed Django if I do. My choice of type checker is Pyright in non-strict mode. It seems to usually, though not always, catch more type errors, and subtler ones, than mypy. I understand that for Django, mypy with a Django plugin is preferred.

By andenacitelli 2024-04-142:32

You can also use something like stubgen to generate function definition signatures for dependencies for mypy to validate, then make your own changes to those files with better types if you wish.

I don’t think it’s very scalable, and having the library itself or a stubs package come with types is the only “good”-feeling route, but you at least have a somewhat decent path to still getting it decent without any intervention on the library’s part. It may even be sufficient, if (like in most situations) you only use a few functions from a library (which may in turn call others, but you only care about the ones your code directly touches), and therefore only need to type those ones.

By vforgione 2024-04-1314:46

While I don’t disagree, there are a number of additional stub libraries you can install that provide typing for libraries that don’t already have them. I personally find [django-types](https://pypi.org/project/django-types/) to be really well constructed and useful.

By eloisius 2024-04-1314:526 reply

I agree. Prior to the introduction of types in Python, I thought I wanted it. Now I hate them. It feels like a bunch of rigmarole for no benefit. I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me. Even strongly typed languages like rust have ergonomics to help you avoid explicitly specifying types like let x = 1. You see extraneous code like x: int = 1 in Python now. Third party libs have bonkers types. This function signature is ridiculous:

    sqlalchemy.orm.relationship(argument: _RelationshipArgumentType[Any] | None = None, secondary: _RelationshipSecondaryArgument | None = None, *, uselist: bool | None = None, collection_class: Type[Collection[Any]] | Callable[[], Collection[Any]] | None = None, primaryjoin: _RelationshipJoinConditionArgument | None = None, secondaryjoin: _RelationshipJoinConditionArgument | None = None, back_populates: str | None = None, order_by: _ORMOrderByArgument = False, backref: ORMBackrefArgument | None = None, overlaps: str | None = None, post_update: bool = False, cascade: str = 'save-update, merge', viewonly: bool = False, init: _NoArg | bool = _NoArg.NO_ARG, repr: _NoArg | bool = _NoArg.NO_ARG, default: _NoArg | _T = _NoArg.NO_ARG, default_factory: _NoArg | Callable[[], _T] = _NoArg.NO_ARG, compare: _NoArg | bool = _NoArg.NO_ARG, kw_only: _NoArg | bool = _NoArg.NO_ARG, lazy: _LazyLoadArgumentType = 'select', passive_deletes: Literal['all'] | bool = False, passive_updates: bool = True, active_history: bool = False, enable_typechecks: bool = True, foreign_keys: _ORMColCollectionArgument | None = None, remote_side: _ORMColCollectionArgument | None = None, join_depth: int | None = None, comparator_factory: Type[RelationshipProperty.Comparator[Any]] | None = None, single_parent: bool = False, innerjoin: bool = False, distinct_target_key: bool | None = None, load_on_pending: bool = False, query_class: Type[Query[Any]] | None = None, info: _InfoType | None = None, omit_join: Literal[None, False] = None, sync_backref: bool | None = None, **kw: Any) → Relationship[Any]

https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#...

By lolinder 2024-04-1315:002 reply

> It feels like a bunch of rigmarole for no benefit. I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me.

Maybe try using an IDE? Without one any language's type system will feel more frustrating than it's worth, since you won't get inline error messages either.

> Even strongly typed languages like rust have ergonomics to help you avoid explicitly specifying types like let x = 1.

This is called type inference, and as far as I can tell this level of basic type inference is supported by the major python type checkers. If you're seeing people explicitly annotate types on local variables that's a cultural problem with people who are unaccustomed to using types.

As for that function signature, it would be bonkers with or without types. The types themselves look pretty straightforward, the problem is just that they formatted it all on one line and have a ridiculous number of keyword arguments.
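A small sketch of the basic inference mentioned above, assuming a checker such as mypy or Pyright. Only the function signature is annotated; the function and variable names are made up for illustration.

```python
def total_length(words: list[str]) -> int:
    count = 0            # inferred as int, no annotation needed
    for word in words:   # word is inferred as str from list[str]
        count += len(word)
    return count


# The comprehension is inferred as dict[str, int] without any annotation.
lengths = {w: len(w) for w in ["shape", "typing"]}
```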

By eloisius 2024-04-1315:136 reply

Of course I’ve used an IDE before. I still prefer Vim to an IDE. And I enjoy writing typed languages in Vim because the compiler catches mistakes.

I agree part of the problem is cultural. Maybe a bunch of Python coders are eager to use types, or maybe linters are pushing them to type every last variable because that is “right.” I don’t know.

I don’t hate typed languages at all. In fact I love writing Rust. Even C++ is tolerable from a type perspective. I don’t agree that _RelationshipJoinConditionArgument is a meaningful type. It feels like bolting a type system onto the language after the fact is weird and necessitates crazy types like that to make some linter happy, maybe to make VS Code users happy, at the expense of readability.

By btreecat 2024-04-1315:392 reply

Vim is an IDE, with more steps. Nothing stopping you from having code completion setup in vim and benefiting from the additional meta info.

By int_19h 2024-04-145:47

https://github.com/puremourning/vimspector will also give you a Python debugger with breakpoints etc (same one as in VSCode, in fact).

By gopher_space 2024-04-1322:49

Neovim is pretty slick.

By robertlagrant 2024-04-1315:28

You can run mypy as the Python equivalent of the typey bit of a compiler.

As for SQLAlchemy, I wouldn't assume that the object model would be particularly different in any other OO language for the problem it's solving.

By joshuamorton 2024-04-1319:11

> _RelationshipJoinConditionArgument

Is it particularly different from Rust's unusual types like `Map<Chain<FromRef<Box dyn Vec<Foo>>>>>` that you can get when doing chained operations on iterators?

Protocol/Trait based typing necessitates weird names for in-practice traits/protocols that are used.

Edit: IDK why that function signature is that ridiculous, reformatting it as:

    sqlalchemy.orm.relationship(
        argument: _RelationshipArgumentType[Any] | None = None, 
        secondary: _RelationshipSecondaryArgument | None = None, 
        *, 
        uselist: bool | None = None, 
        collection_class: Type[Collection[Any]] | Callable[[], Collection[Any]] | None = None, 
        primaryjoin: _RelationshipJoinConditionArgument | None = None, 
        secondaryjoin: _RelationshipJoinConditionArgument | None = None, 
        back_populates: str | None = None, 
        order_by: _ORMOrderByArgument = False, 
        backref: ORMBackrefArgument | None = None, 
        overlaps: str | None = None, 
        post_update: bool = False, 
        cascade: str = 'save-update, merge', 
        viewonly: bool = False, 
        init: _NoArg | bool = _NoArg.NO_ARG, 
        repr: _NoArg | bool = _NoArg.NO_ARG, 
        default: _NoArg | _T = _NoArg.NO_ARG, 
        default_factory: _NoArg | Callable[[], _T] = _NoArg.NO_ARG, 
        compare: _NoArg | bool = _NoArg.NO_ARG, 
        kw_only: _NoArg | bool = _NoArg.NO_ARG, 
        lazy: _LazyLoadArgumentType = 'select', 
        passive_deletes: Literal['all'] | bool = False, 
        passive_updates: bool = True, 
        active_history: bool = False, 
        enable_typechecks: bool = True,
        foreign_keys: _ORMColCollectionArgument | None = None, 
        remote_side: _ORMColCollectionArgument | None = None, 
        join_depth: int | None = None, 
        comparator_factory: Type[RelationshipProperty.Comparator[Any]] | None = None, 
        single_parent: bool = False,
        innerjoin: bool = False, 
        distinct_target_key: bool | None = None, 
        load_on_pending: bool = False,
        query_class: Type[Query[Any]] | None = None, 
        info: _InfoType | None = None,
        omit_join: Literal[None, False] = None, 
        sync_backref: bool | None = None, 
        **kw: Any
    ) → Relationship[Any]

It's 2-3 expected arguments and then ~30 options (that are all like Optional[bool] or Optional[str]) to customize the relationship factory. Types like `_ORMColCollectionArgument` do stick out, but they're mainly there because these functions accept `Union[str, ResolvedORMType]` and will convert some sql string to a resolved type for you, and like, this is an ORM, there are going to be some weird ORM types.

By eternityforest 2024-04-159:05

It definitely seems to be a target audience issue. I use VS code and MyPy regularly catches mistakes, some of which would have been fairly subtle.

I have MyPy and Ruff going all the time and generally aim for zero linter errors.

By bas 2024-04-1315:26

Thank you, for putting many of my frustrations to words.

By codethief 2024-04-1319:28

How does Vim prevent you from using a language server?

By whyever 2024-04-1317:41

> Without one any language's type system will feel more frustrating than it's worth, since you won't get inline error messages either.

I disagree, for me the integration with the editor mostly shortens feedback cycles, and enables some more advanced features. The utility of identifying problems without running the code is still there.

By mjr00 2024-04-1316:042 reply

> I don’t use an IDE, so code completion or whatever you get for it doesn’t apply to me.

This is a reasonable take if you're a solo developer working without an IDE. Though I suspect you'd still find a few missing None checks with type checking.

If you're working on a team, though, the idea is to put type-checking into your build server, alongside your tests, linting, and whatnot.

> You see extraneous code like x: int = 1 in Python now.

This shouldn't be necessary in most cases; Python type checkers are fine with inferring types.

> Third party libs have bonkers types. This function signature is ridiculous:

It is. Part of that is that core infrastructure libraries tend to have wonky signatures just by their nature. A bigger part, though, is that a lot of APIs in popular Python libraries are poorly designed, in that they're extremely permissive (like pandas APIs allowing dataframes, ndarrays, list of dicts, and whatever else) and use kwargs inappropriately. Type declarations just bring that to the surface.
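To illustrate the point about permissive APIs, here is a hypothetical helper (not a pandas function) that accepts two input layouts. The annotation has to spell out every accepted form, which is where the long signatures come from.

```python
from collections.abc import Mapping, Sequence
from typing import Union

Row = Mapping[str, float]
# Hypothetical alias: each accepted input form must be listed explicitly.
RowsLike = Union[Sequence[Row], Mapping[str, Sequence[float]]]


def to_rows(data: RowsLike) -> list[dict[str, float]]:
    if isinstance(data, Mapping):
        # Column-oriented input: {"a": [1, 2], "b": [3, 4]}
        names = list(data)
        return [dict(zip(names, values)) for values in zip(*data.values())]
    # Row-oriented input: [{"a": 1, "b": 3}, ...]
    return [dict(row) for row in data]
```

Every additional accepted form adds another member to the union and another branch to the body.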

By geysersam 2024-04-1322:352 reply

What's wrong with being extremely permissive? I'd argue that's a strength of the python ecosystem. It's true that very dense APIs are difficult to type, but I wouldn't say they're typically poorly designed because of it.

By relaxing 2024-04-1413:05

When you’ve got to pass in something that isn’t permitted and the list of things that is permitted isn’t documented you’ve got to dig 12 levels down into the library across 9 branching paths to figure out what input it actually does support.

By eternityforest 2024-04-159:12

Permissive libraries violate the "one and only one obvious way" philosophy.

I suspect they probably confuse AI tools more than restrictive APIs too, and give non-AI auto complete less to go on.

By nerdponx 2024-04-1320:38

Even without an IDE, I use Mypy like a test suite. It catches real bugs that would be either hard to find in testing, or intrusive and annoying to test for.
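A typical example of the kind of bug meant here is a missing None check. The sketch below uses made-up function names; a checker such as mypy should report the `.title()` call if the `if name is None` branch is removed.

```python
from typing import Optional


def find_user(users: dict[int, str], user_id: int) -> Optional[str]:
    # dict.get returns None when the key is absent.
    return users.get(user_id)


def greeting(users: dict[int, str], user_id: int) -> str:
    name = find_user(users, user_id)
    # Without this check, a checker reports that `name` may be None
    # before `.title()` is ever executed.
    if name is None:
        return "Hello, stranger"
    return f"Hello, {name.title()}"
```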

By yazzku 2024-04-1318:171 reply

That signature is ridiculous in any programming language. Types aren't the problem here.

By nielsbot 2024-04-1419:39

Was about to say the same thing... that method takes like 20 arguments. Types are the only thing making it usable.

By yrro 2024-04-259:04

I wouldn't mind all of that if the SQLAlchemy documentation would hide all the types until I mouse over them.

Ditto for vim!

By virtualritz 2024-04-1417:33

> [...] rust have ergonomics [...]

For starters Rust's official linter, clippy, would also tell you that this function has too many arguments. ;) The default (max) is seven[1].

The above function has 36 named arguments ... That is a code UX wtf with or without type annotations.

[1] https://rust-lang.github.io/rust-clippy/master/index.html#/t...

By ghnws 2024-04-1418:45

I've literally never seen anyone put types on trivial variables like that. Maybe your team is just inexperienced with types and/or python?

By traverseda 2024-04-1315:341 reply

> being able to have a completely typesafe ORM such as Drizzle (https://orm.drizzle.team/) feels like a Rubicon moment, and touching anything else feels like a significant step backwards.

Alright, but there's nothing stopping you from having a completely typesafe ORM in python, is there?

Sure, there isn't really one that everyone uses yet, but the python community tends to be a bit more cautious and slower to adopt big changes like that.

By jmduke 2024-04-1316:491 reply

I'm talking about practical limitations, not academic ones. You're not incorrect (and libraries like FastAPI and Pydantic make me confident that the benefits of type-safety will grow throughout the ecosystem) but I am talking about from the perspective of someone considering whether or not to adopt typing within their Python project today.

By ghnws 2024-04-1418:46

What harm do you think typing a function, for example, would do? I'm genuinely curious because I just can't see where the issue is.

By lemagedurage 2024-04-1314:42

That's fair. Though there's a wave of newer Python packages with great typing support. E.g. packages from https://github.com/encode

By jddj 2024-04-1314:481 reply

If I remember correctly, Typescript felt the same way for quite a long time

By nurettin 2024-04-1314:553 reply

It did, especially in late 2013 and early 2014. But then the type repositories quickly caught up. Python package authors usually shy away from such endeavours, especially those who use kwargs in order to configure large classes. pygann comes to mind.

By maleldil 2024-04-1316:22

Python has also seen a sizable movement towards using types as part of the design, such as Typer, FastAPI and Pydantic. Existing mainstream libraries are also slowly adopting types, such as pandas and numpy (including `numpy.typing`).

For the latter cases, it's not easy because typed APIs require different principles than dynamic/duck-typed ones. Still, I think it's safe to say that the community is trending towards more typing over time, especially greenfield projects. Personally, all my new projects are 100% typed, with type-safe wrappers around untyped libraries.

For what it's worth, since Python 3.12 (or with typing_extensions for earlier versions), it's also possible to use Unpack and TypedDict to type kwargs.

By int_19h 2024-04-146:03

Python was always much more dynamically typed than JS, and it also became the prevailing approach in the ecosystem.

By nerdponx 2024-04-1320:391 reply

What is pygann? It's not on PyPI.

By nurettin 2024-04-1417:41

it was pygad

By eternityforest 2024-04-158:58

Using types properly is always annoying. Seeing MyPy report no problems makes it worth it.

I find myself doing a lot of isinstance() and raise TypeError, but that's still a huge win, protecting everything after I've asserted the duck type is what it should be.
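A minimal sketch of that isinstance-then-raise pattern, with a hypothetical `parse_port` function. After the guard, a checker treats `value` as `int | str` for the rest of the body.

```python
def parse_port(value: object) -> int:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"expected int or str, got {type(value).__name__}")
    port = int(value)  # value is narrowed to int | str here
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```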

I also use beartype for runtime protection.

Typescript is pretty amazing though. I really like how integrated the ecosystem is.

By DanielVZ 2024-04-1315:16

I think you are missing stub libraries. After installing them, it's been a breeze

By amelius 2024-04-1314:481 reply

Maybe we can use LLMs to automatically bring these third party libs up to par?

Could be a nice showcase project for Copilot.

By 12_throw_away 2024-04-1319:181 reply

> Maybe we can use LLMs to automatically bring these third party libs up to par?

So, I actually tried this. I tried to use copilot to help generate type stubs for a third party library, hoping to be pleasantly surprised.

Copilot generated reasonable-looking type stubs that were not close enough to correct to be of any value. Even with the full source code in context, it failed to "reason" correctly about any of the hard stuff (unions, generics, overloads, variadics, quasi-structured mappings, weird internal proxy types, state-dependent responses, etc. etc.).

In my experience, bolting types onto a duck-typed API always produces somewhat kludgy results that won't be as nice as a system designed around static typing. So _of course_ an LLM can't solve that problem any more than adding type stubs can.

But really, the answer to "will LLMs fix $hard_problem for us?" is almost always "no", because $hard_problem can rarely be solved by just writing some code.

By golergka 2024-04-141:291 reply

What about GPT-4? I think that coupled with a decent RAG and agent systems, it would do a good job.

By 12_throw_away 2024-04-144:55

Ok, how does one set up "GPT-4 coupled with a decent RAG and agent systems"?

By technocratius 2024-04-144:101 reply

Which language do you refer to when you speak of python's typed sibling?

By js2 2024-04-1417:07

I take jmduke to mean Python with type annotations.

By naddeo 2024-04-1315:482 reply

I was surprised to see the example in the blog. Python actually has come pretty far with types but the blog's example doesn't really highlight it. For structural static typing, something like this is nicer as an example

    from typing import Protocol, Tuple, TypedDict


    class Foo(TypedDict):
        foo: str
        bar: int
        baz: Tuple[str, int]
        baaz: Tuple[float, ...]


    class Functionality(Protocol):
        def do(self, it: Foo): ...


    class MyFunctionality: # not explicitly implemented
        def do(self, it: Foo): ...


    class DoIt:
        def execute(self, it: Foo, func: Functionality): ...


    doit = DoIt().execute({ # Type checks
        "foo": "foo",
        "bar": 7, 
        "baz": ("str", 2), 
        "baaz": (1.0, 2.0)}, MyFunctionality())

Protocols and TypedDicts let you do nice structural stuff, similar to TypeScript (though not as feature complete). Types are good enough on python that I would never consider a project without them, and I work with pandas and numpy a lot. You change your workflow a little bit so that you end up quarantining the code that interfaces with third-party libraries that don't have good type support behind your own functions that do. There are other pretty cool type things as well. Python is definitely in much better shape than it was.

Combine all of that with Pyright's ability to do more advanced type inference and it's like a whole new language experience.
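The "quarantining" idea can be sketched roughly as follows. Here `json.loads` (which returns `Any`) stands in for a call into a poorly typed third-party library; `Point` and `load_point` are hypothetical names.

```python
import json
from typing import Any, TypedDict


class Point(TypedDict):
    x: float
    y: float


def load_point(raw: str) -> Point:
    """Typed boundary: everything past this function sees a Point, not Any."""
    data: Any = json.loads(raw)  # untyped value stays inside this function
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    return Point(x=float(data["x"]), y=float(data["y"]))
```

Callers only ever see `Point`, so the untyped value never leaks past the wrapper.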

By bormaj 2024-04-1316:002 reply

I would supplement this by suggesting `pydantic` models instead of `TypedDict`s. This library has become a core utility for me as it greatly improves the developer experience with typing/validation/serialization support.

By maleldil 2024-04-1316:251 reply

They fulfil different roles. pydantic models would be an alternative to dataclasses, attrs or data-centric classes in general. TypedDict is used when you're stuck with dicts and can't convert them to a specific class.
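A rough illustration of the two roles, using a stdlib dataclass as a stand-in for a pydantic model (so no validation happens here). All names are made up for the example.

```python
from dataclasses import dataclass
from typing import TypedDict


class UserDict(TypedDict):
    """Describes a plain dict you are stuck with, e.g. decoded JSON."""
    name: str
    age: int


@dataclass(frozen=True)
class User:
    """Data-centric class: attribute access, equality, immutability."""
    name: str
    age: int


def from_dict(d: UserDict) -> User:
    return User(name=d["name"], age=d["age"])


raw: UserDict = {"name": "Ada", "age": 36}  # still an ordinary dict at runtime
user = from_dict(raw)
```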

By catlover76 2024-04-1318:54

[dead]

By naddeo 2024-04-1316:55

Yeah they're kind of different. I'm only really talking about TypedDicts because the original post was related to structural typing, which isn't what pydantic does. I do reach for pydantic first myself.

By oivey 2024-04-1319:483 reply

TypedDicts are a really disappointing feature. Typing fails if you pass it a dictionary with extra keys, so you can’t use it for many structural typing use cases.

It’s even more disappointing because this isn’t just an oversight. The authors have deliberately made having additional keys an error. Apparently, this is even a divergence from how TypeScript checks dictionaries.
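For context, a sketch of where the extra-key error applies, based on my reading of PEP 589: a dict literal checked directly against a TypedDict may not carry undeclared keys, while a value already typed as a "wider" TypedDict is accepted where a narrower one is expected. The class names are hypothetical.

```python
from typing import TypedDict


class Named(TypedDict):
    name: str


class Movie(Named):
    year: int


def show(item: Named) -> str:
    return item["name"]


movie: Movie = {"name": "Blade Runner", "year": 1982}
title = show(movie)  # accepted: Movie has every key that Named requires

# show({"name": "Blade Runner", "year": 1982})
# A checker rejects the literal above because it is checked directly
# against Named, which declares no "year" key.
```

This does not cover dicts of unknown origin with arbitrary extra keys, which is the case the comment is complaining about.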

By Syntaf 2024-04-141:471 reply

For what it's worth, TypedDict was a bit ahead of its own time. Python 3.12 is really the turning point for being able to leverage them effectively for stuff like **kwargs[1][2]

    from typing import TypedDict, Unpack, NotRequired

    class Movie(TypedDict):
        name: str
        year: NotRequired[int]

    def foo(**kwargs: Unpack[Movie]) -> None: ...

[1] https://typing.readthedocs.io/en/latest/spec/callables.html#... [2] https://peps.python.org/pep-0692/

By o11c 2024-04-142:06

How does that work for partial forwarding? Back when I last checked there was no imagining a DRY solution for that even for linear cases, let alone diamond inheritance.

  # Simple case, any halfway-decent type system should be able to handle this.
  def inner(*, i): pass
  def middle(*, m, **kwargs): inner(**kwargs)
  def outer(*, o, **kwargs): middle(**kwargs)
  kwargs = ...; outer(**kwargs)

  # More complicated case, but fairly common in Python code.
  class A:
    def __init__(self, *, a):
      pass
  class B(A):
    def __init__(self, *, b, **kwargs):
      super().__init__(**kwargs)
  class C(A):
    def __init__(self, *, c, **kwargs):
      super().__init__(**kwargs)
  class D(B, C):
    def __init__(self, *, d, **kwargs):
      super().__init__(**kwargs)
  kwargs = ...; D(**kwargs)

  # An even more complicated case involves `kwargs.pop()`, forwarding without `**`.

Can the above be typechecked yet?
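One possible way to type the simple linear case with PEP 692, offered as a sketch rather than a verified answer: I have not checked it against every type checker, and it is not DRY, since each layer's keys are repeated through TypedDict inheritance. `InnerKwargs` and `MiddleKwargs` are hypothetical names.

```python
try:
    from typing import TypedDict, Unpack
except ImportError:  # Python < 3.11: assumes typing_extensions is installed
    from typing_extensions import TypedDict, Unpack


class InnerKwargs(TypedDict):
    i: int


class MiddleKwargs(InnerKwargs):
    # Inherits "i" and adds the keyword that middle() consumes itself.
    m: int


def inner(*, i: int) -> int:
    return i


def middle(*, m: int, **kwargs: Unpack[InnerKwargs]) -> int:
    return m + inner(**kwargs)


def outer(*, o: int, **kwargs: Unpack[MiddleKwargs]) -> int:
    return o + middle(**kwargs)
```

The diamond-inheritance `__init__` case is not addressed by this sketch.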

By o11c 2024-04-141:492 reply

To be fair, that behavior on typescript's part is a major hole for bugs to slip through.

Specifically: absent optional keys + extra keys is fundamentally indistinguishable from miseptl keys.

By chuckadams 2024-04-1417:40

`interface` in TS allows extra keys, `type` does not. Usually best to use `type` and add an intersection with some other type if you want extras (`Record<string,unknown>` is all right for arbitrary extra keys)

By oivey 2024-04-143:17

The current situation is worse, IMO. Lots of code that could be checked cannot, and lots of code that could have a more clearly defined interface does not.

By arwineap 2024-04-1414:271 reply

I'm not sure I understand.

TypedDicts are disappointing because you can't partially define a type? That seems like a success

In go you would need to define all fields in your struct and if you needed unstructured data you would have to define a map, which even then is partially typed

How should extra keys behave in "typed python?"

By oivey 2024-04-1416:241 reply

Go is a terrible point of comparison. Python’s type system should not pointlessly match what other languages have chosen due to some myopic dogma.

Python is a dynamic language where one of its major features is structural subtyping, aka duck typing. It’s effectively an alternative to inheritance. Features have been added to help support this, like TypedDicts and Protocols already. They don’t go far enough.

I want to be able to say “this function argument is a dictionary that has at least the keys a, b, and c.” It gives the contract that the function only accesses those keys, and others will be ignored. The type checker can check that the function doesn’t access undeclared keys and the annotation helps communicate to client code what the interface is supposed to be.

Lots of Python’s bolted-on type checking seems to be straitjacketed to match what’s in C, Java, Go, etc. Those languages don’t contain the only possible type systems or static checkers. There’s a serious lack of imagination. Python’s type system should be designed around enabling and making safer what the language is already good at.

By arwineap 2024-04-1514:00

> I want to be able to say “this function argument is a dictionary that has at least the keys a, b, and c.” It gives the contract that the function only accesses those keys, and others will be ignored.

Do we agree that this is the behavior of regular dicts in python? How should TypedDicts be different?

Surely, the goal of typing in python should not be to match behavior without

By lmeyerov 2024-04-1316:004 reply

Where I really want this is pandas. The community has been smoothing the basic typing story over the last couple of years, which helps with deprecations & basic API misuses. However, I'm excited for shape/dependent typing over dataframe column names, as that would get more into our typical case of data & logic errors.

By d3m0t3p 2024-04-1317:373 reply

You might want to check pola.rs then, it's backed by the Apache Arrow memory model and it's written in rust. All the columns have a defined type and you can easily catch a mistake when loading data

By lmeyerov 2024-04-140:54

Unless I'm misunderstanding, Arrow solves the data representation on disk/memory, both for pandas and polars, while I'm writing about type inferencing during static analysis, which Arrow doesn't solve.

Having a type checking system respect arrow schemas is indeed our ideal. Will polars during mypy static type checking invocations catch something like `df.this_col_is_missing` as an error? If so, that's what we want, that's great!

FWIW, we donated some of the first versions of what became apache arrow ;-)

By benrutter 2024-04-1318:591 reply

I've been hunting down column level typing for a while and did not realise polars had this! That's an absolute game changer, especially if it could cover things like nullability, uniqueness etc.

By bobbylarrybobby 2024-04-143:59

It's not static, it's basically the same as pandas. Your editor will not know the type of a given column or whether it even exists; all of that happens at runtime.

By ledauphin 2024-04-1319:18

do you have a reference for how to use static typing for polars columns? I haven't seen this in their docs...

By thenobsta 2024-04-1316:591 reply

Pandera helps with some of this. Check it out -- https://pandera.readthedocs.io/en/stable/

We've used it to great effect.

By lmeyerov 2024-04-141:01

This is neat, I like the direction!

As far as I can tell, it's runtime, not static, so it won't help during our mypy static checks period?

As intuited by the poster above, we already do generally stick to Apache Arrow column types for data we want to control. Anything we do there is already checked dynamically, such as at file loads and network IO (essentially contracts), and Arrow IO conversions generally already do checks at those points. I guess this is a lightweight way to add stronger dynamically-checked contracts at intermediate function points?

By ninja3925 2024-04-1316:231 reply

Column misnaming/typo is indeed a problem in pandas. I think a powerful IDE could do the trick though.

By lmeyerov 2024-04-141:04

Sort of... the IDE would want the mypy (or otherwise) typings to surface that. Internally, the dataframe library should make it easier for the IDE to see that, vs today's norm of tracking just "Any" / "Index" / "Series" / ... .

By sampo 2024-04-1317:081 reply

I wish timezone-naive and timezone-aware Timestamps would be different types.
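With the stdlib `datetime` as a stand-in for pandas Timestamps, the distinction can be approximated with `NewType`. This is only a sketch with hypothetical names: the runtime check lives in the constructor functions, and the separation itself is enforced only by the type checker.

```python
from datetime import datetime, timezone
from typing import NewType

# Distinct static types over the same runtime class.
NaiveDatetime = NewType("NaiveDatetime", datetime)
AwareDatetime = NewType("AwareDatetime", datetime)


def as_aware(dt: datetime) -> AwareDatetime:
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return AwareDatetime(dt)


def as_naive(dt: datetime) -> NaiveDatetime:
    if dt.tzinfo is not None and dt.utcoffset() is not None:
        raise ValueError("expected a timezone-naive datetime")
    return NaiveDatetime(dt)


def seconds_between(start: AwareDatetime, end: AwareDatetime) -> float:
    # A checker rejects NaiveDatetime or plain datetime arguments here.
    return (end - start).total_seconds()
```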

By Hasnep 2024-04-143:35

See this post for a comparison of Python datetime libraries. The datetype, heliclockter and whenever libraries have different types for them.

https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls...