And please, just don't use * imports. It really doesn't save you much time at the cost of implicit untraceable behavior. If you don't worry about * imports, you don't need to add the __all__ boilerplate to every module.
This article is more about advertising a package called tach, that I suppose tries to add "true" private classes to Python.
But it doesn't actually enforce anything, because you still need to run their tool for anything to be checked. You could just as easily configure your standard linter to enforce this type of thing rather than use a super specialized tool.
A direct benefit of using `__all__` in module A is better intellisense while editing files that import A, if A has a small intended public API and many symbols meant for internal use.
I just tried it and at least the autocomplete in IPython appears to ignore __all__ when suggesting possible imports. I haven't tried any other tools' autocompletes.
If module A has a small intended public API, you can structure it however you want to achieve that. You can put those internal symbols behind their own object/class/module if you prefer.
Using `__all__` has one functional consequence, which is `from A import *`. Again, I would avoid * imports entirely, but if you want to try to curb possible downstream problems from users who do indeed use * imports, I would also prefer not defining `__all__` because it's extra boilerplate you have to maintain and can very easily be missed on future updates.
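For anyone who wants to see that one functional consequence concretely, here is a minimal sketch. The module names `demo_without_all` and `demo_with_all` and the functions inside them are made up for illustration, and the modules are built in memory only so the snippet is self-contained.

```python
import sys
import types


def make_module(name, source):
    """Build a throwaway in-memory module and register it (illustration only)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


SOURCE = '''
def public_api():
    return "ok"

def helper():
    return "internal detail"

def _private():
    return "hidden"
'''

make_module("demo_without_all", SOURCE)
make_module("demo_with_all", SOURCE + '__all__ = ["public_api"]\n')


def star_import(module_name):
    """Return the names that `from module_name import *` binds."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")


# Without __all__: every name that does not start with an underscore.
print(star_import("demo_without_all"))
# With __all__: only the names listed there.
print(star_import("demo_with_all"))
```

Note that `helper` stays reachable as `demo_with_all.helper` either way; `__all__` only changes what the star import binds.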
Python's imports are the worst I've seen in any mainstream programming language by far.
Relative path imports are unnecessarily difficult, especially if you want to import something from a parent directory. There's no explicit way to define what you'd like to export either.
> Relative path imports are unnecessarily difficult, especially if you want to import something from a parent directory.
This is never explained well to beginners but this is because Python imports deal with modules and packages, any relationship with directories is accidental. With namespace packages foo.bar and foo.baz might live on totally separate parts of the filesystem. They might not live on the filesystem at all and instead be in a zip file.
somedir/
    otherdir/
        __init__.py
        foo.py
    scripts/
        __init__.py
        myscript.py
# in myscript.py
from ..otherdir import foo
If you `python scripts/myscript.py` then it won't work because myscript.py is run with a __name__ of __main__ which doesn't have any parent module.
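When you run `python -m scripts.myscript`, `__package__` is set to `scripts`, so the module now has a parent package, and because the current directory is added to the search path by default, otherdir and scripts are importable as top-level packages. A relative import still can't climb above the top-level package, though: for the `..` in myscript.py to resolve, run it from the directory above somedir as `python -m somedir.scripts.myscript`.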
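If you want to try this without touching a real project, here is a rough sketch that rebuilds the layout above in a temp directory and runs myscript.py both ways. It assumes the relative import is spelled `from ..otherdir import foo` and that `foo.py` defines a made-up `VALUE` constant; neither detail comes from a real codebase.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Run myscript.py as a plain script and as a module; return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "somedir"
        for pkg in ("otherdir", "scripts"):
            (root / pkg).mkdir(parents=True)
            (root / pkg / "__init__.py").write_text("")
        (root / "otherdir" / "foo.py").write_text("VALUE = 42\n")  # VALUE is hypothetical
        (root / "scripts" / "myscript.py").write_text(
            "print(__name__, __package__)\n"
            "from ..otherdir import foo\n"
            "print(foo.VALUE)\n"
        )
        # 1) Plain script: no parent package, so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, "scripts/myscript.py"],
            cwd=root, capture_output=True, text=True,
        )
        # 2) Module run from the directory *above* somedir: '..' can resolve.
        as_module = subprocess.run(
            [sys.executable, "-m", "somedir.scripts.myscript"],
            cwd=tmp, capture_output=True, text=True,
        )
        return as_script, as_module


as_script, as_module = run_both_ways()
print("script exit code:", as_script.returncode)
print("module output:", as_module.stdout.strip())
```

The file is the same in both runs; only the way it is launched changes what `..` can refer to.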
In that case it only means: execute the code from that module, I don't care about importing stuff. If they set a global variable or interact with other modules I might use it.
As a Python dev, reading the "import * as bindingName from 'module-url'" seems confusing as it sounds like it'd try to assign multiple things to one name.
In Python, the following line pairs are roughly equivalent:
import X
X = __import__("X")
or
import X as Y
Y = __import__("X")
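A small runnable check of the "roughly" part, using `json` and `os.path` from the standard library as stand-ins for X. One wrinkle worth knowing: for dotted names, `__import__` hands back the top-level package, which is why `importlib.import_module` is usually recommended when you do this by hand.

```python
import importlib
import json
import os

# "import json as Y" and the explicit call bind the same cached module object.
Y = __import__("json")
print(Y is json)

# For a dotted name, __import__ returns the top-level package...
top = __import__("os.path")
print(top is os)

# ...while importlib.import_module returns the submodule itself.
leaf = importlib.import_module("os.path")
print(leaf is os.path)
```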
I understand and kinda appreciate the nice (apparent) simplicity of a dedicated "import", especially as you will use it a ton, but I also like the bluntness of Zig where it's not special (well, it has a macro), and you're explicitly doing the assignment:
Really not a fan of languages that allow importing (using, including...) things where magic happens and a bunch of names are just available without any clear idea where from. Yeah, an IDE will help you find the source of a C function from an #include, but if you're trying to debug some 3rd party library and don't want to download the whole thing, the ability to hand-trace it a bit is just gone. Why mix a bunch of names into the local namespace? If you want short names, just assign it to a single letter variable.
Circling back to Python, the only practical place I see `from ... import *` being used is in libraries where they are bundling up names from a bunch of submodules to be accessible at the top level. Though, I was searching for an example of this, as I know NumPy did it until they undid it: https://github.com/numpy/numpy/pull/24357
> I understand and kinda appreciate the nice (apparent) simplicity of a dedicated "import", especially as you will use it a ton, but I also like the bluntness of Zig where it's not special (well, it has a macro), and you're explicitly doing the assignment:
That is exactly how CommonJS worked for the longest time:
const module = require('url')
The short end of the stick was the export syntax for CJS, which is basically just assigning a property to an object:
module.exports.value = 'foo'
The problem solved by the import syntax is that it is statically parseable: you just need to parse the top level of a package to find its dependencies. The explicit export is also a godsend for tooling, which can get a lot of introspection from it without executing any code.
Relative imports are fuzzy if you don't dig deep. That said, with a project scaffolded for you, it's rarely a headache (like testing in the old days); you can rapidly try various numbers of dots.
I still haven't bothered to learn why relative imports require you to be in a package. It's a major headache if, like me, you do a lot of one-off work that doesn't warrant that project scaffolding.
Well, not a major headache. I can always revert to the py2 way: symlinks.
> It's a major headache if, like me, you do a lot of one-off work that doesn't warrant that project scaffolding.
Instead of creating "myproject.py", create a myproject folder with a __main__.py file. Run it as "python -m myproject".
Hardly any scaffolding and your code is in a package. It's one of the things I like about python, it's quite easy to move "up the ladder" scaffolding-wise if you need to.
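A quick sketch of that folder-with-`__main__.py` setup, built in a temp directory so it doesn't litter anything. The `helpers.py` file and its `greet` function are invented for the example; the point is just that the relative import works once the code runs as a package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_one_off_package():
    """Create myproject/__main__.py plus a helper, then run `python -m myproject`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "myproject"
        pkg.mkdir()
        # helpers.py and greet() are hypothetical names for this demo.
        (pkg / "helpers.py").write_text(
            "def greet():\n    return 'hello from helpers'\n"
        )
        (pkg / "__main__.py").write_text(
            "from .helpers import greet\nprint(greet())\n"
        )
        result = subprocess.run(
            [sys.executable, "-m", "myproject"],
            cwd=tmp, capture_output=True, text=True,
        )
        return result.returncode, result.stdout.strip()


exit_code, output = run_one_off_package()
print(exit_code, output)
```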
The worst thing is that it can't handle circular imports, unlike every other language, AFAIK.
I've been told it's because of some deep part of Python interpretation that just can't be changed. The mutable default argument madness is also caused by that.
My guess is that imports in python are just executing python. When you import a module you are (optionally) running code in that module. So a circular import generates infinite recursion.
Python does a lot when it evaluates an import statement. That is where a lot of the python magic happens. As soon as you try to limit the import statement somehow, a lot of python code needs to be re-written (and maybe doesn't work at all anymore). That's arguably a bad decision but it's one at the center of the language and it's unlikely to change.
You're right. And same with the mutable default argument "trap". That's caused by `def foo` being an executable statement, not a declaration. When a module is imported, the code in it is executed. Many of the statements in that will be `def something`, which, when executed, defines a function. And because that's code that gets executed,
def foo(bar=[]):
    bar.append('lol')
    return bar
the `bar=[]` gets executed at that time. That is, Python doesn't treat `def foo` as some magic thing that gets special cased and squirreled away for later use.
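To make the consequence visible, here is the same `foo` called twice, next to the usual `None`-sentinel workaround. `foo_fixed` is just a name picked for this sketch.

```python
def foo(bar=[]):
    # The list was created once, when the def statement ran.
    bar.append("lol")
    return bar


def foo_fixed(bar=None):
    # Common workaround: build a fresh list on each call instead.
    if bar is None:
        bar = []
    bar.append("lol")
    return bar


first = foo()
second = foo()
print(first is second)                 # both calls returned the same list object
print(second)                          # it has been appended to twice
print(foo.__defaults__[0] is first)    # the default lives on the function object
print(foo_fixed(), foo_fixed())        # a fresh list each time
```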
Exactly, importing (like everything else in Python) is just a "syntax sugar" around loading the code on a file and running it instantly. If you do a circular reference it can blow the stack.
> importing (like everything else in Python) is just a "syntax sugar" around loading the code on a file and running it instantly.
Not quite - it also checks whether the file has already been imported, and does nothing if so.
And it also checks if the file has been "partially imported" which is what causes it to fail during circular imports.
You could imagine a small change to Python which made the second case a no-op as well. This would allow you to use circular imports. You'd need to restrict yourself to imports like "import mymodule; mymodule.myfunction()" rather than "from mymodule import myfunction; myfunction()" but that's encouraged in popular style guides [0] anyway. By the time you run the function, mymodule.myfunction will be bound correctly and everything will work.
This would create hard-to-debug situations for people who rely on executing lots of side-effect code to set up state at import time, but 1) that would only happen in code that currently can't run at all, and 2) those people deserve it.
I think it would even be reasonable to make the "from mymodule" version work by binding those names later, but it wouldn't be a one-line change to the interpreter and it might break some legitimate use cases where a name got redefined.
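A rough sketch of the difference between the two import styles, as far as I understand current CPython behavior: a plain `import` of a module that is still mid-import gets back the partially initialized module object, while the `from ... import name` form is the one that raises, because the name isn't bound yet. The module names `circ_a` through `circ_d` are invented for the demo and written to a temp directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical modules: circ_a/circ_b use "import x", circ_c/circ_d use "from x import y".
SOURCES = {
    "circ_a.py": (
        "import circ_b\n\n"
        "def hello():\n    return 'hello from a'\n\n"
        "def call_b():\n    return circ_b.greet()\n"
    ),
    "circ_b.py": (
        "import circ_a  # circ_a is only partially initialized here\n\n"
        "def greet():\n    return circ_a.hello() + ' via b'\n"
    ),
    "circ_c.py": (
        "from circ_d import d_func\n\n"
        "def c_func():\n    return 'c'\n"
    ),
    "circ_d.py": (
        "from circ_c import c_func  # c_func is not bound yet\n\n"
        "def d_func():\n    return 'd'\n"
    ),
}


def demo_circular_imports():
    """Return (result of the 'import x' cycle, error text of the 'from x import y' cycle)."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, source in SOURCES.items():
            Path(tmp, filename).write_text(source)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import_ok = importlib.import_module("circ_a").call_b()
            try:
                importlib.import_module("circ_c")
                from_error = None
            except ImportError as exc:
                from_error = str(exc)
        finally:
            sys.path.remove(tmp)
    return import_ok, from_error


import_ok, from_error = demo_circular_imports()
print(import_ok)
print(from_error)
```

The first cycle works because the attribute lookup (`circ_a.hello`) is deferred until the function is called, by which point both modules have finished executing.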
IMO the way to think about it is that the turing-complete nature of python imports is one of the central design decisions of the language. Lots of python oddities make more sense once you realize that the module structure is determined at runtime - including the enormous capacity for the language to do its own setup (or fail to do its own setup in amazingly complex ways).
So it's not a bug, it's one of the central choices that makes python "pythonic."
Python is just plain unsuitable for any project larger than a couple of files.
Every Python project contains a hidden and deadly complexity that will grow over time - and will eventually destroy it. There's no way around it, it creeps in no matter what you do. The imports situation is only part of it - it wasn't what killed our simulation tools, or our build scripts, or our test framework - and required that we rewrite them all in different and more suitable languages.
Python's performance, global modules, whitespace, untyped-by-default code are all killers. You pretty much have to use virtual environments to permit isolation between the multiple differently-versioned sets of dependent packages that you'll need for any project of any complexity, which are a cumbersome and painful solution to a problem that simply shouldn't exist.
It may technically be possible to write clean and maintainable code in Python if you try hard enough, but you're always skating so close to the edge that eventually somebody is going to get in there and tip the whole fragile mess into the abyss.
I’ve led the maintenance and development of a medium sized (100s of kloc) Python codebase over a period of five years. As it matured, the code became significantly less fragile and more maintainable. Using private members of modules (or any other object) turned into way less of a problem as we improved our design. Turns out the problems were mostly around I/O and mutable state. We fixed those problems by pushing side effects and mutation as far out to the edge of the codebase as possible. If your core data processing routines don’t have side effects and don’t mutate global state, who cares who the caller is?
I don't think such a thing exists or should exist. Languages should be different and those differences will naturally be better at a variety of problems. But also it's definitely C#.
Don't know about using __all__ for introspection, but I have found it immensely useful for organizing, reading, and communicating code. When a package has a bunch of files inside of it, but only a handful of names exposed in __all__ it helps a lot with orienting yourself around the package.
It would be interesting to compare this with an alternative based on static analysis.
The Python ecosystem has many standard tools nowadays to enforce consistent style, including how modules import each other. The ast and libcst modules are very fast and can quickly identify any imported symbols beginning with an underscore:
from a import _naughty
It’s also quite possible to build a list of symbols that were imported and ensure that their underscore-prefixed attributes are not accessed:
import a
a._naughty()
You could get creative I suppose…
import a
f = next(
    getattr(a, f"_{s}")
    for s in synonyms("cheeky")
)
f()
…but at that point one hopes that, as a last resort, one's reviewers would cry foul.
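A minimal sketch of the first two checks using only the standard-library `ast` module. The function name `find_private_usage` is hypothetical, and this is far from a complete linter rule: it only follows names bound by plain `import` statements and, as expected, does nothing about the `getattr` trick.

```python
import ast


def find_private_usage(source):
    """Return sorted (line, description) pairs for underscore-prefixed imports and accesses."""
    tree = ast.parse(source)
    imported = set()
    problems = []

    # Pass 1: record imported module names and flag "from a import _name".
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name.startswith("_"):
                    module = node.module or "."
                    problems.append((node.lineno, f"from {module} import {alias.name}"))

    # Pass 2: flag "a._name" where "a" was imported; dunder attributes are skipped.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in imported
            and node.attr.startswith("_")
            and not (node.attr.startswith("__") and node.attr.endswith("__"))
        ):
            problems.append((node.lineno, f"{node.value.id}.{node.attr}"))

    return sorted(problems)


SAMPLE = "from a import _naughty\nimport a\na._naughty()\na.fine()\nprint(a.__name__)\n"
for line, what in find_private_usage(SAMPLE):
    print(line, what)
```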
Yup that's what I do too. Works great. And nbdev auto-generates __all__ so I don't have to think about anything -- it all just works.
(Besides which, when you do have __all__ just regular wildcard imports work well anyway. I've been using them for >10 years without trouble. I think people just repeat the claim that they're a problem without having 1st-hand experience of using them with __all__ correctly.)
Eh. Python does indicate public and private APIs. "Name" is public. "_Name" is internal, but you can mess with it if you need to. "__Name" is private, and if you go there anyway and the thing explodes and gives you a bad haircut, you were warned.
Python's position is that we're all adults here. Don't touch "_Name". Really don't touch "__Name". You can if you're an expert and willing to take responsibility for your actions.
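For what it's worth, the double-underscore case only gets interpreter support inside a class body, where name mangling kicks in; at module level all of this is pure convention. A tiny sketch, with a made-up `Account` class:

```python
class Account:
    def __init__(self):
        self.owner = "alice"        # public
        self._cache = {}            # internal by convention only
        self.__secret = "hunter2"   # stored as _Account__secret (name mangling)

    def reveal(self):
        # Inside the class body, __secret is rewritten to the mangled name.
        return self.__secret


acct = Account()
plain_lookup_works = hasattr(acct, "__secret")   # the unmangled name does not exist
mangled_value = acct._Account__secret            # still reachable if you insist
print(plain_lookup_works, mangled_value, acct.reveal())
```

So even the "private" spelling is a speed bump rather than a lock, which fits the "we're all adults here" position.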
Addendum: And in this ModuleWrapper thing, I can still access `core._module.PrivateApi` so we're back to square one.
In my experience, Python devs rarely use the underscore.
Package authors are pretty good about it. But when the only people using your code are your coworkers, people don't seem to put as much thought into having a clean interface.
The notion that developers are not to be treated as adults and must have access forbidden to them, or else tight coupling will develop and rot the code, is super strange to me.
If you cannot trust your coworkers to respect the _meaning of __names, then how can you trust them with much harder concerns like algorithms and data-structures?
The article contains this amazing quote:
"""I’ve seen a unicorn startup dump their existing codebase and start over because the modules within it became so tightly coupled together, it was impossible to effectively develop within it or break them apart."""
This quote definitely needs flesh on its bones. I don't think this was Instagram, and I am not sure who else? Could the author chime in maybe?
For me it's less about trusting coworkers, and more about trusting myself. When I see these sorts of discussions through those eyes, I can understand much more why people want guard rails when they're programming. I fancy myself a fairly competent dev, but I've seen code that I've written on a bad day where I can barely even understand what I was trying to achieve, let alone what I actually did achieve. Ideally, yes, there's code review and a second pair of eyes to make sure everything does make sense, and that my publics are public and privates are private. But sometimes a reviewer misses things as well, or the reviewer isn't as familiar with that bit of the codebase, or whatever other problem - and mistakes happen.
Obviously mistakes can also happen with guard rails enabled. In a language with explicit exports or private-by-default attributes, you can still make mistakes and expose more implementation details than you wanted. But in my experience, that happens a lot less often than it does in Python, where privacy is just a matter of convention that's easy to forget.
So in that regard, yes I trust myself and my coworkers, even with the difficult parts, but I also assume that we'll all occasionally make mistakes, and if I can find ways to avoid those mistakes, I'm all for it.
My experience primarily comes from scale-ups, where the companies grew fast and hired fast.
Without putting specific companies on blast, I hear about this problem consistently from growth stage startups that scaled on Python. It's a common reason people reach for microservices - trying to use network boundaries to cover for the lack of module boundaries.
Ideally you can trust coworkers; unfortunately that's not always the case. Here's a quote from a developer in response to the article in another forum:
"""
It can even be very useful sometimes to be able to import private stuff. In a large ecosystem where you use many modules from other teams, you might need to use some private functionality.
"""
If something is available for an individual developer that solves their problem, often they're not thinking about the fact that it's private or creates a brittle dependency with no contract.
The false assumption that a language can enforce something is the root of 10,000 CVEs.
For instance, in C, you have read/write access to the whole program's memory space. Want to work around some security mechanism? A well-aimed pointer will do the trick.
Python doesn't prevent you from ever touching _internal or __private names. If you're reviewing a Python PR that does that, you should be as suspicious as if you were reviewing C code that stomps all over memory. In either case you almost certainly should not be doing that.
But sometimes, only sometimes, you might be the right person in the right use case to do it. Then you can ignore the linters screaming at you and concentrate on convincing your coworker this is a good idea.