
Objects—Objects Everywhere

This time, we cover objects—which includes everything, so we focus on the abstractions: IDs, types and values.



Dan Gittik

3 years ago | 13 min read

In Python, everything is an object. Well, almost everything—keywords like if are not; but everything else is: numbers, strings, functions, instances—even classes! Each object has a unique ID, a type that defines its behavior, and a value that parametrizes it—but contrary to popular belief, the most important (and interesting) part of this trinity is the type. In this article, we’ll understand why, and see some of the fascinating behaviors we can imprint onto custom objects.

But first things first: let’s talk about object-oriented programming, or OOP. So far, we’ve dealt with procedural programming: in this paradigm, abstractions are encapsulated as procedures, or functions, which operate on data.

The biggest challenge here, then, is representing state; you have to use elaborate data structures, and pass them along into a myriad of functions, which construct new data structures or mutate existing ones.

In object-oriented programming, the paradigm is such that abstractions are encapsulated in objects: a combination of data, accessible through attributes, and code, represented by methods.

These methods automatically get a reference to their instance—traditionally called self—so that they can manage that instance’s state. In many cases, this is a more intuitive way to represent the world—and our way to wield it is through custom classes, which describe our object’s structure and behavior.
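To make the contrast concrete, here is a small sketch of the same abstraction written both ways; the Counter example and its names are hypothetical, chosen just for illustration:

```python
# Procedural style: state lives in a plain data structure,
# and free functions operate on it.
def make_counter():
    return {'count': 0}

def increment(counter):
    counter['count'] += 1

# Object-oriented style: state and behavior live together,
# and each method receives its instance as self.
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

c = Counter()
c.increment()
print(c.count)  # 1
```

Both versions do the same thing; the object-oriented one simply bundles the data and the code that manages it into a single entity.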

Everything is an Object

There’s a saying in creative writing: “show, don’t tell”. Well:

>>> 1
1
>>> (1).__class__
<class 'int'>
>>> 'Hello, world!'
'Hello, world!'
>>> 'Hello, world!'.__class__
<class 'str'>
>>> def f():
...     pass
>>> f
<function f at 0x...>
>>> f.__class__
<class 'function'>
>>> class A:
...     pass
>>> a = A()
>>> a
<A object at 0x...>
>>> a.__class__
<class 'A'>
>>> A
<class 'A'>
>>> A.__class__
<class 'type'>

As you can see, everything—from numbers to classes—can be represented as a string, even if a useless one like <A object at 0x...>; and everything has a class, accessible through the __class__ attribute, which points to its creator.
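Incidentally, if you keep following the __class__ chain upward, it ends in a loop: type is its own class. A quick check, using a throwaway class A like the one above:

```python
class A:
    pass

a = A()
# a's class is A, A's class is type, and type's class is... type itself.
assert a.__class__ is A
assert A.__class__ is type
assert type.__class__ is type
```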

Seeing Some ID

Like I said, the first thing to address is the ID:

>>> a1 = A()
>>> a2 = A()
>>> id(a1)
4586853096
>>> id(a2)
4586853122
>>> a1 is a2
False
>>> a1 is a1
True

When an object is created, it’s given a unique identifier—in CPython, this is actually the memory address where its PyObject is allocated. That ID can be retrieved with the built-in id function, and compared against using the is keyword.

At first glance, is looks pretty stupid; isn’t it pretty much the same as equality? And if not, would an object really be anything but itself—so what’s the point in having a separate operator to test for it?

As it turns out, identity is different from equality: two objects representing the same file, or user, may be considered the same—but in actuality, be two separate entities somewhere in memory. In other words, == compares two objects’ values, whereas is compares their identity. This is yet another way to test our previous discovery, that everything in Python is passed by reference:
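A quick illustration with lists (the values are hypothetical, chosen just for the demo):

```python
xs = [1, 2, 3]
ys = [1, 2, 3]

# Two separate lists with the same value: equal, but not identical.
print(xs == ys)  # True
print(xs is ys)  # False

# A second name bound to the same list: both equal and identical.
zs = xs
print(xs is zs)  # True

# Mutating through one name is visible through the other...
zs.append(4)
print(xs)  # [1, 2, 3, 4]
# ...but ys, a distinct object, is unaffected.
print(ys)  # [1, 2, 3]
```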

>>> a1 is a2
False
>>> a3 = a1
>>> a1 is a3
True

As you can see, even though a1 and a2 represent pretty much the same (empty) object of type A, only names bound to the exact same object—like a1 and a3—test identical. With objects that change, often called mutable objects, the benefit is clear: we have a way to identify a particular object, no matter its current value. With objects that don’t change, called immutable objects, it’s less obvious; so use == instead, unless you really know what you’re doing. For example:

>>> x1 = 1
>>> x2 = 1
>>> x1 is x2
True

Since integers are immutable, Python figures it might as well cache them; so when you refer to 1, which denotes the integer object whose value is 1, you always get the same instance—resulting in identity even with independent assignment statements. However:

>>> x1 = 1_000_000
>>> x2 = 1_000_000
>>> x1 is x2
False

Turns out, Python only caches commonly used integers—in CPython, the range from -5 to 256. For bigger numbers, it allocates a new integer object with that value every time, resulting in different objects that don’t test identical.
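We can probe the cache’s boundary ourselves; constructing the integers at runtime (rather than as literals, which the compiler may fold into a single constant) keeps the experiment honest. Note this is a CPython implementation detail, not a language guarantee:

```python
# CPython pre-allocates the integers from -5 to 256 and reuses them.
a = int('256')
b = int('256')
print(a is b)  # True: both names point at the cached 256

c = int('257')
d = int('257')
print(c is d)  # False: 257 is outside the cache, so each call allocates anew
```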

If this seems confusing, it’s because engaging in such exploratory research often reveals nuances and edge-cases that can be a bit overwhelming; so it’s important to step back and have another look at the bigger picture before moving forward. The bottom line is, every object in Python is assigned a unique identifier, accessible through id and testable through is, which lets us distinguish between objects even when their value is the same; moving on.

The Holy Trinity

Then we have the type. We saw that it’s accessible through the __class__ attribute, but a more elegant way to get it would be with the built-in type function:

>>> type(1)
<class 'int'>
>>> type('Hello, world!')
<class 'str'>
>>> type(f)
<class 'function'>
>>> type(a)
<class 'A'>
>>> type(A)
<class 'type'>

This is just a pointer to some other object; and understanding the nature of this relation is what we’ll spend most of this article talking about.

But before we do—there’s also the value: some configuration, or state, which parametrizes the object’s behavior, much like arguments parametrize the execution of a function. Take 1, for example:

>>> 1
1

Its value is such that when Python asks it for its representation, it returns the ASCII character 0x31, or 1. This is in fact encoded in its class’s __repr__ method:

>>> (1).__repr__()
'1'

Similarly, we can add two integers up, and their values determine the value of their sum:

>>> 1 + 1
2

Which is actually encoded in the __add__ method:

>>> (1).__add__(1)
2

It turns out, integers even have methods of their own, such as to_bytes, which serializes it into some number of bytes with some endianness:

>>> (1).to_bytes(4, 'little')
b'\x01\x00\x00\x00'

This is actually pretty complex behavior: first, we have to resolve the integer’s attribute, which is handled by __getattribute__:

>>> (1).__getattribute__('to_bytes')
<built-in method to_bytes of int object at 0x…>

And then, we have to invoke that object, which is handled by __call__:

>>> (1).__getattribute__('to_bytes').__call__(4, 'little')
b'\x01\x00\x00\x00'

So as you can see, while formally the holy trinity comprises the ID, the type and the value—the really interesting bit is the type, which defines the behavior; the ID being more of a technicality, and the value being merely the argument to that behavior’s function.

Mutability

One thing the type determines is mutability—whether an object can change or not. Some objects can’t, no matter how hard you try:

>>> n = 1
>>> id(n)
4304947712
>>> n += 1
>>> id(n)
4304947744
>>> s = 'Hello, world!'
>>> s[-1] = '.'
Traceback (most recent call last):
...
TypeError: 'str' object does not support item assignment
>>> x = 1, 2, 3
>>> x[0] = 4
Traceback (most recent call last):
...
TypeError: 'tuple' object does not support item assignment

While others can:

>>> x = [1, 2, 3]
>>> x[0] = 4
>>> x
[4, 2, 3]
>>> x = {'a': 1}
>>> x['b'] = 2
>>> x
{'a': 1, 'b': 2}

Unfortunately, even a concept as simple as this has its pitfalls; for example, if immutable objects can’t change, and tuples are immutable, then how come I can do this:

>>> x = [], 2, 3
>>> x
([], 2, 3)
>>> x[0].append(1)
>>> x
([1], 2, 3)

Booyah—the representation changed, so the value must be different. Except it’s not: the tuple’s value has only ever been three pointers, referencing an empty list and two integers; its representation behavior just so happened to be recursive, and delegate to those references. When we accessed the list and changed it, the tuple didn’t change: it’s still pointing to the same objects, and it always will. This can be further exemplified by this funny bug:

>>> x = [[]] * 5
>>> x
[[], [], [], [], []]
>>> x[0].append(1)
>>> x
[[1], [1], [1], [1], [1]]

At first sight, we create a list of 5 empty lists; in actuality, we create a list with one empty list, and then multiply it by 5, resulting in a list with five of those—that is, five references to the same empty list, which grow and shrink together, to everyone’s surprise and delight.
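The usual cure is a list comprehension, which evaluates [] afresh on every iteration:

```python
# Five *distinct* empty lists: one fresh [] per iteration.
x = [[] for _ in range(5)]
x[0].append(1)
print(x)  # [[1], [], [], [], []]

# Contrast with multiplication, which copies the *reference* five times.
y = [[]] * 5
y[0].append(1)
print(y)  # [[1], [1], [1], [1], [1]]
```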

But anyway, the question of mutability just scratches the surface: there’s a whole lot more behaviors one can customize, if one was so inclined. And seeing as we are, let’s start this exciting journey—although truth be told, much like in the Lord of the Rings, our epic quest might seem a bit tedious at times. So grab a cookie, your favorite hot beverage, and a Mithril shirt:

Behavior Modification

Generally speaking, when you define a class, you do something like this:

class A:
    def __behavior__(self):
        ...  # Do stuff

And then when you have an instance, a = A(), it behaves in this way under the appropriate circumstances. The simplest example of that is asking an object to display itself in some human-readable way:

>>> # Without:
>>> class User:
...     def __init__(self, name):
...         self.name = name
>>> user = User('Alice')
>>> print(user)
<User object at 0x...>
>>> # With:
>>> class User:
...     ... # Same as before
...     def __str__(self):
...         return self.name
>>> user = User('Alice')
>>> print(user)
Alice

This is effectively telling instances of User how to behave when being cast to string, which is what the print function secretly does to all its arguments. There’s another, more developer-oriented representation, which is what you get when you just “dump” the object in the interpreter:

>>> print(user)
Alice
>>> # But...
>>> user
<User object at 0x...>

This behavior is controlled by __repr__, and intended to provide more technical information, so when the object is inspected in a debugging context, it’s easier to understand. If you can, it’s recommended to return the constructor invocation which yielded the object—or, if it isn’t very readable, or has a very volatile state, some other description between angle brackets:

>>> class User:
...     def __init__(self, name):
...         self.name = name
...     def __repr__(self):
...         return f'User({self.name!r})'
>>> user = User('Alice')
>>> user
User('Alice')

Notice the !r at the end of the name formatting? It’s a way to invoke the formatee’s __repr__ behavior (instead of the standard __str__), which is especially handy in recursive representations such as this. Without it, we’d get the syntactically incorrect User(Alice)—with it, we get the quotes, and Python is even smart enough to figure out how to escape trickier strings like "O'Brian".
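Here’s that escaping in action, with a hypothetical name chosen for the demo:

```python
name = "O'Brian"
# !r delegates to repr, which picks quotes that don't clash with the content.
print(f'User({name!r})')  # User("O'Brian")
# Without !r, the quotes are lost and the output isn't valid Python.
print(f'User({name})')    # User(O'Brian)
```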

But as we venture into the much more complex world of classes, already we start running into interesting caveats. Imagine I’d subclass User with a more specific Admin:

>>> class Admin(User):
...     pass
>>> admin = Admin('Alice')
>>> admin
User('Alice')

That’s rather unfortunate—it seems like we’d have to override the __repr__ method for every subclass, even if they’re virtually identical. Or—

>>> class User:
...     ... # Same as before
...     def __repr__(self):
...         return f'{self.__class__.__name__}({self.name!r})'

We can prevent the problem altogether by writing the original __repr__ in such a way that the class’s name is resolved dynamically. You get this:

>>> user = User('Alice')
>>> user
User('Alice')
>>> admin = Admin('Alice')
>>> admin
Admin('Alice')

Pretty neat, no? But not at all obvious. Just wait until we talk about equality…

Equality

Let’s talk about equality. Imagine we have a class with a single attribute of a simple integer, which represents that object’s entire value and state. We’d want objects with a similar integer to test equal, right? Alas—

>>> class A:
...     def __init__(self, x):
...         self.x = x
>>> a1 = A(1)
>>> a2 = A(1)
>>> a1 == a2
False

That happens because we haven’t defined any custom comparison behavior, and Python defaults to comparing objects by their ID, which is always unique—so an object is only ever equal to itself. Unless…

>>> class A:
...     ... # Same as before
...     def __eq__(self, other):
...         return self.x == other.x
>>> a1 = A(1)
>>> a2 = A(1)
>>> a1 == a2
True

That’s pretty nifty—but breaks rather quickly:

>>> a1 == 1
Traceback (most recent call last):
...
AttributeError: 'int' object has no attribute 'x'

What happened is that 1 was passed into __eq__, which tried to access its x attribute—but since it doesn’t have one, it caused an exception. We can argue whether, on some philosophical level, an object that’s wholly represented by some number should be equal to the number itself—but for our purposes, let’s decide A objects can only be equal to other A objects:

>>> class A:
...     ... # Same as before
...     def __eq__(self, other):
...         return type(other) == A and self.x == other.x
>>> a1 = A(1)
>>> a2 = A(1)
>>> a1 == a2
True
>>> a1 == 1
False

However, this is easier said than done: what if we have a subclass again?

>>> class B(A):
...     pass
>>> b1 = B(1)
>>> b2 = B(1)
>>> b1 == b2
False # Huh?
>>> b1 == b1
False # Huh???

This happens because we’ve hardcoded class A in the equality operator, and neither b1 nor b2 has a type of A. What we should’ve used is the more nuanced built-in function, isinstance:

>>> class A:
...     ... # Same as before
...     def __eq__(self, other):
...         return isinstance(other, A) and self.x == other.x
>>> ... # Same as before
>>> b1 == b2
True

What happens is, isinstance is smart enough to traverse the entire class hierarchy, and it figures that indeed, the bs are instances (albeit indirect ones) of A—so it carries on with the rest of the test, which evaluates to True.
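A couple of relatives worth knowing: isinstance also accepts a tuple of types, and issubclass asks the analogous question about the classes themselves. A small sketch with throwaway classes:

```python
class A:
    pass

class B(A):
    pass

b = B()
# isinstance traverses the hierarchy: b is a B, and therefore also an A.
assert isinstance(b, B)
assert isinstance(b, A)
# A tuple means "any of these types".
assert isinstance(b, (int, A))
# issubclass checks the relation between the classes themselves.
assert issubclass(B, A)
assert not issubclass(A, B)
```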

Inequality

To achieve the opposite, simply implement __ne__, for not equal. Python is actually smart enough to fall back to not __eq__, so if your logic is reversed (as it really should be), there’s no need to write it yourself. The case is far more interesting with the other comparison operators:

>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __gt__(self, other):
...         return isinstance(other, A) and self.x > other.x
>>> a1 = A(1)
>>> a2 = A(2)
>>> a1 > a2
False
>>> a2 > a1
True

Alas, a new caveat arises. What if we define a class that is always less than anything else; let’s call it Epsilon—

>>> class Epsilon:
...     def __lt__(self, other):
...         return True

In that case, this works:

>>> e = Epsilon()
>>> e < a1
True

Because it calls e.__lt__(a1), which always returns True; but this doesn’t:

>>> a1 > e
False

Because it calls a1.__gt__(e), which notices e is not an instance of A and returns False. While it’s OK for __eq__ to return False when the object type is different, because it clearly implies the object is not equal to self—this isn’t the case with the other comparison operators, which require a third option: NotImplemented; that is to say—I don’t know. The reason this is so much better than a definitive “no”, is that it gives Python a chance to go and ask the other party involved—maybe its reverse operator would shed some light on the problem. And if neither has any idea, Python will raise a standard TypeError on their behalf, which is much more informative than a decisive refusal. Here’s the proper implementation:

>>> class A:
...     ... # Same as before
...     def __gt__(self, other):
...         if not isinstance(other, A):
...             return NotImplemented
...         return self.x > other.x

In this case…

>>> a1 = A(1)
>>> a1 > e
True

…works! a1.__gt__(e) is still invoked first, but since it returns NotImplemented, Python goes ahead and tries e.__lt__(a1), which is kind enough to return True. This technique is relevant to all the comparison operators:

>>> class A:
...     def __gt__(self, other): ...  # >  greater than
...     def __lt__(self, other): ...  # <  less than
...     def __ge__(self, other): ...  # >= greater than or equal to
...     def __le__(self, other): ...  # <= less than or equal to

One last word of caution—for some reason, the Python developers decided not to extrapolate whatever comparison function you define into what’s called a “total order”. The opposite of > is ≤, right? Well:

>>> class A:
...     def __init__(self, x):
...         self.x = x
...     def __gt__(self, other):
...         return self.x > other.x
>>> a1 = A(1)
>>> a2 = A(2)
>>> a2 > a1
True
>>> a1 <= a2
Traceback (most recent call last):
...
TypeError: '<=' not supported between instances of 'A' and 'A'

You can still achieve the desired effect rather easily, by using the total_ordering decorator in the standard functools module. All you need to do is define __eq__ and one other comparison operator, and it’ll infer the rest using the powers of math:

>>> import functools
>>> @functools.total_ordering
... class A:
...     def __init__(self, x):
...         self.x = x
...     def __eq__(self, other):
...         return self.x == other.x
...     def __gt__(self, other):
...         return self.x > other.x
>>> a1 = A(1)
>>> a2 = A(2)
>>> a1 <= a2
True

If you’re weirded out by seeing a decorator on top of a class, that’s alright; we’ll get there shortly. Maybe not that shortly, seeing as we’ve only covered display, equality and comparisons—but we’ll get there.
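One practical payoff of a total order: instances sort out of the box. A minimal sketch in the spirit of the class above, with a __repr__ added for readable output:

```python
import functools

@functools.total_ordering
class A:
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return f'A({self.x})'
    def __eq__(self, other):
        return self.x == other.x
    def __gt__(self, other):
        return self.x > other.x

# sorted and min rely on <, which total_ordering derived from == and >.
print(sorted([A(3), A(1), A(2)]))  # [A(1), A(2), A(3)]
print(min(A(3), A(1)))             # A(1)
```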

Conclusion

Objects are everywhere—and in Python, I mean it quite literally. Understanding them is really mostly about understanding their type, which defines their behaviors, and is encoded in a class. We saw how “magic methods”, starting and ending with a double underscore (or “dunder”, for short) let us customize how our objects behave in certain circumstances—and we’ll learn many more as we push forward.

Next time we’ll cover exotic behaviors such as arithmetics, invocation, indexing and iteration, and then on to the holy grail — attributes and method resolution.

This article was originally published by Dan Gittik on Medium.
