Duck Typing, Division and Importing from the Future

If you already know what this post is about from the title, then you can stop right here. Read no further. Quit wasting time and get back to work. … But if your curiosity is piqued, or if you ever use the Python programming language, then you should continue on.

Duck Typing

Compiled programming languages like Fortran, C, C++ and Java are statically typed. Every variable must be declared with a particular type such as int, float, double, complex, character, etc. Types must be specified in your code so that type checking can be done at compile time, catching type errors before the program ever runs. This is in contrast to scripting languages like Perl, Python and Ruby, which are dynamically typed: variables do not need a declared type in the code, and type checking happens at run time. A more complete exposition of these ideas is given in the Wikipedia page on Type Systems.
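
As a quick illustration, here is a minimal interactive session (Python 2.x, to match the examples below) in which the same name is re-bound to values of different types; the interpreter only checks the types when the code actually runs:

>>> x = 3            # x is bound to an int
>>> type(x)
<type 'int'>
>>> x = "three"      # the same name can be re-bound to a str at run time
>>> type(x)
<type 'str'>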

Our favorite programming language is dynamically-typed Python due to its flexibility, the readability of Python code and the fact that hundreds of modules exist that specifically address the needs of scientific data management.  The Python version of dynamic typing is lovingly called “duck typing” and is described in the Python glossary as:

Duck Typing — A programming style which does not look at an object’s type to determine if it has the right interface; instead, the method or attribute is simply called or used. (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution.

It makes one feel intelligent, reading an article that uses the phrase “polymorphic substitution” … but what the heck does it mean? In simple terms it means that you don’t need to specify the type of the variables in your code. You simply let Python’s “duck typing” work out whether a value is an int or float or str, and trust it to complain if there is a problem in the way you use your variables. The end result is that your code will be much easier to read, and errors will be called to your attention at run time. A couple of examples are in order:

Example 1) Python can “add” but cannot multiply two strings.

>>> a = "00"
>>> b = "7"
>>> a + b
'007'
>>> a * b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'

Example 2) Python cannot “add” but can multiply a string and an integer.

>>> a = "00"
>>> b = 7
>>> a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> a * b
'00000000000000'

Example 3) Python can add or multiply two integers.

>>> a = 00
>>> b = 7
>>> a + b
7
>>> a * b
0
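
The same idea applies to functions we write ourselves. As a small sketch, the hypothetical double() below never asks what type it was handed; it only assumes that its argument can be multiplied by an int, so it happily accepts ints, floats, strings and lists:

>>> def double(x):
...     # works on anything that supports * with an int
...     return x * 2
...
>>> double(7)
14
>>> double(3.5)
7.0
>>> double("ha")
'haha'
>>> double([1, 2])
[1, 2, 1, 2]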

Division

While Python’s “duck typing” works well and is quite intuitive most of the time, there is one glaring case where the results are anything but intuitive: division. Not surprisingly, division with string arguments results in a TypeError while division with integers and floats does not. The problem arises because “polymorphic substitution” changes the result of the division operator depending on whether the arguments are integers or floats.

The default behavior of the division operator in Python 2.x is the following:

  • if either dividend or divisor is a float, return a floating point result
  • if both dividend and divisor are int, return the floor of the result (rounded down to the nearest integer)

This can lead to some surprising results:

>>> a = 3.0; b = 4.0; a/b
0.75
>>> a = 3; b = 4.0; a/b
0.75
>>> a = 3; b = 4; a/b
0
>>> 4 == 4.0
True
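
And because integer division floors the result rather than truncating it, negative operands are even less intuitive. One more example along the same lines (again Python 2.x):

>>> a = -3; b = 4; a/b
-1
>>> a = -3.0; b = 4; a/b
-0.75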

We have said many times that we like Python because it allows us to write clear, intuitive code. We all know what we mean when we write “3/4”, and when a programming language reports back that “3/4 = 0” it is anything but intuitive. This is a case where “polymorphic substitution” needs to be reined in by a little pragmatism. Only the hardest of hard-core programming language purists could believe that this behavior is a good idea.

In a complex piece of dynamically typed code it is far too easy for this behavior to lead to intermittent mistakes and almost-correct results.  Imagine a Celsius to Fahrenheit converter that accepted user input:

>>> user_input = 18.0
>>> print(user_input * 9/5 + 32)
64.4
>>> user_input = 18
>>> print(user_input * 9/5 + 32)
64

Ouch!!!  This is the kind of close-but-not-quite result that is devilishly difficult to debug.

Importing from the Future

Luckily, there are many pragmatists in the Python community. This problem with division was identified early on, and the default behavior of division in Python 3.x will be the sane one: the / operator with numeric arguments will always return a floating point value. Python Enhancement Proposal PEP 238 covers the issue. Unfortunately, widespread adoption of Python 3.0 is still a ways off. In the meantime, what we have to do is import the desired behavior from Python 3.0 into our existing code.

Since Python version 2.1, a __future__ module has been available that allows whatever version of Python you are running to import behavior that is being developed for future releases. This has the advantage of giving you access to new features like sane division, and it also helps guarantee that the code you are writing will be compatible with future versions of Python when they come out.
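
Division is not the only behavior that can be borrowed this way. For example, Python 3’s print function is available through the same mechanism (in Python 2.6 and later):

>>> from __future__ import print_function
>>> print("borrowed", "from", "the", "future")
borrowed from the future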

By putting the import from the __future__ module at the top of any program we write, before any other code, we get the improved behavior of the division operator:

>>> from __future__ import division
>>> user_input = 18.0
>>> print(user_input * 9/5 + 32)
64.4
>>> user_input = 18
>>> print(user_input * 9/5 + 32)
64.4
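
PEP 238 also added the // operator, so when floor division is what you actually want you can still ask for it explicitly, with or without the __future__ import:

>>> 3 // 4
0
>>> 18 * 9 // 5 + 32
64
>>> 3.0 // 4
0.0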

Kudos to the Python community for coming up with an elegant solution for selectively modifying the default behavior of the programming language.

WARNING: Borrowing from the future only works for the Python programming language. Government budgets should not begin with “from __future__ import $$”.

Decimal Math

Before closing, we should also mention the decimal module, which deals in “schoolbook math”: the idea that the number of digits after a decimal point reflects the precision of a measurement and that this information should be retained. A couple of examples will suffice to give the flavor of the module:

>>> a = 1.33
>>> b = 1.27
>>> a + b
2.6000000000000001
>>> a * b
1.6891
>>> print(a + b)
2.6
>>> print(a * b)
1.6891
>>> from decimal import *
>>> a = Decimal('1.33')
>>> b = Decimal('1.27')
>>> a + b
Decimal('2.60')
>>> a * b
Decimal('1.6891')
>>> print(a + b)
2.60
>>> print(a * b)
1.6891
>>> getcontext().prec = 4
>>> print(a * b)
1.689
>>> getcontext().prec = 3
>>> print(a * b)
1.69

The decimal module is clearly not for everyday use — internal calculations do not need to keep track of “schoolbook math” and the syntactic overhead is pretty ugly. But we imagine this module could be extremely helpful in the context of high school or college level chemistry and physics.
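
For anyone who wants to experiment, here is one more small sketch (a fresh session, separate from the transcript above): the module’s quantize() method rounds a result to a chosen number of decimal places, which is exactly the kind of bookkeeping significant figures require:

>>> from decimal import Decimal
>>> a = Decimal('1.33')                  # construct from strings so no binary float error sneaks in
>>> b = Decimal('1.27')
>>> (a * b).quantize(Decimal('0.01'))    # round the product to two decimal places
Decimal('1.69')
>>> (a + b).quantize(Decimal('0.1'))     # or to one
Decimal('2.6')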

An understanding of precision and the importance of significant figures is sorely lacking in the software tools used in science today. What if someone created an on-line spreadsheet (or lab book) for use in college chemistry classes that forced students to identify the precision of their measurements when entering them and always took care of the significant digits when doing basic math?  Options to change the precision associated with a particular measurement would appropriately modify the contents of any cells using that measurement, visually calling attention to modified cells. This could be a great way for students to learn about measurement precision and significant digits and why it is important to keep track of both.

Let’s change the way software works to meet the needs of science instead of the other way around.
