Liberating data from Microsoft Access “.mdb” files

Many people given the task of managing data reach for the tools available to them on their office computer.  Typically this will include the Microsoft suite of products including the Access database.  Not surprisingly, Microsoft Access Database files, “.mdb” files, are difficult to work with if you don’t have Access on your system.  (Or even if you do!)

In this post we describe how to use freely available tools and and 14 lines of python code to automatically dump the contents of an entire Access database into comma-separated-value files, CSV files, one per table.

Recently we began a project that involved data stored in Microsoft Access databases.  These “.mdb” files use the JET database engine which is today considered a deprecated technology even by Microsoft.  Our first thought was to reach for an older PC and simply open and export database tables from within Access.  This is certainly doable but requires 12 mouse clicks for each and every table in the database.  No doubt one could become more efficient by learning how to program macros in Access but this is not a technology we hope to become expert at.

It would be much better to have a technique that worked on our system of choice:  an IBM Thinkpad running Ubuntu Linux.

A little Googling turned up the MDBTools project that provides a set of libraries and utilities for working with .mdb files.  Installation was a snap on Ubuntu:

  1. Go to “System > Administration > Synaptic Package Manager” and search for “mdb”
  2. Select “mdbtools” (and “mdbtools-gmdb” for the viewier)
  3. Click “Install”

The tools you are most likely to use are:

Name Description
mdb-tables list tables in the specified file
mdb-schema generate schema DDL for the specified file
mdb-export generate CSV style output for a table

Check the man pages to learn more about options associated with each.

The next stage is to write a little program that gets a list of tables from mdb-tables, loops through the list and then uses mdb-export to dump the contents of each table into a CSV file.  We always reach for python to do this kind of work as it is concise, readable and very widely adopted in the scientific community.  A serious python programmer would of course suck all of the database contents into python and do something within the python process space.  But here we are just interested in dumping the data.

Here is our complete program:

#!/usr/bin/env python
#
# AccessDump.py
# A simple script to dump the contents of a Microsoft Access Database.
# It depends upon the mdbtools suite:
#   http://sourceforge.net/projects/mdbtools/

import sys, subprocess # the subprocess module is new in python v 2.4

DATABASE = sys.argv[1]

# Get the list of table names with "mdb-tables"
table_names = subprocess.Popen(["mdb-tables", "-1", DATABASE], 
                               stdout=subprocess.PIPE).communicate()[0]
tables = table_names.split('\n')

# Dump each table as a CSV file using "mdb-export",
# converting " " in table names to "_" for the CSV filenames.
for table in tables:
    if table != '':
        filename = table.replace(" ","_") + ".csv"
        file = open(filename, 'w')
        print("Dumping " + table)
        contents = subprocess.Popen(["mdb-export", DATABASE, table],
                                    stdout=subprocess.PIPE).communicate()[0]
        file.write(contents)
        file.close()

The code assumes that the mdbtools suite has been previously installed.

All you need to do to run it is:

  1. cut and paste the python code to a file named AccessDump.py in the same directory as your .mdb files
  2. chmod AccessDump.py  +x
  3. run it from the command line, passing the name of your .mdb file as an argument

Now you should have a set of CSV files to work with in whatever system you want.  Your data have been liberated!

As a mathematician once told us, describing one of his proofs:  “It’s not hard once you know how to do it.”

This entry was posted in Data Management and tagged , , . Bookmark the permalink.

9 Responses to Liberating data from Microsoft Access “.mdb” files

  1. Michael says:

    Just what I was looking for! I found you via http://www.terragis.net/2009/09/14/unlocking-data-trapped-in-mdb-files-on-linux/.

    I had already found mdbtools and your python solution to dump everything to csv was a big help in getting my mdb database exported into a usable format.

  2. Cassiano says:

    This is awesome. I have a huge mdb file with several tables, and woud be tedious to export one by one with mdbtools. I’m going to test the code, even I’d never use python. Is a good time to implement a “Hello World” program to begin with python.

    Thanks a lot!

  3. Allen says:

    Found this via Google and it worked brilliantly. Thanks!

  4. Robin says:

    I found this page, and was filled with hope…. but I get this:
    Traceback (most recent call last):
    File “AccessDump”, line 11, in
    stdout=subprocess.PIPE).communicate()[0]
    File “/usr/lib/python2.7/subprocess.py”, line 679, in __init__
    errread, errwrite)
    File “/usr/lib/python2.7/subprocess.py”, line 1249, in _execute_child
    raise child_exception
    OSError: [Errno 2] No such file or directory
    Please can you help?

    • Jonathan Callahan says:

      I assume that “No such file or directory” in the error message means that the argument you are passing to the script is not a valid file. Perhaps you should try using an absolute path name?

  5. Ben says:

    I get the following error:


    Python wealth.mdb AccessDump.py
    File "wealth.mdb", line 1
    SyntaxError: Non-ASCII character '\x93' in file wealth.mdb on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

    Would this be because the .MDB file I have is a compiled application? I am clueless to anything Windows…

    • Jonathan Callahan says:

      Your mdb file has the unicode character ‘\x93’ aka ‘U+0093’. If you google on ‘U+0093’ you will find that U+0093 is the “” character and was previously known as “SET TRANSMIT STATE”. No clue what it’s doing in your .MDB file but you will definitely need to educate python about the encoding used for your MDB file. I would try using the UTF-8 encoding. Either that or try writing some code to replace all ‘\x93’ with blanks in your .MDB file. Read up on python encodings for more information.

    • Jonathan Callahan says:

      My comment above missed the obvious mistake — putting the file before the python script on the command line. The correct order would be:

      python AccessDump.py wealth.mdb

  6. nacnudus says:

    Line 15 should be

    tables = table_names.split(‘\n’)

    Thanks for the script!