Many people given the task of managing data reach for the tools available to them on their office computer. Typically this will include the Microsoft suite of products including the Access database. Not surprisingly, Microsoft Access Database files, “.mdb” files, are difficult to work with if you don’t have Access on your system. (Or even if you do!)
In this post we describe how to use freely available tools and and 14 lines of python code to automatically dump the contents of an entire Access database into comma-separated-value files, CSV files, one per table.
Recently we began a project that involved data stored in Microsoft Access databases. These “.mdb” files use the JET database engine which is today considered a deprecated technology even by Microsoft. Our first thought was to reach for an older PC and simply open and export database tables from within Access. This is certainly doable but requires 12 mouse clicks for each and every table in the database. No doubt one could become more efficient by learning how to program macros in Access but this is not a technology we hope to become expert at.
It would be much better to have a technique that worked on our system of choice: an IBM Thinkpad running Ubuntu Linux.
A little Googling turned up the MDBTools project that provides a set of libraries and utilities for working with .mdb files. Installation was a snap on Ubuntu:
- Go to “System > Administration > Synaptic Package Manager” and search for “mdb”
- Select “mdbtools” (and “mdbtools-gmdb” for the viewier)
- Click “Install”
The tools you are most likely to use are:
| Name | Description |
|---|---|
| mdb-tables | list tables in the specified file |
| mdb-schema | generate schema DDL for the specified file |
| mdb-export | generate CSV style output for a table |
Check the man pages to learn more about options associated with each.
The next stage is to write a little program that gets a list of tables from mdb-tables, loops through the list and then uses mdb-export to dump the contents of each table into a CSV file. We always reach for python to do this kind of work as it is concise, readable and very widely adopted in the scientific community. A serious python programmer would of course suck all of the database contents into python and do something within the python process space. But here we are just interested in dumping the data.
Here is our complete program:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
#!/usr/bin/env python # # AccessDump.py # A simple script to dump the contents of a Microsoft Access Database. # It depends upon the mdbtools suite: # http://sourceforge.net/projects/mdbtools/ import sys, subprocess # the subprocess module is new in python v 2.4 DATABASE = sys.argv[1] # Get the list of table names with "mdb-tables" table_names = subprocess.Popen(["mdb-tables", "-1", DATABASE], stdout=subprocess.PIPE).communicate()[0] tables = table_names.split('n') # Dump each table as a CSV file using "mdb-export", # converting " " in table names to "_" for the CSV filenames. for table in tables: if table != '': filename = table.replace(" ","_") + ".csv" file = open(filename, 'w') print("Dumping " + table) contents = subprocess.Popen(["mdb-export", DATABASE, table], stdout=subprocess.PIPE).communicate()[0] file.write(contents) file.close() |
The code assumes that the mdbtools suite has been previously installed.
All you need to do to run it is:
- cut and paste the python code to a file named AccessDump.py in the same directory as your .mdb files
- chmod AccessDump.py +x
- run it from the command line, passing the name of your .mdb file as an argument
Now you should have a set of CSV files to work with in whatever system you want. Your data have been liberated!
As a mathematician once told us, describing one of his proofs: “It’s not hard once you know how to do it.”
Just what I was looking for! I found you via http://www.terragis.net/2009/09/14/unlocking-data-trapped-in-mdb-files-on-linux/.
I had already found mdbtools and your python solution to dump everything to csv was a big help in getting my mdb database exported into a usable format.
This is awesome. I have a huge mdb file with several tables, and woud be tedious to export one by one with mdbtools. I’m going to test the code, even I’d never use python. Is a good time to implement a “Hello World” program to begin with python.
Thanks a lot!
Found this via Google and it worked brilliantly. Thanks!
I found this page, and was filled with hope…. but I get this:
Traceback (most recent call last):
File “AccessDump”, line 11, in
stdout=subprocess.PIPE).communicate()[0]
File “/usr/lib/python2.7/subprocess.py”, line 679, in __init__
errread, errwrite)
File “/usr/lib/python2.7/subprocess.py”, line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Please can you help?
I assume that “No such file or directory” in the error message means that the argument you are passing to the script is not a valid file. Perhaps you should try using an absolute path name?
I get the following error:
Python wealth.mdb AccessDump.py
File "wealth.mdb", line 1
SyntaxError: Non-ASCII character '\x93' in file wealth.mdb on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Would this be because the .MDB file I have is a compiled application? I am clueless to anything Windows…
Your mdb file has the unicode character ‘\x93′ aka ‘U+0093′. If you google on ‘U+0093′ you will find that U+0093 is the “” character and was previously known as “SET TRANSMIT STATE”. No clue what it’s doing in your .MDB file but you will definitely need to educate python about the encoding used for your MDB file. I would try using the UTF-8 encoding. Either that or try writing some code to replace all ‘\x93′ with blanks in your .MDB file. Read up on python encodings for more information.