Many people given the task of managing data reach for the tools available to them on their office computer. Typically this will include the Microsoft suite of products including the Access database. Not surprisingly, Microsoft Access Database files, “.mdb” files, are difficult to work with if you don’t have Access on your system. (Or even if you do!)
In this post we describe how to use freely available tools and and 14 lines of python code to automatically dump the contents of an entire Access database into comma-separated-value files, CSV files, one per table.
Recently we began a project that involved data stored in Microsoft Access databases. These “.mdb” files use the JET database engine which is today considered a deprecated technology even by Microsoft. Our first thought was to reach for an older PC and simply open and export database tables from within Access. This is certainly doable but requires 12 mouse clicks for each and every table in the database. No doubt one could become more efficient by learning how to program macros in Access but this is not a technology we hope to become expert at.
It would be much better to have a technique that worked on our system of choice: an IBM Thinkpad running Ubuntu Linux.
A little Googling turned up the MDBTools project that provides a set of libraries and utilities for working with .mdb files. Installation was a snap on Ubuntu:
- Go to “System > Administration > Synaptic Package Manager” and search for “mdb”
- Select “mdbtools” (and “mdbtools-gmdb” for the viewier)
- Click “Install”
The tools you are most likely to use are:
|mdb-tables||list tables in the specified file|
|mdb-schema||generate schema DDL for the specified file|
|mdb-export||generate CSV style output for a table|
Check the man pages to learn more about options associated with each.
The next stage is to write a little program that gets a list of tables from mdb-tables, loops through the list and then uses mdb-export to dump the contents of each table into a CSV file. We always reach for python to do this kind of work as it is concise, readable and very widely adopted in the scientific community. A serious python programmer would of course suck all of the database contents into python and do something within the python process space. But here we are just interested in dumping the data.
Here is our complete program:
#!/usr/bin/env python # # AccessDump.py # A simple script to dump the contents of a Microsoft Access Database. # It depends upon the mdbtools suite: # http://sourceforge.net/projects/mdbtools/ import sys, subprocess # the subprocess module is new in python v 2.4 DATABASE = sys.argv # Get the list of table names with "mdb-tables" table_names = subprocess.Popen(["mdb-tables", "-1", DATABASE], stdout=subprocess.PIPE).communicate() tables = table_names.split('\n') # Dump each table as a CSV file using "mdb-export", # converting " " in table names to "_" for the CSV filenames. for table in tables: if table != '': filename = table.replace(" ","_") + ".csv" file = open(filename, 'w') print("Dumping " + table) contents = subprocess.Popen(["mdb-export", DATABASE, table], stdout=subprocess.PIPE).communicate() file.write(contents) file.close()
The code assumes that the mdbtools suite has been previously installed.
All you need to do to run it is:
- cut and paste the python code to a file named AccessDump.py in the same directory as your .mdb files
- chmod AccessDump.py +x
- run it from the command line, passing the name of your .mdb file as an argument
Now you should have a set of CSV files to work with in whatever system you want. Your data have been liberated!
As a mathematician once told us, describing one of his proofs: “It’s not hard once you know how to do it.”