Day in the life of the MBTA system

Combine that with the code to get the DDL for constraints and you have a tool you can use to defer index construction like this:

Here’s the whole module:

When bulk copying data to a table, oncology it is much faster if the destination table is index and constraint free, because it is cheaper to build an index once than maintain it over many inserts. For postgres, the pg_restore and SQL COPY commands can do this, but they both require that data be copied from the filesystem rather than directly from another table.

For table to table copying (and transformations) the situation isn’t as straight-forward. Recently I was working on a problem where we needed to perform some poor-man’s ETL, copying and transforming data between tables in different schemas. Since some of the destination tables were heavily indexed(including a full text index) the task took quite a while. In talking with a colleague about the problem, we came up with the idea of dropping the indexes and constraints prior to the data load, and restoring them afterwards.

First stop: how to get the DDL for indices on a table in postgres? Poking around the postgres catalogs, I managed to find a function pg_get_indexdef that would return the DDL for an index. Combining that with a query I found in a forum somewhere and altered, I came up with this query to get the names and DDL of all the indices on a table. (this one excludes the primary key index)

With that and the query to do the same for constraints its straightforward to build a helper function that will get the DDL for all indices and constraints, drop them, yield to evaluate a block and then restore the indices and constraints. The method is below:

Use of the function would look like the snippet below. This solution would also allow for arbitrarily complex transformatio

For my task loading and transforming data into about 20 tables, doing this reduced the execution time by two-thirds. Of course, your mileage may vary depending how heavily indexed your destination tables are.

Here’s the whole module:

I’m James Kebinger, more info currently a Software Engineer at PatientsLikeMe.
I’m an experienced Software Engineer and Web Developer with a variety of skills including Java and Ruby/Ruby on Rails and interests including usability and data visualization. I recently got a Master’s degree in Computer Science from Tufts University, and I’m determined to o
I’m James Kebinger, find currently a Master’s candidate in Computer Science at Tufts University in Medford, website like this MA.
I’m also an experienced Software Engineer and Web Developer with a variety of skills including Java and Ruby/Ruby on Rails.

I put together an animation of all the rail traffic in the course of a day on the MBTA’s red, online blue, green and orange lines, including the Mattapan line. Its a great way to see just how complicated the system is that takes me to work every day, and perhaps be a little more patient next time things go less than perfect!

The current version of the animation assumes stop take no time (as does the scheduling data).

I’d thought about doing this before, but it would have taken screen scraping schedule information off the site. I learned recently through a developer outreach that the Massachusetts Department of Transportation is running that the MBTA had released their schedule information in the Google transit feed specification (GTFS). With the data in hand, I went to work using the ruby-processing wrapper of the excellent Processing graphics toolkit.

See the video on youtube

5 thoughts on “Day in the life of the MBTA system”

  1. What might it look like if you estimated every stop to be x seconds long? (I honestly have no idea what the average time stopped is).

Leave a Reply

Your email address will not be published. Required fields are marked *