Deferring Index costs for table to table copies in PostgreSQL

When bulk copying data to a table, it is much faster if the destination table is index and constraint free, because it is cheaper to build an index once than maintain it over many inserts. For postgres, the pg_restore and SQL COPY commands can do this, but they both require that data be copied from the filesystem rather than directly from another table.

For table to table copying (and transformations) the situation isn’t as straight-forward. Recently I was working on a problem where we needed to perform some poor-man’s ETL, copying and transforming data between tables in different schemas. Since some of the destination tables were heavily indexed(including a full text index) the task took quite a while. In talking with a colleague about the problem, we came up with the idea of dropping the indexes and constraints prior to the data load, and restoring them afterwards.

First stop: how to get the DDL for indices on a table in postgres? Poking around the postgres catalogs, I managed to find a function pg_get_indexdef that would return the DDL for an index. Combining that with a query I found in a forum somewhere and altered, I came up with this query to get the names and DDL of all the indices on a table. (this one excludes the primary key index)

With that and the query to do the same for constraints its straightforward to build a helper function that will get the DDL for all indices and constraints, drop them, yield to evaluate a block and then restore the indices and constraints. The method is below:

Use of the function would look like the snippet below. This solution would also allow for arbitrarily complex transformations in Ruby as well as pure SQL.

For my task loading and transforming data into about 20 tables, doing this reduced the execution time by two-thirds. Of course, your mileage may vary depending how heavily indexed your destination tables are.

Here’s the whole module:

My experience setting up a MOCA network at home

ING Direct put up a short manifesto titled “We, The Savers“. Its a good read,
and we could all do better by it.

Number 3 struck me especially:

We will take care of our money. It’s not enough to have money in a bank. We will put it where it will grow. We’ll keep track of it. And we’ll check every account we have every year to protect ourselves against fraud or escheatment.

“We will put it where it will grow” – well where will it grow. It seems the first tool brought to bear on any stock market bump is to lower interest rates,
which in effect punishes those of us who do actually have money in a savings account. We lament the low savings rate in America, but then we go make it more appealing to borrow and less appealing to save.

Another item is this – not everyone has the internet access or savvy to move their money to a place like ING Direct. Those people have their money stuck in a savings account that probably pays well under one-percent interest. I think its high time this country had a better program to get more people online so people can get away from their no-interest paying bank.
I bought a couple of coax-ethernet bridges in the hopes of speeding media transfers to and from my Tivo HD. The devices work great, but it turns out my Tivo itself is the bottleneck – it just doesn’t serve media very fast even over ethernet. I recommend a “Moca”:http://www.mocalliance.org based ethernet over coax network if you’re in need of more speed than wireless will give you, but don’t expect miracles on the Tivo front.

Why go back to wires?

Sure wireless is nice and easy and fast enough for many applications, but you can’t beat the bandwidth of a wire for guaranteed bandwidth. I live in a densely populated area in which I can see about 40 wireless networks, and about a third of those overlap my wireless band to one degree or another. I get just a fraction of the theoretical 54mbps of a g-based wifi network. Compare that to 100 mbps point to point for coax (actually around 240mbps total band width if you’ve got a mesh network set up).

Taking the plunge

First you’ve got to get yourself a couple of coax bridges. The problem here is that no one sells them at retail right now. Fortunately Verizon’s FIOS service made heavy use of the Motorola NIM-100 bridge but is now phasing them out, so you can get them cheap on ebay. I got a pair for $75, shipped.

Each bridge has an ethernet port, and two coax ports, one labeled “in”, the other labeled “out”. If you have cable internet you’ll likely put one of these next to your cable modem. In that case, connect a wire from the wall to the coax in port, and another from the out port to the cable modem. An ethernet wire to your router, and now you’ve got an ethernet network running over your coaxial cable wires. Plug another one in somewhere else in your house, wall to the in port, and ethernet to some device and you’re in business. I got north of 80mbps between two laptops over the coax bridge.

This should work out of the box if your bridges came reset to their factory configuration. Unfortunately that means you can’t administer them and they’re using a default encryption key (traffic over the coax is encrypted because it probably leaks a bit out of your house)

Taking control

I’d recommend spending a bit of time to make your new bridges configurable- they have web interfaces, its just a matter of getting to them that’s tricky. I pieced together this information from several sources on the web.
The first problem is getting into the web interface. The default settings are for the bridge to auto assign itself an IP address in the range 169.254.1.x , and it won’t accept admin connections from devices that aren’t on the same ip range so here’s what you do:

  1. Take a computer and set your ethernet interface to have a static IP address of 169.254.1.100
  2. Connect the computer directly to the bridge over ethernet
  3. Goto http://169.254.1.1 . If that doesn’t work, increment the last digit until it does
  4. When you see the web interface, the default password is “entropic” – they’re apparently the only people who make the chips for these devices

Once you’re in the configuration works much like any other network device. You should definitely set a new password under “coax security” – you’ll have to repeat this for all your devices. Also, I’d recommend setting the device to use DHCP or a fixed IP in your usual IP range if you’d like to change anything in the future.

We the savers

ING Direct put up a short manifesto titled ”
The Savers”. Its a good read,
and we could all do better by it.

Number 3 struck me especially:

We will take care of our money. It’s not enough to have money in a bank. We will put it where it will grow. We’ll keep track of it. And we’ll check every account we have every year to protect ourselves against fraud or escheatment.

“We will put it where it will grow” – well where will it grow. It seems the first tool brought to bear on any stock market bump is to lower interest rates, which in effect punishes those of us who do actually have money in a savings account. We lament the low savings rate in America, but then we go make it more appealing to borrow and less appealing to save by dropping rates again and again.

Another item is this – not everyone has the internet access or savvy to move their money to a place like ING Direct. Those people have their money stuck in a savings account that probably pays well under one-percent interest. I think its high time this country had a better program to get more people online so people can get away from their no-interest paying banks.

Creating thumbnails of PDFs with attachment_fu

We needed to create some thumbnails from uploading PDF files for a new site feature – We’re using attachment_fu which doesn’t support that (yet?), but we’re using RMagick as our processor and it understands PDF files.

I came up with the hack below (warning, first draft, only briefly tested) which works without having to modify the attachment_fu plugin itself. One day I’ll loop back and figure out a cleaner way to do this and see which of attachment_fu’s other image processors can even support pdfs.

There are three methods to override to make a go of this:

  1. self.image? : consider pdf files as an image so thumbnail process will happen
  2. thumbnail_name_for : change the extension of the saved thumbnail filename to png
  3. resize_image: override to change format via block passed to to_blob

Apologies for the crappy source formatting, I have to install a plugin to do that well one of these days


###Hacks to allow creation of png thumbnails for pdf uploads - depends on RMagic being the configured processor
## likely very fragile

def self.image?(content_type)
(content_types +  ['application/pdf']).include?(content_type)
end

alias_method :original_thumbnail_name_for, :thumbnail_name_for
def thumbnail_name_for(thumbnail=nil)
return original_thumbnail_name_for(thumbnail) unless (content_type == 'application/pdf' && !thumbnail.blank?)
basename = filename.gsub /.w+$/ do |s|
ext = s; ''
end
"#{basename}_#{thumbnail}.png"
end
#copied from rmagick_processor with change in last few lines
def resize_image(img, size)
size = size.first if size.is_a?(Array) && size.length == 1 && !size.first.is_a?(Fixnum)
if size.is_a?(Fixnum) || (size.is_a?(Array) && size.first.is_a?(Fixnum))
size = [size, size] if size.is_a?(Fixnum)
img.thumbnail!(*size)
else
img.change_geometry(size.to_s) { |cols, rows, image| image.resize!(cols<1 ? 1 : cols, rows<1 ? 1 : rows) }
end
img.strip! unless attachment_options[:keep_profile]
if content_type == 'application/pdf' # here force the output format to PNG if its a pdf
self.temp_path = write_to_temp_file(img.to_blob {self.format = 'PNG'})
else
self.temp_path = write_to_temp_file(img.to_blob)
end
end

Saving money on train tickets in the UK

Skip raileurope.com if you’re in the US and buying rain tickets for travel within the UK. Their prices are outrageously marked up.  For a Saturday, second class ticket from Newcastle to Edinburgh in October, their quote is $96.00. Get a quote from one of the many web sites based in the UK, and its 9 and a half Pounds. That’s about $17.

The problem is many sites in the UK don’t let you purchase tickets with a non-UK billing address. If you’re going to be at a major station where you can pick up tickets from a machine, check out thetrainline.com. They’re cool with US billing addresses, and much cheaper as well.

Livepipe UI controls for JS

I’ve been pretty pleased so far with the popup window control from LivePipe. It plays nice with Prototype and is easy to style with regular CSS. We had considered using Prototype Window but I was put off that all their default styles looks like operating system windows and restyling their windows required a table and 9 images.

I’d recommend anyone looking for a popup window solution at least consider Livepipe. There are downsides however, chiefly that the project is pretty immature – technically I suppose this is an alpha release since Beta One is being worked on, so the community remains small. While there are some folks already submitting patches, progress on merging the patches is alarmingly slow, as one can see from their lighthouse page.

If you’re doing RESTful stuff in Rails however, you will need the contents of ticket #10 which modifies the popup window to accept an option to use different HTTP verbs.

Scriptaculous docs

Every now and then my iPhone has this issue where it can’t tell time properly. I wake it up, and it shows me a time several hours ago, then as if waking from a drunken stupor, slowly tries to catch up to reality, moving the clock forward by a small, random number of minutes. During these episodes the whole UI is sluggish, and it apparently doesn’t even accept phone calls. When “phone” is 5/6 of your name one would think at least that would work all of the time!

Check out this screenshot from the missed call sheet. It recorded 3 missed calls that arrived over the course of an afternoon all with the exact same arrival time, 9:40 AM. The phone never rang.

That was with v2.01, so I sure hope this is fixed in the future.
For all the complaining I often do about the poor documentation of the scriptaculous project, I finally did something to help that today, creating (very thin) documentation for their new (if released in January is new) Effect.Tween function here on their github wiki.

I was creating a method to scroll the viewport so that the contents of an AJAX-loaded div would be fully visible on the screen – the (still undocumented) Effect.ScrollTo doesn’t quite do it because it doesn’t consider the height of the element it scrolls to, but in doing so I stumbled over Tween in the code. Once the math to figure out how much scrolling is needed, its easy to use Effect.Tween to smoothly scroll the window by repeatedly calling window.scrollBy();

This certainly isn’t rocket science, but here’s an outline of how to do it (this code only deals with downward vertical scrolling):

var elementHeight = element.getHeight();
var screenHeight = document.viewport.getHeight();
var elementScreenPos =element.viewportOffset()[1];
var amountToScroll = elementHeight - (screenHeight - elementScreenPos);
if (amountToScroll > 0){
var scrollPos = document.viewport.getScrollOffsets().top;
new Effect.Tween(null,scrollPos,scrollPos+amountToScroll,{},function(n) { window.scrollTo(0,n);});
}
}

iPhone can’t keep time

I think the New York Times application for the iPhone is pretty good. My chief complaint (other than the random crashing) is that the head short
I think the New York Times application for the iPhone is pretty good. My chief complaint (other than the random crashing) is that the head short
Every now and then my iPhone has this issue where it can’t tell time properly. I wake it up, and it shows me a time several hours ago, then as if waking from a drunken stupor, slowly tries to catch up to reality, moving the clock forward by a small, random number of minutes. During these episodes the whole UI is sluggish, and it apparently doesn’t even accept phone calls. When “phone” is 5/6 of your name one would think at least that would work all of the time!

Check out this screenshot from the missed call sheet. It recorded 3 missed calls that arrived over the course of an afternoon all with the exact same arrival time, 9:40 AM. The phone never rang.

That was with v2.01, so I sure hope this is fixed in the future.

Update: to Frank’s comment – this wasn’t a matter of the phone bouncing between time zones. The phone’s time isn’t a whole number of hours behind.