If you have a table with a column included as the first column in a multi-column index and then again with its own index, you may be over-indexing. Postgres will use the multi-column index for queries on the first column. First, a pointer to the Postgres docs that I can never find, and then data on the performance of multi-column indexes vs. single-column ones.

From the docs

A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.


Performance

If you click around that section of the docs, you’ll surely come across the discussion of multi-column indexing and performance, in particular this passage:

You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone.

Life is full of performance tradeoffs, so we should explore just how much slower it is to use a multi-column index for single-column queries.

First, let’s create a dummy table:

CREATE TABLE foos_and_bars
(
id serial NOT NULL,
foo_id integer,
bar_id integer,
CONSTRAINT foos_and_bars_pkey PRIMARY KEY (id)
);

Then, using R, we’ll create 3 million rows of nicely distributed data:

rows = 3000000
foo_ids = seq(1,250000,1)
bar_ids = seq(1,20,1)
data = data.frame(foo_id = sample(foo_ids, rows, TRUE), bar_id = sample(bar_ids, rows, TRUE))

Dump that to a text file, load it up with COPY, and we’re good to go.
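
If you drive the load from Ruby with the pg gem (as the benchmark later in this post does), the COPY step might look like this – a sketch, assuming a tab-separated, header-free dump at a path readable by the Postgres server:

require 'rubygems'
require 'pg'

conn = PGconn.open(:dbname => 'test_foo')
# Server-side COPY; Postgres's default text format is tab-delimited,
# which matches the file written from R.
conn.exec("COPY foos_and_bars (foo_id, bar_id) FROM '/tmp/foos_and_bars.tsv'")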

Create the compound index:

CREATE INDEX foo_id_and_bar_id_index
ON foos_and_bars
USING btree
(foo_id, bar_id);

Run a simple query to make sure the index is used:

test_foo=# explain analyze select * from foos_and_bars where foo_id = 123;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foos_and_bars  (cost=4.68..55.74 rows=13 width=12) (actual time=0.026..0.038 rows=8 loops=1)
Recheck Cond: (foo_id = 123)
->  Bitmap Index Scan on foo_id_and_bar_id_index  (cost=0.00..4.68 rows=13 width=0) (actual time=0.020..0.020 rows=8 loops=1)
Index Cond: (foo_id = 123)
Total runtime: 0.072 ms
(5 rows)

Now we’ll make 100 queries by foo_id with this index, and then repeat with the single-column index installed, using this code:

require 'rubygems'
require 'benchmark'
require 'pg'

TEST_IDS = [...] # randomly selected 100 ids in R

conn = PGconn.open(:dbname => 'test_foo')

def perform_test(conn, foo_id)
  Benchmark.realtime do
    res = conn.exec("select * from foos_and_bars where foo_id = #{foo_id}")
    res.clear
  end
end

TEST_IDS.map { |id| perform_test(conn, id) } # warm things up?
data = TEST_IDS.map { |id| perform_test(conn, id) }

data.each do |d|
  puts d
end

How do things stack up? I’d say about evenly.


If you’re hooking up a Mac OS X machine to a 1080p monitor via a Mini DisplayPort to HDMI adapter, you may find your display settings don’t have a 1920×1080 option, and the 1080p setting produces an image with the edges cut off. Adjusting the overscan/underscan slider will make the image fit, but it turns fuzzy.

Solution: check the monitor’s settings. On my ViewSonic VX2453, the HDMI inputs have two settings, “AV” and “PC”. Switching to PC solved the problem, and now the picture is exactly the right size and crisp.

I spent some time futzing around with SwitchRes and several fruitless reboots before discovering the setting, so I hope this saves someone time!

Getting Wukong and Pig Working Together on Amazon Elastic Map Reduce

Apache Pig is a great language for processing large amounts of data on a Hadoop cluster without delving into the minutiae of map reduce.

Wukong is a great library to write map/reduce jobs for Hadoop from ruby.

Together they can be really great, because problems unsolvable in Pig without resorting to writing a custom function in Java can be solved by streaming data through an external script, which Wukong nicely wraps. The Data Chef blog has a great example of using Pig to choreograph the data flow and Ruby/Wukong to compute the Jaccard similarity of sets.
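
For flavor, a Wukong script is just Ruby. Here’s a minimal line-streaming mapper modeled on Wukong’s classic word-count example – a sketch; the module and class names are mine:

require 'rubygems'
require 'wukong'

module TokenCounter
  class Mapper < Wukong::Streamer::LineStreamer
    # Called once per input line; emits [token, 1] pairs for downstream steps.
    def process(line)
      line.strip.split(/\W+/).each { |token| yield [token, 1] }
    end
  end
end

# Run as a Hadoop streaming job (no reducer in this sketch).
Wukong::Script.new(TokenCounter::Mapper, nil).run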

Working with Wukong on Elastic Map Reduce

Elastic MapReduce is a great resource – it’s very easy to quickly have a small Hadoop cluster at your disposal to process some data. Getting Wukong working requires an extra step: installing the wukong gem on all the machines in the cluster.

Fortunately, elastic map reduce allows the use of bootstrap scripts located on S3, which run on boot for all the machines in the cluster. I used the following script (based on an example on stackoverflow):

sudo apt-get update
sudo apt-get -y install rubygems
sudo gem install wukong --no-rdoc --no-ri

Using Amazon’s command line utility, starting the cluster ready to use in Pig interactive mode looks like this:

elastic-mapreduce --create --bootstrap-action [S3 path to wukong-bootstrap.sh] --num-instances [a number] --slave-instance-type [machine type] --pig-interactive --ssh

The web tool for creating clusters has a space for specifying the path to a bootstrap script.

Next step: upload your Pig script and its accompanying Wukong script to the name node, and launch the job. (It’s also possible to do all of that when starting the cluster with more arguments to elastic-mapreduce, with the added advantage that the cluster will terminate with your job.)

I recently ran a survey at work using FluidSurveys. Their survey-building tools are excellent, and they have great support, but I ran into a time-consuming issue when it came time to process the responses, because they’re double-byte Unicode – UTF-16LE, to be specific. Turns out knowing that is 90% of the battle.

The files on first inspection are a bit strange: although they spring from a CSV export button, they’re tab-delimited, but with CSV-style quoting conventions. That’s easy enough to work around, but R and Ruby both barfed reading the files. I cottoned on to the fact that the files had some odd characters in them, so I recruited JRuby and Ruby 1.9, with their better Unicode support, to try to load them, but still couldn’t quite get the parameters right.

Then I thought of iconv, the character-set conversion utility. Since in this case the only special character was the ellipsis, I was happy to strip those out, and the following command does the trick:

iconv -f UTF-16LE -t US-ASCII -c responses.csv > converted_responses.csv
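
The same conversion works from Ruby 1.8’s Iconv if you’d rather keep it in the processing script – a sketch, with //IGNORE playing the role of iconv’s -c flag:

require 'iconv'

# Convert UTF-16LE to ASCII, dropping characters (like the ellipsis)
# that have no ASCII equivalent.
ascii = Iconv.iconv('US-ASCII//IGNORE', 'UTF-16LE', File.read('responses.csv')).join
File.open('converted_responses.csv', 'w') { |f| f.write(ascii) }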

And, as they say, Bob’s your uncle.

At the risk of being forever branded a grammar elitist, let’s take a quick look at use of the phrase “your an idiot” on Twitter.

Inspired by the tweet by @doctorzaius referencing a URL to Twitter’s search page for “your an idiot”, I used Twitter’s streaming API to download a sample of 6581 tweets containing the word “idiot” overnight, for about 12 hours.

Of these 6581 tweets, 65 contained our friend “your an idiot”. 161, two and a half times as many, contained “you’re an idiot”. Additionally, there were 2 tweets with “your such an idiot”, and just one “you’re such an idiot”. The forces of good grammar have won this round?

Note: This is a very small sample. It may be interesting to compare Facebook status updates to see what the you’re/your ratio looks like there one day…

(Ab)using memoize to quickly solve tricky n+1 problems

Usually, discovering n+1 problems in your Rails application that can’t be fixed with an :include statement means lots of changes to your views. Here’s a workaround that skips the view changes, which I discovered while working with Rich to improve the performance of some Dribbble pages. It uses memoize to convince your n model instances that they already have all the information needed to render the page.

While simple belongs_to relationships are easy to fix with :include, let’s take a look at a concrete example where that won’t work:

class User < ActiveRecord::Base
  has_many :likes
end

class Item < ActiveRecord::Base
  has_many :likes

  def liked_by?(user)
    likes.by_user(user).present?
  end
end

class Like < ActiveRecord::Base
  belongs_to :user
  belongs_to :item
end

A view presenting a set of items that called Item#liked_by? would have an n+1 problem that wouldn’t be well solved by :include. Instead, we’d have to come up with a query to get the Likes for the set of items by this user:

Like.of_item(@items).by_user(user)

Then we’d have to store that in a controller instance variable, and change all the views that called item.liked_by?(user) to access the instance variable instead.
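
For reference, the of_item and by_user scopes used above aren’t shown in this post; on Rails 2.x they might look something like this (a sketch):

class Like < ActiveRecord::Base
  belongs_to :user
  belongs_to :item

  # Assumed definitions for the named scopes used in the examples.
  named_scope :of_item, lambda { |items|
    { :conditions => { :item_id => Array(items).map(&:id) } }
  }
  named_scope :by_user, lambda { |user|
    { :conditions => { :user_id => user.id } }
  }
end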

Active Support’s memoize functionality stores the results of function calls so they’re only evaluated once. What if we could trick the method into thinking it’s already been called? We can do just that by writing data into the instance variables that memoize uses to save results on each of the model instances. First, we memoize liked_by:

memoize :liked_by?

Then bulk load the relevant likes and stash them into memoize’s internal state:

def precompute_data(items, user)
  likes = Like.of_item(items).by_user(user).index_by { |like| like.item_id }
  items.each do |item|
    item.write_memo(:liked_by?, likes[item.id].present?, user)
  end
end

The write_memo method is implemented as follows.

def write_memo(method, return_value, args=nil)
  ivar = ActiveSupport::Memoizable.memoized_ivar_for(method)
  if args
    if hash = instance_variable_get(ivar)
      hash[Array(args)] = return_value
    else
      instance_variable_set(ivar, { Array(args) => return_value })
    end
  else
    instance_variable_set(ivar, [return_value])
  end
end
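
Putting it together in a controller might look like this – a hypothetical sketch, assuming precompute_data is exposed as a class method on Item:

# One bulk query replaces n per-item queries.
@items = Item.all(:limit => 50)
Item.precompute_data(@items, current_user)

# Views can now call item.liked_by?(current_user) without hitting the database.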

The problem described here could be solved with some crafty left joins added to the query that fetched the items in the first place, but when there are several different hard-to-prefetch properties, such a query would likely become unmanageable, if not terribly slow.

I bought a couple of coax-ethernet bridges in the hopes of speeding media transfers to and from my Tivo HD. The devices work great, but it turns out my Tivo itself is the bottleneck – it just doesn’t serve media very fast, even over ethernet. I recommend a MoCA-based network (http://www.mocalliance.org) if you’re in need of more speed than wireless will give you, but don’t expect miracles on the Tivo front.

Why go back to wires?

Sure, wireless is nice and easy and fast enough for many applications, but you can’t beat a wire for guaranteed bandwidth. I live in a densely populated area in which I can see about 40 wireless networks, and about a third of those overlap my wireless band to one degree or another. I get just a fraction of the theoretical 54mbps of a g-based wifi network. Compare that to 100mbps point-to-point for coax (actually around 240mbps of total bandwidth if you’ve got a mesh network set up).

Taking the plunge

First you’ve got to get yourself a couple of coax bridges. The problem here is that no one sells them at retail right now. Fortunately Verizon’s FIOS service made heavy use of the Motorola NIM-100 bridge but is now phasing them out, so you can get them cheap on ebay. I got a pair for $75, shipped.

Each bridge has an ethernet port, and two coax ports, one labeled “in”, the other labeled “out”. If you have cable internet you’ll likely put one of these next to your cable modem. In that case, connect a wire from the wall to the coax in port, and another from the out port to the cable modem. An ethernet wire to your router, and now you’ve got an ethernet network running over your coaxial cable wires. Plug another one in somewhere else in your house, wall to the in port, and ethernet to some device and you’re in business. I got north of 80mbps between two laptops over the coax bridge.

This should work out of the box if your bridges came reset to their factory configuration. Unfortunately, that means you can’t administer them, and they’re using a default encryption key (traffic over the coax is encrypted because it probably leaks a bit out of your house).

Taking control

I’d recommend spending a bit of time to make your new bridges configurable – they have web interfaces; it’s just a matter of getting to them that’s tricky. I pieced together this information from several sources on the web.
The first problem is getting into the web interface. By default, the bridge auto-assigns itself an IP address in the range 169.254.1.x, and it won’t accept admin connections from devices that aren’t on the same IP range, so here’s what you do:

  1. Take a computer and set your ethernet interface to have a static IP address of 169.254.1.100
  2. Connect the computer directly to the bridge over ethernet
  3. Go to http://169.254.1.1. If that doesn’t work, increment the last digit until it does
  4. When you see the web interface, the default password is “entropic” – they’re apparently the only people who make the chips for these devices

Once you’re in, the configuration works much like any other network device’s. You should definitely set a new password under “coax security” – you’ll have to repeat this for all your devices. Also, I’d recommend setting the device to use DHCP or a fixed IP in your usual IP range if you’d like to change anything in the future.

Deferring Index costs for table to table copies in PostgreSQL

When bulk copying data to a table, it is much faster if the destination table is index- and constraint-free, because it is cheaper to build an index once than to maintain it over many inserts. For Postgres, the pg_restore and SQL COPY commands can do this, but they both require that data be copied from the filesystem rather than directly from another table.

For table-to-table copying (and transformations) the situation isn’t as straightforward. Recently I was working on a problem where we needed to perform some poor man’s ETL, copying and transforming data between tables in different schemas. Since some of the destination tables were heavily indexed (including a full-text index), the task took quite a while. In talking with a colleague about the problem, we came up with the idea of dropping the indexes and constraints prior to the data load and restoring them afterwards.

First stop: how to get the DDL for the indices on a table in Postgres? Poking around the Postgres catalogs, I managed to find a function, pg_get_indexdef, that would return the DDL for an index. Combining that with a query I found in a forum somewhere and altered, I came up with a query to get the names and DDL of all the indices on a table (this one excludes the primary key index).
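
Something along these lines does the trick – a sketch built on pg_get_indexdef and the pg_index catalog, not necessarily the exact query:

index_ddl_sql = <<-SQL
  SELECT c2.relname AS index_name,
         pg_get_indexdef(i.indexrelid) AS ddl
  FROM pg_class c
  JOIN pg_index i ON c.oid = i.indrelid
  JOIN pg_class c2 ON i.indexrelid = c2.oid
  WHERE c.relname = 'my_table'
    AND NOT i.indisprimary
SQL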

With that, and a query to do the same for constraints, it’s straightforward to build a helper function that will get the DDL for all indices and constraints, drop them, yield to evaluate a block, and then restore the indices and constraints. The method is part of the module at the end of this post.

Use of the function would look like the snippet below. This solution also allows for arbitrarily complex transformations in Ruby as well as pure SQL.
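
A sketch of that usage – without_indexes_and_constraints is my stand-in name for the helper described above:

without_indexes_and_constraints('destination_table') do
  execute(<<-SQL)
    INSERT INTO destination_table (foo_id, bar_id)
    SELECT foo_id, bar_id
    FROM source_schema.source_table
  SQL
end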

For my task, loading and transforming data into about 20 tables, doing this reduced the execution time by two-thirds. Of course, your mileage may vary depending on how heavily indexed your destination tables are.

Here’s the whole module:
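
In outline it looks something like this – a minimal sketch, assuming ActiveRecord’s raw connection for execute and select_all:

module IndexDeferral
  # DDL for every non-primary-key index on the table.
  def index_definitions(table)
    select_all(<<-SQL)
      SELECT c2.relname AS name, pg_get_indexdef(i.indexrelid) AS ddl
      FROM pg_class c
      JOIN pg_index i ON c.oid = i.indrelid
      JOIN pg_class c2 ON i.indexrelid = c2.oid
      WHERE c.relname = '#{table}' AND NOT i.indisprimary
    SQL
  end

  # Name and definition for every non-primary-key constraint on the table.
  def constraint_definitions(table)
    select_all(<<-SQL)
      SELECT conname AS name, pg_get_constraintdef(oid) AS ddl
      FROM pg_constraint
      WHERE conrelid = '#{table}'::regclass AND contype <> 'p'
    SQL
  end

  # Drop indexes and constraints, yield to do the copying, then restore them.
  def without_indexes_and_constraints(table)
    indexes     = index_definitions(table)
    constraints = constraint_definitions(table)

    constraints.each { |c| execute("ALTER TABLE #{table} DROP CONSTRAINT #{c['name']}") }
    indexes.each     { |i| execute("DROP INDEX #{i['name']}") }

    yield

    indexes.each     { |i| execute(i['ddl']) }
    constraints.each { |c| execute("ALTER TABLE #{table} ADD CONSTRAINT #{c['name']} #{c['ddl']}") }
  end

  def execute(sql)
    ActiveRecord::Base.connection.execute(sql)
  end

  def select_all(sql)
    ActiveRecord::Base.connection.select_all(sql)
  end
end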

PNG Thumbnails for PDF files. Take two


Updating my previous post: I finished up the work of extending attachment_fu to optionally create PNG thumbnails of uploaded PDF files. Check out the fork on GitHub.

Creating thumbnails of PDFs with attachment_fu

We needed to create some thumbnails from uploaded PDF files for a new site feature. We’re using attachment_fu, which doesn’t support that (yet?), but we’re using RMagick as our processor and it understands PDF files.

I came up with the hack below (warning: first draft, only briefly tested), which works without having to modify the attachment_fu plugin itself. One day I’ll loop back and figure out a cleaner way to do this and see which of attachment_fu’s other image processors can even support PDFs.

There are three methods to override to make a go of this:

  1. self.image? : consider pdf files as an image so thumbnail process will happen
  2. thumbnail_name_for : change the extension of the saved thumbnail filename to png
  3. resize_image: override to change format via block passed to to_blob

Apologies for the crappy source formatting; I have to install a plugin to do that well one of these days.


### Hacks to allow creation of PNG thumbnails for PDF uploads.
### Depends on RMagick being the configured processor; likely very fragile.

# Treat PDFs as images so thumbnail processing happens.
def self.image?(content_type)
  (content_types + ['application/pdf']).include?(content_type)
end

alias_method :original_thumbnail_name_for, :thumbnail_name_for

# Change the extension of the saved thumbnail filename to .png for PDFs.
def thumbnail_name_for(thumbnail = nil)
  return original_thumbnail_name_for(thumbnail) unless content_type == 'application/pdf' && !thumbnail.blank?
  basename = filename.gsub(/\.\w+$/, '') # strip the original extension
  "#{basename}_#{thumbnail}.png"
end

# Copied from rmagick_processor, with a change in the last few lines.
def resize_image(img, size)
  size = size.first if size.is_a?(Array) && size.length == 1 && !size.first.is_a?(Fixnum)
  if size.is_a?(Fixnum) || (size.is_a?(Array) && size.first.is_a?(Fixnum))
    size = [size, size] if size.is_a?(Fixnum)
    img.thumbnail!(*size)
  else
    img.change_geometry(size.to_s) { |cols, rows, image| image.resize!(cols < 1 ? 1 : cols, rows < 1 ? 1 : rows) }
  end
  img.strip! unless attachment_options[:keep_profile]
  if content_type == 'application/pdf' # force the output format to PNG if it's a PDF
    self.temp_path = write_to_temp_file(img.to_blob { self.format = 'PNG' })
  else
    self.temp_path = write_to_temp_file(img.to_blob)
  end
end

There’s a great musical adaptation of one of Barack Obama’s New Hampshire speeches making the rounds.


Oil Change International put together a great tool to visualize the flow of money from oil companies to presidential candidates and congressional representatives. The graph view of the presidential race is sort of what you expect, with Republicans soaking up more oil money than Democrats.

What interests me about the data, though, is what makes a donation from someone who works at an oil company “oil money”? Where do we draw that line? It would seem that a matched pair of max $4600 donations from the CEO and (homemaker) spouse are at one end of the special-interest donation spectrum, but what of a $500 donation from someone who owns a gas station, or a pair of $500 donations from a research scientist?

For me, the inclusion of some of these donations as oil money is disingenuous, but it’s hard to say which donations should or should not be included.

Maybe one day we’ll see public campaign finance and no one will have to figure that out?
I get a kick out of seeing some of the really old stuff at the Tufts library. There’s something about the permanence of these objects; they’ve been hanging out on this planet for longer than me and will probably continue to do so. Tufts Weekly issues from 1943, scientific journals from the 60s. I saw a few journals that had lost their bindings and were shelved with just twine holding them together, like a gift from the past.
Last weekend I wanted to pick up some books on signal processing and time series analysis, and I dragged Kristi along with me so that we could get some groceries while we were out. While I scanned the math books to find the easiest to understand, she happened upon a book called Great American Liberals, edited by Gabriel Mason and published in 1956. The most fascinating thing for me about this book is how infrequently it has been checked out of the library. The due dates are:
July 7 1959
December 3 1971
March 13 1988
April 17 2008
At this declining rate of readership, the next time it leaves the library will be in 23-25 years!
We saw U23D last night at the Imax and it was great! I love 3D movies, and since I saw one like 10 years ago, I don’t know why all movies aren’t 3D yet, but I digress. They made great use of the “one” additional dimension; at times Bono or the Edge are all up in your proverbial “grille” with the crowd unfurling behind them.
Between the great visual effects, which at times moved the stage’s background visuals into the foreground, and the great sound system of the Imax, this was a fun experience. So if you like U2 at all, check out www.u23dmovie.com to find it near you.

Ruby operator precedence (the ors and ands of it)

I found out (by introducing a bug into the application I’ve been working on) that “or” and “||” do not have equal precedence in Ruby.

More importantly, the assignment operator “=” has higher precedence than “or”, which means that while the expression


>> foo = nil || 2
=> 2
>> foo
=> 2

results in foo being assigned the value 2 as you might expect, the following expression leaves foo assigned the value nil.


>> foo = nil or 2
=> 2
>> foo
=> nil
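
In other words, foo = nil or 2 parses as (foo = nil) or 2. Adding explicit parentheses restores the expected binding:

>> foo = (nil or 2)
=> 2
>> foo
=> 2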

This is well covered ground online (see this post) but I was surprised that this oddity didn’t warrant an explicit mention in the operator precedence section of the Pickaxe book.

There’s a local group of entrepreneurs and developers that meets every couple of months in Cambridge. I was curious about this month’s presenters’ choices of development platform, so I took a look at their headers, and here’s what I found.

Of the 7 presenters, the platform stats fall out thusly:
2 Ruby on Rails (plus one suspected, but not confirmed)
2 PHP
1 ASP.net
1 Python (CherryPy)

By way of contrast, a quick and dirty survey of jobs in Boston/Cambridge/Brookline on Craigslist turned up the following stats:
232 jobs containing Java
113 jobs containing ASP.net
164 jobs containing PHP
46 jobs containing Python
34 jobs containing Ruby

Presumably the difference is because lots of folks in the area are working at medium-sized companies on older, established (I won’t say “legacy”) systems?

I received a letter today marked “Urgent message from IBM. Please open immediately”. What’s this, I thought? It turns out information relating to my IBM employment was on the tapes lost back in February. I had read about the incident some time ago when it became public back in April.

At the time I figured I couldn’t be involved, pfizer because I hadn’t already been offered this free id-protection for a year. Turns out they just took a month and a half to notify me after it became public knowledge (3 and a half after it happened). Nice job all around IBM.
I attended my first Boston Ruby User’s Group meeting earlier tonight. I wasn’t sure what to expect exactly, but I was surprised how many people attended (in the neighborhood of a hundred, I would guess).
Both of the speakers were quite interesting.

  • David Black gave an interesting talk on the way Ruby implements inheritance with a particular emphasis on giving objects that “spring from” the same class different behaviors without defining additional classes.
    I learned a lot from this exercise in metaprogramming because I’ve really only dabbled in Ruby so far.
  • Zed Shaw had a really energetic, engaging, and entertaining presentation touching on his HTTP server, Mongrel, its competitors, evildoers and anti-social behavior on the internet, and how he aims to address that with his Utu project.

The sessions were videotaped, so they’ll apparently be up on Google Video sometime soon. You don’t really have to know or care about Ruby to enjoy and learn from Zed’s talk.

One of the great things about living somewhere like the Boston area is that people I’ve heard of before show up at things like this – attendees of the meeting tonight included Martin Fowler and John Resig (who wrote jQuery), along with many other folks much smarter than me.

Gas prices, state by state, with and without state taxes

Thomas Friedman wrote a phenomenal article on green power in last Sunday’s New York Times Magazine. The gist of it is that America leads the world in developing technology to conserve and cleanly generate power in the few markets where the US government has acted in the past to mandate strict emissions restrictions, as in the example of diesel locomotives, creating well-paying domestic jobs to boot. He argues that the free market can’t work properly without the government creating regulations that can provide guidance on future costs of emissions and fuel. People can’t and won’t invest hundreds of millions of dollars if they can be wiped out the next time oil prices drop. It needs to cost money to burn fossil fuels or no alternatives will be developed.
I’ve heard this before at Technology Review’s emerging tech conference last fall – hopefully with Friedman articulating the pro-environment case so well, and in a manner that should make sense to lots of society, not just the “tree-huggers”, we can finally make some real progress on meaningful environmental legislation.
I saw on the Globe’s website that the founder of Zipcar has started a new company, goloco.com, which aims to promote ride sharing by splitting up the costs of a trip, handling payments to the driver, and taking a 10% cut of the proceeds. I don’t know why, but I happened to skim the terms of service, which were all pretty standard stuff, until I found this:

13. Carbon Credits

You agree to assign the rights to any Carbon Credits resulting from any trips arranged using our service to GoLoco.

Pretty crafty – if they do well, and if we ever get some kind of cap and trade system for carbon (which is a lot of ifs) they could stand to make more money selling carbon credits than on their users’ tithe.

Next year’s Pulitzer Prize has to go to Marianne Lavelle of US News and World Report for this article titled “Is a penny a gallon worth a detour?” and subtitled “Cutting back on driving rather than searching for bargains is often a better way to save money on gas.”

Wow! Who would have thought driving less would save you money, and that going out of your way for a few cents per gallon in savings wouldn’t be worth it?

Now that she’s gotten this difficult study wrapped up we can all look forward to her future work on ending the war in Iraq and converting to a hydrogen economy – should be easy by comparison 🙂

Kristi and I saw “The Ten” on Saturday night at the Boston Independent Film Festival – the premise of the movie is to create a sketch touching on each of the Ten Commandments. Each sketch is really funny, with cameos from many actors. The sketches are often quite different, but like great improv, they make callbacks to include characters and ideas from previous scenes. Great closing scene with the whole cast, great movie overall.

Check it out.
In the image below I’ve plotted the average gas price in each state for 4/25/07 (data from here) with and without state per-gallon taxes included. Without the taxes included, it becomes obvious that gas prices increase on the west coast – perhaps due to transportation costs? (A quick search didn’t turn up any port-by-port oil import stats.)

[Image: 425composite-small.png – average gas prices by state on 4/25/07, with and without state taxes]

I created this using ruby-shapelib and rmagick as mentioned previously.

Loading and drawing maps with Ruby

Loading geographic map data and drawing maps is pretty easy to do with two Ruby tools – ruby-shapelib (to load the map data) and RMagick (to create the drawings).

I didn’t see any tutorials or sample code, so I’m posting this sample as is – it will draw every shape part of every shape in a given shape file. Note this code does not perform any geographic projections.

require 'rubygems'
require 'RMagick'
require 'rvg/rvg'
require 'shapelib'
include ShapeLib
include Magick

USSTATES_SHAPEFILE = "/Users/jkk/projects/shapelib/statesp020/statesp020.shp"
OUTFILE = "/Users/jkk/projects/shapelib/test.png"

def drawshape(shape, canvas)
  # each shape can have multiple shape parts...
  # iterate over each shape part in this shape
  0.upto(shape.part_start.length - 1) do |index|
    part_begin = shape.part_start[index]
    if shape.part_start[index + 1].nil?
      part_end = -1  # last part: slice to the end of the point arrays
    else
      part_end = shape.part_start[index + 1] - 1
    end
    # NOTE: we're assuming all the parts are polygons for now...
    # draw a polygon with the current subset of the xvals and yvals point arrays
    canvas.polygon(shape.xvals.slice(part_begin..part_end),
                   shape.yvals.slice(part_begin..part_end)).styles(
      :fill => "green", :stroke => "black", :stroke_width => 0.01)
  end
end

# create a viewbox with lat/long coordinate space in the correct range
def create_canvas(rvg, shapefile)
  width  = shapefile.maxbound[0] - shapefile.minbound[0]
  height = shapefile.maxbound[1] - shapefile.minbound[1]
  # invert the y axis so "up" is bigger and map the coordinate space
  # to the shape's bounding box
  rvg.translate(0, rvg.height).scale(1, -1).
      viewbox(shapefile.minbound[0], shapefile.minbound[1], width, height).
      preserve_aspect_ratio('xMinYMin', 'meet')
end

shapefile = ShapeFile.open(USSTATES_SHAPEFILE, "rb")
# create a new RVG object to draw on
rvg = RVG.new(1000, 100)
rvg.background_fill = 'white'
canvas = create_canvas(rvg, shapefile)
shapefile.each { |shape| drawshape(shape, canvas) }
shapefile.close

rvg.draw.write(OUTFILE)




I’m using the US State boundary file from the National Atlas website.
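
Since the code draws raw latitude/longitude values, shapes get stretched east-west as you move away from the equator. A rough equirectangular correction is to scale the x values by the cosine of a reference latitude before drawing – sketched below as a standalone helper (my addition, not part of the original sample; it assumes the same xvals/yvals arrays ruby-shapelib exposes above, and you’d want to shrink the viewbox width to match):

def project_x(xvals, yvals)
  # use the average latitude of the shape as the reference parallel
  mid_lat = yvals.inject(0.0) { |sum, y| sum + y } / yvals.length
  scale = Math.cos(mid_lat * Math::PI / 180.0)
  xvals.map { |x| x * scale }
end

Passing project_x(shape.xvals, shape.yvals) in place of shape.xvals in the polygon call would do it.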

Recursive deletes

I was dismayed to hear about JetBlue’s recent mistakes leading to people spending 11 hours trapped in unmoving planes. It seems even with TVs and leather seats, they don’t treat their passengers as any more than human cargo.

They sent me an email today about their new bill of rights. It’s impressive at first, especially the $1000 for being bumped, but littered throughout the document is the (intentionally?) vague term “controllable irregularity”. We’ve all heard or read about instances where airlines blame the weather for delays, even as passengers can see people working on some problem with a plane, so what’s to stop JetBlue from doing the same thing?

As far as being trapped on a plane goes, they still think five hours is a reasonable time to be on a plane without motion, which is ridiculous. I can’t really understand why nobody on those planes snapped and popped an emergency slide to get out of there after five hours, let alone eleven.

This debacle, and the case of American Airlines doing the same thing to passengers diverted to Austin during a thunderstorm, show that voluntary corporate promises aren’t enough to protect passengers. Moreover, from what I’ve read, American Airlines defended their actions in Austin by saying that if they deplaned the passengers, they’d lose their takeoff “slot” and might not get out of there for three days. This tells me we need a directive covering not just airlines, but the airport authorities and the FAA as well. The airport should be required to make a gate available to deplane passengers after two to three hours, and the FAA should be required to let that plane leave as soon as everyone has returned to the plane and is ready to go, not make it go to the back of the queue as is apparently the case now.

MSNBC has a roundup of some of the recent incidents here. The Coalition for an Airline Passengers’ Bill of Rights has a site with more information.

Hopefully people will (for once) have the attention span to see that something concrete happens here, rather than be distracted by some BS corporate policy changes or some celebrity car crash or new dress or haircut, and let the issue fade away, as with so many other important things.
There was a good article about The Arcade Fire in last Sunday’s New York Times Magazine. It made me sign up for a trial eMusic account to get their old and new albums (though I later realized I had acquired “Funeral” some time ago). It turns out it’s actually quite good. I wonder what kind of sales bump a band gets from being profiled in the Times these days?

I hadn’t used eMusic before – I went there to get some DRM-free mp3s before I knew there was a free trial option. I don’t know what to think of their subscription model – pay $10 a month to get 30 mp3 downloads, which works out to just 33 cents a track – but I don’t think I would want to commit to another subscription somewhere.

Speaking of subscriptions, some months I weigh the value of subscribing to the Times – it’s about $22/month for just the Sunday issue. No wonder no one takes a paper anymore! There’s something lost in reading on the internet, though – it’s harder to lie in bed and read with a significant other, and it’s not as portable for reading on the go. I guess it’s worth it for now – until someone comes up with a fantastic e-reader.

I think the publisher of the Times said in an interview recently that he wouldn’t be surprised if they stopped printing at all within 5 years. That’s a bit alarming to me from a historical point of view – one can go to the library and look at Times articles from the Civil War, etc. – surely the format issues involved in making digital copies of a paper available to readers 150 years from now are nontrivial compared to keeping paper dry and in the dark.

I’ve resolved to learn Ruby on Rails this spring in lieu of the grad school class that I would normally be taking (rather than letting any free time slide by playing Wii Sports). I think the best way to learn is by doing, and since I don’t have any containable project ideas, I’ve been tinkering with an open source project called Tracks, which is a web-based organization application in the mold of the Getting Things Done philosophy.

Even though I’m still painfully slow as a Ruby developer, I’m really amazed how easy to work with and powerful some of the frameworks in Rails are compared to the (clunky and slow to work with, by comparison) Java/JSP/Struts stack I’m accustomed to. One of the features I love so far is the ability to reuse page chunks (“partials”, which could roughly be compared to tag files in JSP land) in Rails JavaScript templates, which makes it really simple to generate JavaScript that will update multiple sections of a web page (as opposed to one container, which is easy to do with Prototype alone from the client). I figured out enough in no time at all to submit a patch to enhance the project pages. Pretty cool.
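
As a concrete (if hypothetical) example of the RJS pattern I mean – the element ids and partial names below are made up, not from Tracks – one Ajax response can refresh several page sections by rendering a partial into each:

page.replace_html 'todo_list', :partial => 'todos/list'
page.replace_html 'sidebar_count', :partial => 'todos/count'

Each replace_html call renders the named partial and swaps it into the DOM element with that id.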
I saw this disturbing story (via the Wesabe blog) about a woman who can’t buy health insurance because she had a bout with cancer. Being without dental insurance bothers me enough (I still kick myself for not picking up the COBRA coverage before the deadline – the thought of paying for a root canal out of pocket gives me the shivers), but the thought of no health insurance is terrifying.

I can’t believe that we, as a country can’t solve this problem and make it easier for individuals to obtain insurance. By insuring everyone, even the young and relatively healthy, the risk is spread around enough that it is overall more affordable for everyone. Surely having portable insurance would allow people to start their own businesses or just take extended time off to do something different, which would have to be a boost to the economy. It seems like the anti-health care forces would have us believe that any insurance changes would be bad for small business, but I can’t believe that is truly the case if we do it right.
Posting this little Ruby snippet so I can reference it later. Need to recursively delete directories with a certain name in a large tree? The simplest example is scrubbing those pesky .svn directories out of a subversion working copy, which can be done like so:

require 'fileutils'
Dir.glob("**/.svn/") { |fname| FileUtils.rm_r(fname) }
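
If you’re nervous about unleashing rm_r, the same glob with puts makes a handy dry run first (my addition):

Dir.glob("**/.svn/") { |fname| puts "would remove #{fname}" }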

Another use case I have here at work is to scrub extra Maven-generated versions of code out of each Java project (so as to keep Eclipse sane). In this case, we want to delete all directories (and their contents) named “target” except for the target directory at the root (because mvn clean is “too clean” in this instance):

require 'fileutils'
Dir.glob("**/target/") { |fname| FileUtils.rm_r(fname) unless /^target.*/ =~ fname }

It gets a bit more complicated if you want to exclude a list of directories from the operation. Here I found the detect method from Ruby’s Enumerable module quite handy to short-circuit the evaluation of the exclusion regexes against each directory.


require 'fileutils'
@exclude = [/^foo.*/, /^bar.*/, /^james.*/, /^target.*/]
Dir.glob("**/target/") do |fname|
  @erase = @exclude.detect { |r| r =~ fname }.nil?
  if @erase
    puts "erasing #{fname}"
    FileUtils.rm_r(fname)
  else
    puts "skipping #{fname}"
  end
end

I really need to fix my stylesheet for code samples.