Recursive deletes

Posting this little Ruby snippet so I can reference it later. Need to recursively delete directories with a certain name in a large tree? The simplest example is scrubbing those pesky .svn directories from a Subversion checkout, which can be done like so:

require 'fileutils'
Dir.glob("**/.svn/") { |fname| FileUtils.rm_r(fname) }

Another use case I have here at work is scrubbing extra Maven-generated versions of code out of each Java project (so as to keep Eclipse sane). In this case, we want to delete all directories (and their contents) named “target” except for the target directory at the root (because mvn clean is “too clean” in this instance):

require 'fileutils'
Dir.glob("**/target/") { |fname| FileUtils.rm_r(fname) unless /^target.*/ =~ fname }

It gets a bit more complicated if you want to exclude a list of directories from the operation. Here I found the detect method from Ruby’s Enumerable module quite handy: it short-circuits, so each directory is tested against the exclusion regexes only until the first one matches.

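require 'fileutils'
# Paths matching any of these patterns are left alone.
exclude = [/^foo.*/, /^bar.*/, /^james.*/, /^target.*/]
Dir.glob("**/target/") do |fname|
  # detect returns the first matching regex, or nil if none match
  erase = exclude.detect { |r| r =~ fname }.nil?
  if erase
    puts "erasing #{fname}"
    FileUtils.rm_r(fname)
  else
    puts "skipping #{fname}"
  end
end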
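
The exclusion check can be tried out without deleting anything. Here is a minimal dry-run sketch, assuming the same kind of exclusion list; the helper name erasable? and the sample paths are made up for illustration and only mimic the shape of what Dir.glob("**/target/") returns.

```ruby
# Hypothetical helper: true when no exclusion pattern matches the path.
# Enumerable#detect stops at the first matching regex (short circuit).
def erasable?(path, exclude)
  exclude.detect { |pattern| pattern =~ path }.nil?
end

EXCLUDE = [/^foo.*/, /^bar.*/, /^james.*/, /^target.*/]

# Sample paths shaped like Dir.glob("**/target/") output (invented for this sketch).
paths = ["target/", "foo/target/", "web/target/", "core/util/target/"]

# Split into what would be deleted and what would be kept; nothing is removed.
to_erase, to_skip = paths.partition { |path| erasable?(path, EXCLUDE) }
puts "would erase: #{to_erase.inspect}"
puts "would skip:  #{to_skip.inspect}"
```

Because the patterns are anchored with ^, only paths that start with an excluded name are skipped, so the root target/ survives while web/target/ does not.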

Note: WordPress replaces straight quotes with smart quotes, so if you copy these snippets, check that the quotes are plain ASCII before running them. I also really need to fix my stylesheet for code samples.

Rails Development – Tracks

I’ve resolved to learn Ruby on Rails this spring in lieu of the grad school class that I would normally be taking (rather than letting any free time slide by playing Wii Sports). I think the best way to learn is by doing, and since I don’t have any containable project ideas, I’ve been tinkering with an open source project called Tracks, which is a web-based organization application in the mold of the Getting Things Done philosophy.

Even though I’m still painfully slow as a Ruby developer, I’m really amazed at how easy to work with and powerful some of the frameworks in Rails are compared to the (clunky and slow to work with by comparison) Java/JSP/Struts stack I’m accustomed to. One of the features I love so far is the ability to reuse page chunks (“partials”, which could roughly be compared to tag files in JSP land) in Rails JavaScript templates, which makes it really simple to generate JavaScript that will update multiple sections of a web page (as opposed to one container, which is easy to do with Prototype alone from the client). I figured out enough in no time at all to submit a patch to enhance the project pages. Pretty cool.

Cingular phones RSS feed

I’ve been waiting months for Cingular to release the Nokia N75; it is a bit annoying checking their product page over and over again, so I’ve been thinking about creating an RSS feed for their offerings for some time. Now it’s done, and here it is: Cingular RSS Feed. There’s a YAML data file here too. Updated nightly.

Now we can easily watch as Cingular keeps adding crappy RAZRs in assorted colors instead of actually adding new phones.

I tried a number of Ruby and Python screen-scraping utilities along the way; ultimately I’ve been quite pleased with Hpricot, so if you’re doing some scraping and can use Ruby, I’d give that a whirl.

Late to two-way sync party…

I was disappointed this morning to see on Digg that someone is already writing software to do bidirectional syncing between Google Calendar and iCalendar. I have been preparing to write such a beast by familiarizing myself with the SyncServices API, all the while wondering why no one had done it yet. SyncServices makes it surprisingly easy to do, and it’s been around since Tiger came out over a year ago.

There was clearly a pent-up demand for a tool like that, with lots of blog posts and comments on the topic here and there; definitely something that would fetch a token 10 or 20 bucks for use. I was wavering between a free/open source model and doing a for-pay client (which would probably require lawyering and accounting), so now that there’s competition, if I proceed it’ll definitely be the former. That’s only fair really, since I wouldn’t have done it at all if PyObjC weren’t free.

At least now there’s no real rush to beat some unknown competitor. I can go back to learning Ruby on Rails instead, as originally planned.

SyncServices Example Code in Python

I’ve continued my earlier efforts to learn the SyncServices API and its use from Python via PyObjC, and am pleased to share this example script. (I’ve also submitted it to the PyObjC project.) The example code interacts with Apple’s Simple Stickies example. The script performs a full “truth” retrieval, registers an alert handler to join sync sessions, adds a new sticky, then waits to handle any sync events.

Now that that’s done, I can use it as a basis to explore calendar syncing…

Some experiments with PyObjC and the Mac OSX SyncServices engine

Until the last couple of days, I hadn’t developed any software that interacts with OS X. I have to say the experience has been interesting. I’m really impressed with the usability of Interface Builder as well as the power of the .NIB file. I hadn’t realized it was much more than just a description of the application layout.

The main reason I’ve never ventured into programming for OS X is Objective-C. Don’t know it, not sure if I want to know more than I’ve learned in the last couple of days. I had an idea for a project to leverage the SyncServices engine though, so I took the plunge. Into PyObjC, that is. (I would have liked to use RubyCocoa but it doesn’t look nearly as fully baked.)

Progress was slow at first; I had to at least learn to read Objective-C so that I could understand the docs and the example sync applications. Now that I’ve figured out some of the issues I’ve encountered I’m much more confident; if nothing else, now I know what I don’t know. I have to say I’m really impressed with the power of PyObjC. It’s been really great for interactively groping my way through the SyncServices APIs.

My first task was to get a feel for the APIs by doing a read-only (pull) sync of the stickies saved in Apple’s Stickies example. The code that does that is here. There’s currently no sample Python code for the SyncServices module available, so I should hand this off to the PyObjC folks (if they’ll take my painfully un-idiomatic Python) once I flesh it out some more.

Working on an anti-pattern

The project I’m working on right now is a collection of anti-patterns and just plain terrible code. The upshot of this is that it’s really hard to make it worse, and oftentimes I can walk away feeling good about making a huge difference with even small changes. My first project appeared well designed, and since I was new then, I felt very constrained in how much I could change. Not anymore: it’s like the Wild West in this code base, and any design is better than no design. It’s definitely been a good way to bust out of my years-long productivity slump.

The project was started several years ago by an offshore contracting company (it seems like they got paid by the line) and then picked up by an in-house but still offshore team to continue to maintain and build. I don’t want to paint all offshore software-industry workers with the same brush, but in this case the code all appears to be written by people who just know how to program in Java. Barely. They just don’t think like computer scientists. For some reason no one seems to think a single class having five methods to do the same thing is bad. Or methods that are hundreds of lines long. Or building strings by concatenation, multiple times in loops that run thousands of times. Or checking for duplicates when copying the keys of a Map into a list. Or converting Longs to Integers via a String object.

I’ve speculated that the current team must have come from a background of sustaining engineering (where the idea is to fix bugs in the least intrusive way possible) and that’s why they blithely copy the bad code around them. Either that or for some reason they don’t feel empowered to make changes.

The last few days have been especially great. I’ve been working on performance problems, and the code is so badly written that there are huge chunks of fat to chop out. Two methods I’ve found are O(n^2) so the profiler practically slaps you in the face, but they’ve somehow sat there for years. Replacing them with real implementations has reduced the time to run this code by 90%. Hours to minutes for large sets. That’s fun to talk about in meetings.
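
The post doesn’t say what those two methods actually did, so the following is only a generic illustration of the kind of rewrite involved, written in Ruby rather than the project’s Java. It contrasts a quadratic duplicate check (rescanning a list for every element) with a hash-based one; the method names unique_slow and unique_fast are hypothetical.

```ruby
require 'set'

# Quadratic: Array#include? rescans the growing result for every element.
def unique_slow(items)
  result = []
  items.each { |item| result << item unless result.include?(item) }
  result
end

# Roughly linear: Set#add? is a hash lookup and returns nil for repeats,
# so select keeps only the first occurrence of each item.
def unique_fast(items)
  seen = Set.new
  items.select { |item| seen.add?(item) }
end

sample = [3, 1, 3, 2, 1]
puts unique_slow(sample).inspect
puts unique_fast(sample).inspect
```

Both versions return the same list in the same order; only the amount of rescanning differs, which is what shows up in a profiler once the input gets large.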

Ode to unit tests and coarse-grained objects

I recently became a believer in the value of unit testing. I sort of understood the benefits of the test-first approach academically, but always felt that what I was writing would just be too difficult to unit test (UI stuff etc). Before this, I think the last thing I unit tested in a formal, framework-using way was this directed acyclic graph thing I knocked out for work. This project offered the prospect of taking in a pile of data and at the end spitting out some numbers, which would be easy to test, but that’s not what got me there.

The old code with similar functionality (ok, really the entire project) is a tangled mess of collections of fine-grained value objects that mostly map to rows in tables. So a given section of code may be juggling three arrays and repeatedly iterating (where did I put that one?) to find objects that relate to objects in each of the other arrays. This is obviously inefficient and very hard to read (especially when the objects passed around have fields that are overloaded, their values depending on the computation phase as they are passed around in a disgustingly procedural manner), so I proposed taming the collections with more intelligent, coarse-grained objects that would serve as indexes of sorts. Now, instead of looking through the array (or several) to find the sales for item X at store Y, the code could just ask an object “get me the data for item X at store Y”. Even better, if there’s some kind of normalization or reduction of the data, it can be done as the object is filled with the value objects. So this presents a perfect unit testing specimen, because I can fake up some value objects, toss them in and then test what comes out, with and without manipulation.
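
To make the idea concrete, here is a minimal sketch of such an index object. It is written in Ruby for brevity rather than the project’s Java, and every name in it (SaleRow, SalesIndex, units_for, the sample data) is invented for illustration, not taken from the real code.

```ruby
# Hypothetical fine-grained value object, roughly one per table row.
SaleRow = Struct.new(:item, :store, :units)

# Coarse-grained index: filled once, then answers lookups directly.
class SalesIndex
  def initialize(rows = [])
    @units = Hash.new(0) # [item, store] => total units
    rows.each { |row| add(row) }
  end

  # The reduction happens while the index is filled: repeated rows are summed.
  def add(row)
    @units[[row.item, row.store]] += row.units
    self
  end

  # "Get me the data for item X at store Y."
  def units_for(item, store)
    @units.fetch([item, store], 0)
  end
end

# Faked-up value objects, as one would build them in a unit test.
rows = [
  SaleRow.new("widget", "store-1", 3),
  SaleRow.new("widget", "store-2", 5),
  SaleRow.new("widget", "store-1", 4),
]
index = SalesIndex.new(rows)
puts index.units_for("widget", "store-1")
puts index.units_for("gadget", "store-1")
```

The calling code no longer iterates over anything, and the whole class can be exercised with a handful of fake rows and no app server.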

None of that is rocket science, but it does allow two things: first, significant complexity can be hidden and thoroughly tested without hooking it up to an app server (fixing something in a JUnit harness is much faster than redeploying to the app server); second, the code to actually do the final calculations can be astonishingly easy to write, read and integrate into the application. Indeed, in this case, the code slipped into place with only one bug.

From the outside, it seems like a lot of extra work to mock out supporting objects and write tests, but having done it, I feel like it saved me several days of total development time. The end product is better too. I’m not sure I can get behind actually writing tests first, but writing tests (in parallel in my case) definitely helps to crystallize my thinking about what my objects actually need to do.

US Centers of Population on a Google Map

I found a document (PDF) on the US Census Bureau site with data on the mean and median population centers of the United States over time. Unfortunately they didn’t stick it on a map so one can get a real sense of the slow southwest migration. Fortunately this provided me with a good excuse to learn the Google Maps API and put the points on a map.

I’ve put the maps up here. I’m not sure I expected the median and mean to be so similar (though unfortunately there’s a hundred years less data for median). The mean ends up further west than the median because the distance of the west coast amplifies the weight of its population.
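
The pull that distant points exert on the mean can be seen in a one-dimensional toy example. The sketch below uses made-up longitudes (not Census data) and treats each value as one equally weighted person; the real centers are computed in two dimensions, so this only illustrates the effect.

```ruby
# Invented longitudes in degrees (negative = west), one per "person".
# Most cluster in the east and midwest; two sit far out on the west coast.
longitudes = [-74.0, -77.0, -84.0, -87.6, -90.2, -95.4, -118.2, -122.4]

# Arithmetic mean: every point contributes in proportion to its distance.
def mean(values)
  values.sum / values.size
end

# Median: only the ordering matters, not how far away the extremes are.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts "mean longitude:   #{mean(longitudes).round(1)}"
puts "median longitude: #{median(longitudes).round(1)}"
```

With these numbers the mean lands at roughly -93.6 and the median at roughly -88.9, so the mean sits further west: the two far-western points drag it along, while the median only counts how many points lie on each side.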

I was actually browsing the US Census site for some interesting data to build a treemap with, but haven’t found anything yet with a two-level hierarchy that didn’t end up at the county level. Got any ideas of something that could be interesting in treemap form?

An interactive line-chart without plugins

I wrote a quick and dirty line chart that is interactive without using Flash or any other plugin; it uses Microsoft’s Vector Markup Language (VML) along with JavaScript. Although I feel dirty for doing something Microsoft-specific, I’m planning to build it into a demo for work (where the apps are alarmingly short of data visualization), and the web applications are IE-only due to ActiveX plugins.

The example is here.

The chart could be improved in many ways, such as a fill under the line, some markers to indicate the original position of the chart as it is edited, larger click targets for the vertices, and some animations on mouseover. But since I’m not sure any of that will ever happen, I’m putting it up in case someone else can learn from it.

With Canvas and SVG support making it into the other browsers, VML in IE, and a graphics API (like dojo.gfx/dojo.2d) that will hide the differences, it looks like there could be quite a bit of client-side rendering coming to the web in the not too distant future. (Google Maps already uses VML as a fallback from transparent PNGs for drawing the route in IE.)

Incidentally, I did most of the work using Aptana, which is quite a capable JavaScript editor.