Finding the "I'm feeling Ducky" URL for a search term

Problem definition:
I have a textual list of words (in my case, names of supermarkets such as Walmart, Food Lion, Shoprite, etc.) and a Ruby on Rails web application with a model for the store names.
The table looks like this:
Table: StoreChain
| name:string | url:string |

The goal is to populate this table with the store names and then, for each store, find its web site.
Since there are more than a hundred of them, I want to do this automatically, or at least do most of the work automatically.
The Solution:
I use the Rails console (since I'm on Heroku, it's the Heroku console) to write a short script that parses the text file (it's actually a JSON file) and uses a search engine to find the best bet for the web site. The assumption is that for Walmart, the web site www.walmart.com would be the first search result. I'll review the results manually later, but the script gives me a nice head start.
Initially I wanted to use Google's "I'm Feeling Lucky" feature, but I found out Google wasn't that API friendly anymore.
So the next step was to use http://ddg.gg, better known as http://DuckDuckGo.com
DDG has a nice JSON API and it's open...
So here's the script. There's one version for a regular Rails console and another for Heroku. Heroku's console doesn't like newlines in the "each" block, so I added lots of ; and used an inline block, but the two versions are conceptually the same.
Gist: 

Pulling the up-to-date remotes for Hector

In svn you just need to type
$ svn up
And your working copy is updated.
Git is more powerful but also more complex here, so this is how you update your working copy (you need to do this every day before writing new code and before every commit):
1. Make sure the working copy is clean; if need be, stash some files
2. git co master (co is an alias for checkout)
3. git pull origin master
4. Repeat steps 2-3 for every other branch.

As a convenience I have this in my bashrc:

alias hector-pull-all="git co 0.7.0 && git pull --rebase origin 0.7.0 && git co 0.6.0 && git pull --rebase origin 0.6.0 && git co master && git pull --rebase origin master"

So now I simply cd to dev/cassandra/hector/ and type hector-pull-all
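
If the list of branches changes often, the same idea can be written as a shell function instead of a fixed alias. This is only a sketch: the name pull_all is made up for illustration, and it assumes the remote is called origin, as in the alias above. It also covers step 1 by refusing to run when the working copy is not clean (untracked files count as not clean here).

```shell
# pull_all: check out each named branch and rebase it onto origin's copy.
# Hypothetical helper, not part of git; assumes a remote named "origin".
pull_all() {
  # Step 1: bail out if there are uncommitted or untracked changes.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working copy is not clean; commit or stash first" >&2
    return 1
  fi
  # Steps 2-4: checkout + pull for every branch given on the command line.
  for branch in "$@"; do
    git checkout "$branch" && git pull --rebase origin "$branch" || return 1
  done
}

# Usage, with the branch names from the alias above:
#   cd ~/dev/cassandra/hector && pull_all 0.7.0 0.6.0 master
```

The function stops at the first branch that fails to check out or pull, so a conflict on one branch does not get buried under output from the next ones.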

Hector stats

Hector on github (https://github.com/rantav/hector) has:
197 followers 
44 forks
82 issue reports (open and closed)
761 commits
185 users on the mailing list (http://groups.google.com/group/hector-users)
9 committers and contributors (actually more, but some were missed); the number to the left of each name is that committer's number of commits
$ git shortlog -s
32  Arin Sarkissian
1  B. Todd Burruss
30  Bozhidar Bozhanov
1  Chris Dean
27  Ed Anuff
1  Jim Ancona
1  Qiegang Long
445  Ran Tavory
2  unknown
221  zznate

The easy way to add a build timestamp to your maven artifact

Since Maven 2.1, you can use the build timestamp as a property:

${maven.build.timestamp}

and configure the format:

<maven.build.timestamp.format>yyyyMMdd-HHmm</maven.build.timestamp.format>

So you can have:

  <properties>
    <maven.build.timestamp.format>yyyy-MM-dd-HH:mm:ss</maven.build.timestamp.format>
  </properties>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifestEntries>
              <timestamp>${maven.build.timestamp}</timestamp>
            </manifestEntries>
          </archive>

        </configuration>
      </plugin>

Running Maven goals on demand

While working on a multi-module Maven project and adding Avro IDL compiler support, I found that of the ~30 modules I have, only a few use Avro. But I wanted to define the build plugin and its dependencies only once, in the parent POM, and not repeat them in each of the child POMs.
So to enable the Avro compiler on demand, I created an avro profile in the parent POM and enabled it only if the current project (usually one of the children) has a src/main/avro directory.

Here's what the code looks like:

    <!-- AVRO -->
    <profile>
      <id>avro</id>
      <activation>
        <file>
          <exists>src/main/avro</exists>
        </file>
      </activation>
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>0.1</version>
            <executions>
              <execution>
                <phase>generate-sources</phase>
                <goals>
                  <goal>protocol</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>com.thoughtworks.paranamer</groupId>
            <artifactId>paranamer-maven-plugin</artifactId>
            <executions>
              <execution>
                <configuration>
                  <sourceDirectory>${project.build.directory}/generated-sources/avro</sourceDirectory>
                  <outputDirectory>${project.build.directory}/classes</outputDirectory>
                </configuration>     
                <goals>
                  <goal>generate</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
      <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>avro</artifactId>
          <version>1.3.3</version>
        </dependency>
      </dependencies>
    </profile>
    <!-- /AVRO -->

Just some useful git magic

Create new local branch and push it to a new remote branch:
git branch 0.6.0
git co 0.6.0
git push origin 0.6.0

Merge from a fork phatduckk/master:
git remote add phatduckk git://github.com/phatduckk/hector.git
git fetch phatduckk
git merge phatduckk/master

Checkout and update a branch:
$ git co 0.6.0 && git pull origin 0.6.0
All at once:
git co master && git pull origin master && git co 0.5.1 && git pull origin 0.5.1 && git co 0.5.0 && git pull origin 0.5.0 && git co 0.6.0 && git pull origin 0.6.0

Installing Thrift on OS X

Download source:

Specific revision: 

Check your current revision:
$ svn info
Path: .
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 917130
Node Kind: directory
Schedule: normal
Last Changed Author: bryanduxbury
Last Changed Rev: 917130
Last Changed Date: 2010-02-28 07:19:38 +0200 (Sun, 28 Feb 2010)

If already checked out but need a different revision:
$ svn up -r 12345

Configure, compile, and install (you need MacPorts for this):
$ sh bootstrap.sh
$ ./configure --with-boost=/opt/local --with-libevent=/opt/local --prefix=/opt/local
$ sudo make install

To see the thrift version: 

$ thrift -version

git cherry-pick

How to cherry-pick changes from one branch to another. On branch master there are 3 changes that need to be merged to branch 0.6.0.

~/dev/cassandra/hector $ git co master
Already on 'master'

$ git log --stat
commit 561a1cd883346b6b6726ad3a4a5199cf696f2b99
Author: zznate 
Date:   Wed Mar 31 17:03:28 2010 -0700

    updated javadoc example for cassandraHostConfigurator

 .../cassandra/service/CassandraClientPool.java     |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

commit 3a8b6bcae850a1ccc61e5fd1dca660fc361b6e0e
Author: zznate 
Date:   Wed Mar 31 13:19:32 2010 -0700

    refactorings for new CassandraHost and related configurator classes

 .../cassandra/service/CassandraClientFactory.java  |   30 ++++---
 .../service/CassandraClientMonitorMBean.java       |    2 +-
 .../service/CassandraClientPoolByHost.java         |    7 --
 .../service/CassandraClientPoolByHostImpl.java     |   44 ++++-----
 .../service/CassandraClientPoolFactory.java        |    4 +-
 .../cassandra/service/CassandraClientPoolImpl.java |   99 +++----------------
 .../service/CassandraClientPoolByHostTest.java     |   12 ++-
 .../cassandra/service/CassandraClientPoolTest.java |   14 ++-
 .../cassandra/service/CassandraClientTest.java     |    2 +-
 .../cassandra/service/KeyspaceTest.java            |    2 +-
 src/test/resources/cassandra-context-test.xml      |   10 +-
 11 files changed, 78 insertions(+), 148 deletions(-)

commit 23306e9a55787f706f64eae642ed962764222b07
Author: zznate 
Date:   Wed Mar 31 13:18:27 2010 -0700

    extracted exhausted policy enum, encapsulate details of a host into a top-level object

 .../cassandra/service/CassandraHost.java           |  171 ++++++++++++++++++++
 .../service/CassandraHostConfigurator.java         |   58 +++++++
 .../cassandra/service/ExhaustedPolicy.java         |    8 +
 .../service/CassandraHostConfiguratorTest.java     |   48 ++++++
 4 files changed, 285 insertions(+), 0 deletions(-)

~/dev/cassandra/hector $ git co 0.6.0
Switched to branch '0.6.0'

~/dev/cassandra/hector $ git cherry-pick -xs 23306e9a55787f706f64eae642ed962764222b07
Finished one cherry-pick.
[0.6.0 8690dbe] extracted exhausted policy enum, encapsulate details of a host into a top-level object (cherry picked from commit 23306e9a55787f706f64eae642ed962764222b07)
 4 files changed, 285 insertions(+), 0 deletions(-)
 create mode 100644 src/main/java/me/prettyprint/cassandra/service/CassandraHost.java
 create mode 100644 src/main/java/me/prettyprint/cassandra/service/CassandraHostConfigurator.java
 create mode 100644 src/main/java/me/prettyprint/cassandra/service/ExhaustedPolicy.java
 create mode 100644 src/test/java/me/prettyprint/cassandra/service/CassandraHostConfiguratorTest.java
~/dev/cassandra/hector $ git log
commit 8690dbe7a131f7adfa1272561ee9904f83112a9a
Author: zznate 
Date:   Wed Mar 31 13:18:27 2010 -0700

    extracted exhausted policy enum, encapsulate details of a host into a top-level object
    (cherry picked from commit 23306e9a55787f706f64eae642ed962764222b07)

    Signed-off-by: Ran Tavory

commit c339e439d643da662e1f3792a88439f4877ee90c
Author: Arin Sarkissian 
Date:   Wed Mar 31 20:20:55 2010 -0700

    bring hector up to date with cassandra 0.6rc1

~/dev/cassandra/hector $ git cherry-pick -xs 3a8b6bcae850a1ccc61e5fd1dca660fc361b6e0e
Automatic cherry-pick failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>' or 'git rm <paths>' and commit the result.
When commiting, use the option '-c 3a8b6bc' to retain authorship and message.
~/dev/cassandra/hector $ git st
src/main/java/me/prettyprint/cassandra/service/CassandraClientMonitorMBean.java: needs merge
src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java: needs merge
# On branch 0.6.0
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
# modified:   src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java
# modified:   src/main/java/me/prettyprint/cassandra/service/CassandraClientPoolByHost.java
# modified:   src/main/java/me/prettyprint/cassandra/service/CassandraClientPoolByHostImpl.java
# modified:   src/main/java/me/prettyprint/cassandra/service/CassandraClientPoolFactory.java
# modified:   src/main/java/me/prettyprint/cassandra/service/CassandraClientPoolImpl.java
# modified:   src/test/java/me/prettyprint/cassandra/service/CassandraClientPoolByHostTest.java
# modified:   src/test/java/me/prettyprint/cassandra/service/CassandraClientPoolTest.java
# modified:   src/test/java/me/prettyprint/cassandra/service/CassandraClientTest.java
# modified:   src/test/resources/cassandra-context-test.xml
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# unmerged:   src/main/java/me/prettyprint/cassandra/service/CassandraClientMonitorMBean.java
# unmerged:   src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java
#

Oops, there's a merge conflict. 
Now, edit the files CassandraClientMonitorMBean and KeyspaceTest and fix the merge. When done, commit with -c 3a8b6bc

~/dev/cassandra/hector $ vi src/main/java/me/prettyprint/cassandra/service/CassandraClientMonitorMBean.java
~/dev/cassandra/hector $ vi src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java
~/dev/cassandra/hector $ git add .
~/dev/cassandra/hector $ git ci -c 3a8b6bc
[0.6.0 2687509] refactorings for new CassandraHost and related configurator classes
 11 files changed, 108 insertions(+), 177 deletions(-)

~/dev/cassandra/hector $ git cherry-pick -xs 561a1cd883346b6b6726ad3a4a5199cf696f2b99
Finished one cherry-pick.
[0.6.0 b6cc83a] updated javadoc example for cassandraHostConfigurator (cherry picked from commit 561a1cd883346b6b6726ad3a4a5199cf696f2b99)
 1 files changed, 5 insertions(+), 5 deletions(-)

~/dev/cassandra/hector $ git log
commit b6cc83a2ed496dcaf1fa7bd46dc23c2289fb978b
Author: zznate
Date:   Wed Mar 31 17:03:28 2010 -0700

    updated javadoc example for cassandraHostConfigurator
    (cherry picked from commit 561a1cd883346b6b6726ad3a4a5199cf696f2b99)

    Signed-off-by: Ran Tavory 

commit 2687509b42aeb921c2487b80358f19c55e161168
Author: zznate 
Date:   Wed Mar 31 13:19:32 2010 -0700

    refactorings for new CassandraHost and related configurator classes

commit 8690dbe7a131f7adfa1272561ee9904f83112a9a
Author: zznate 
Date:   Wed Mar 31 13:18:27 2010 -0700

    extracted exhausted policy enum, encapsulate details of a host into a top-level object
    (cherry picked from commit 23306e9a55787f706f64eae642ed962764222b07)

    Signed-off-by: Ran Tavory


Now run the tests to make sure everything's good, and we're done.
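
To rehearse the conflict-free part of this flow without touching a real project, something like the following can be run in a scratch directory. It is a minimal sketch: the function name cherry_pick_demo, the file names, and the "Demo User" identity are all made up for the demo, and only the branch names master and 0.6.0 mirror the ones used above. It makes one commit on master after 0.6.0 has forked, cherry-picks it onto 0.6.0 with -x -s, and prints the resulting commit message.

```shell
# cherry_pick_demo: build a throwaway repo and cherry-pick one commit with -x -s.
# Runs in a subshell, so the caller's current directory is left alone.
cherry_pick_demo() (
  set -e
  demo=$(mktemp -d)
  cd "$demo"
  git init -q
  git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of git defaults
  git config user.name "Demo User"          # made-up identity, local to this repo
  git config user.email "demo@example.com"

  echo base > base.txt
  git add base.txt
  git commit -q -m "base commit"
  git branch 0.6.0                          # the release branch forks here

  echo fix > fix.txt
  git add fix.txt
  git commit -q -m "fix on master"
  fix_sha=$(git rev-parse HEAD)

  git checkout -q 0.6.0
  # -x appends "(cherry picked from commit ...)", -s appends a Signed-off-by line.
  git cherry-pick -x -s "$fix_sha" >/dev/null
  git log -1 --format=%B
)

cherry_pick_demo
```

The printed message should end with the same two trailers seen in the git log output above: the "(cherry picked from commit ...)" line and a Signed-off-by line for whoever ran the cherry-pick.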