Are singletons pathological liars?

Miško Hevery thinks so: Singletons are Pathological Liars

The post actually discusses how Singletons allow APIs to be the liars: they depend on objects that aren’t explicitly advertised as dependencies.

I agree with Martin Probst’s response, but I’m actually more interested in this question by okoman.

The question is apparently answered in the post Dependency Injection Myth: Reference Passing.

What do you think?

Delimited Continuations in Scala

First, the teaser:

Swarm is a framework allowing the creation of web applications which can scale transparently through a novel portable continuation-based approach. Like Map-Reduce, Swarm follows the maxim “move the computation, not the data”. However Swarm takes the concept much further, allowing it to be applied to almost any computation, not just those that can be broken down into map and reduce operations.

This is pretty cool. It is accomplished using Scala delimited continuations, which are confusing.

Delimited Continuations Explained (in Scala)

I found that link to be pretty helpful, but wanted to jot down my notes because otherwise I’ll have to start from the beginning when I try to understand it again. So this blog post is essentially just a journal entry. Read on if you dare!

Continue reading ‘Delimited Continuations in Scala’ »

OpenNLP Part-of-Speech (POS) Tags: Penn English Treebank

In the comments on my post about part-of-speech tagging, Manu asks

Can you post a legend what the pos tags stand for? At the moment I’m working on a project where I use this and I dont know at the moment how much tags there are and what e.g. “JJ”, “IN” and the rest of them means. This would be very helpful.

Ask and you shall receive!

These are the Penn English Treebank POS tags. Here’s the list that I found in an answer at StackOverflow, but you’re on your own for finding out what each of these really means:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition or subordinating conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NNP Proper noun, singular
  15. NNPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PRP Personal pronoun
  19. PRP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund or present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd person singular present
  32. VBZ Verb, 3rd person singular present
  33. WDT Wh-determiner
  34. WP Wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB Wh-adverb
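
If you want the legend available in code rather than in a browser tab, here is a minimal sketch in Java. PosTagLegend and its describe method are hypothetical names made up for illustration; they are not part of OpenNLP. The table holds only the 36 tags listed above, so any other tag falls through to "Unknown tag".

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of OpenNLP): looks up the tag descriptions listed above. */
class PosTagLegend {
    private static final Map<String, String> LEGEND = buildLegend();

    private static Map<String, String> buildLegend() {
        // Tag/description pairs, in the same order as the list above.
        String[][] pairs = {
            {"CC", "Coordinating conjunction"}, {"CD", "Cardinal number"},
            {"DT", "Determiner"}, {"EX", "Existential there"},
            {"FW", "Foreign word"}, {"IN", "Preposition or subordinating conjunction"},
            {"JJ", "Adjective"}, {"JJR", "Adjective, comparative"},
            {"JJS", "Adjective, superlative"}, {"LS", "List item marker"},
            {"MD", "Modal"}, {"NN", "Noun, singular or mass"},
            {"NNS", "Noun, plural"}, {"NNP", "Proper noun, singular"},
            {"NNPS", "Proper noun, plural"}, {"PDT", "Predeterminer"},
            {"POS", "Possessive ending"}, {"PRP", "Personal pronoun"},
            {"PRP$", "Possessive pronoun"}, {"RB", "Adverb"},
            {"RBR", "Adverb, comparative"}, {"RBS", "Adverb, superlative"},
            {"RP", "Particle"}, {"SYM", "Symbol"},
            {"TO", "to"}, {"UH", "Interjection"},
            {"VB", "Verb, base form"}, {"VBD", "Verb, past tense"},
            {"VBG", "Verb, gerund or present participle"}, {"VBN", "Verb, past participle"},
            {"VBP", "Verb, non-3rd person singular present"},
            {"VBZ", "Verb, 3rd person singular present"},
            {"WDT", "Wh-determiner"}, {"WP", "Wh-pronoun"},
            {"WP$", "Possessive wh-pronoun"}, {"WRB", "Wh-adverb"}
        };
        Map<String, String> legend = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            legend.put(pair[0], pair[1]);
        }
        return Collections.unmodifiableMap(legend);
    }

    /** Returns the description for a tag, or "Unknown tag" if it is not in the list. */
    static String describe(String tag) {
        String description = LEGEND.get(tag);
        return description != null ? description : "Unknown tag";
    }

    /** Number of tags in the legend. */
    static int size() {
        return LEGEND.size();
    }

    public static void main(String[] args) {
        // Print the legend entries for a few sample tags.
        String[] tags = {"DT", "JJ", "NN", "IN"};
        for (String tag : tags) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

Feeding it the tag array that comes back from a tagger is then just a loop over describe.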

Enable WordPress Automatic Update

Are you seeing that nasty FTP page when trying to do an Automatic Update from within WordPress? I create new blogs frequently enough that this is a constant pain, but rarely enough that I never remember how to fix it.

Updating WordPress tells you how to identify the correct user:

If you do not know which user runs the httpd the output of ps auxw | grep -E 'http|apache|www'

And thanks to Greg in the comments over at Linode’s Manage Web Content with WordPress page for the succinct description:

To make wordpress updates and plugin installs function properly, you need to set the entire public directory structure to www-data ownership, like so:

chown -R www-data:www-data ..../yoursite.com/wordpress

How to use the OpenNLP 1.5.0 Parser

After a brief (*cough*cough*) delay, I’m back to figure out how in the world to use this OpenNLP Parser. First, a quick refresher:

Getting Started

I’m only going to warn you once: this is a long post. Go grab a beer or a glass of wine or some coffee before starting. It’s long. Now I’ve warned you twice.

Continue reading ‘How to use the OpenNLP 1.5.0 Parser’ »

Hosting wordpress in a subdirectory of an existing Rails application

I have an existing rails application at mydomain.com and wanted to include a wordpress blog at mydomain.com/blog. I’m using Apache and Passenger.

My <VirtualHost> configuration has the DocumentRoot at /srv/www/mydomain.com/public (which is itself a symbolic link to /srv/www/mydomain.com/railsapp/public). For ease of maintenance, and to avoid any conflicts between rails and wordpress, I placed wordpress outside of the rails app at /srv/www/mydomain.com/wordpress.

However, I needed to tell Apache to redirect access to the wordpress resources which are not located in the DocumentRoot. The solution? Create an Apache Alias. There’s an excellent description in the Linode Library about Managing Resources with Apache mod_alias.

This got me through the wordpress install with the wp-admin/install.php. Unfortunately, I still couldn’t access my blog. The default rails error page still kept rearing its ugly head. Turns out that’s due to Passenger, which kept directing my non-file-specific traffic to my rails application (e.g., when trying to access mydomain.com/blog or mydomain.com/blog/wp-admin without a specific php file in the URL).

I found the fix for this at WordPress Answers. My final working configuration is below; the additional fix is the PassengerEnabled off line:

<VirtualHost x.x.x.x:80>
    ...
    DocumentRoot /srv/www/mydomain.com/public
    Options FollowSymLinks

    # an Alias for the wordpress blog
    Alias /blog /srv/www/mydomain.com/wordpress
    <Directory /srv/www/mydomain.com/wordpress>
        PassengerEnabled off
        # make the WordPress .htaccess file work
        AllowOverride all
        Order allow,deny
        Allow from all
    </Directory>

    ...
</VirtualHost>

Making Time for Pet Projects

Lately I’ve been having a terrible time keeping any type of motivation or focus to work on my pet projects. It’s not a matter of being unable to come up with ideas (at least, not currently), but rather two problems:

  1. Finding the time
  2. Finishing

Ferdy Christant tackles how to manage pet projects, which seems to really be about how to choose a good pet project in the first place. Essentially, pick something challenging and worth learning that doesn’t already have a solution.

While this isn’t exactly my problem area, it did lead me to another post that includes a section on Time management (scroll down…further). Unsurprisingly, it boils down to “make time,” but there is plenty of other good info in there, too.

The real answer for me came in yet another post by Ferdy titled Use mini tasks to keep your pet project moving. It might seem completely obvious, but reading it from someone else helps validate the idea. While time management is huge, you’ve got to be realistic in breaking off tasks that are both large and small.

Mini tasks that you can complete (one or more of) in an hour are essential. I find that I typically spend a lot of time digging through my TODO list getting reacquainted with the tasks/problems, then spend more time just getting “in the zone” before getting anywhere near productivity. I plan to separate my list into big tasks and mini tasks, and to categorize each as a research task, a coding task, or a bug. Task management software isn’t necessary–an Excel or Google spreadsheet should do the trick.

One thing I would add to Ferdy’s notes is to have a solid description of the bug. Just like at work, the description of a bug or task needs to make as much sense two weeks or months later as it does two days later.

Having this list will (hopefully) also help me prioritize tasks to avoid losing time with “feature creep” that nobody else is there to keep in check. Now, back to the first problem: how to make the time for it?

Part-of-Speech (POS) Tagging with OpenNLP 1.5.0

Continuing from where I left off, I’m going to quickly touch on part-of-speech tagging before moving on. It’s actually pretty straightforward once you’re set up to run OpenNLP. This all assumes that you’ve already done sentence detection and tokenization. If you haven’t, go back to the beginning. Here are the links to the rest of my posts:

Getting Started

model files

Only one additional model file is needed for part-of-speech tagging.

Continue reading ‘Part-of-Speech (POS) Tagging with OpenNLP 1.5.0’ »

The Lemur Toolkit

I was reading Learning Similarity Metrics for Event Identification in Social Media (pdf) and caught a mention of the Lemur Toolkit, which I hadn’t previously heard about.

They used it for indexing the text representation of documents and, apparently, handling stemming, stop-words, and computing tf-idf vectors. I’ll have to look into this in the future when working with term vectors to see how easy it is to use.

The toolkit doesn’t appear to be active (final version from June 2010), but can be found at http://www.lemurproject.org/lemur.php.

Unable to locate the Javac Compiler with Maven and Eclipse

Unable to locate the Javac Compiler in:
C:\Program Files\Java\jre6\..\lib\tools.jar
Please ensure you are using JDK 1.4 or above and
not a JRE (the com.sun.tools.javac.Main class is required).
In most cases you can change the location of your Java
installation by setting the JAVA_HOME environment variable.

The solution that worked for me (tested on both 32- and 64-bit Eclipse/Java) was not to change the eclipse.ini, but to instead set the Runtime JRE on the JRE tab of the Run/Debug Configuration dialog to use the appropriate JDK, either as the “Workspace default JRE” or the “Alternate JRE”.
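
To confirm which kind of runtime a given configuration actually launches, a quick check along these lines can help. This is only a sketch: JdkCheck and its methods are hypothetical names, and it relies on the standard javax.tools API (available since Java 6), where ToolProvider.getSystemJavaCompiler() returns null when no compiler is available, which is what happens on a plain JRE.

```java
import javax.tools.ToolProvider;

/** Hypothetical diagnostic: reports whether the running Java installation includes a compiler. */
class JdkCheck {
    /** True when the platform provides a system Java compiler (that is, a JDK). */
    static boolean hasCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    /** Turns the check result into a human-readable message. */
    static String describe(boolean compilerAvailable) {
        return compilerAvailable
                ? "System compiler found: this runtime is a JDK."
                : "No system compiler found: this runtime looks like a JRE.";
    }

    public static void main(String[] args) {
        // java.home shows which installation was actually picked up.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println(describe(hasCompiler()));
    }
}
```

Running it with the same Runtime JRE setting as the Maven launch should show whether java.home points at a JDK or a JRE.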