December 31, 2011, 3:48 pm
Miško Hevery thinks so: Singletons are Pathological Liars
It actually discusses how Singletons allow APIs to be the liars (they depend on objects that aren’t explicitly advertised as dependencies).
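To make the "liar" idea concrete, here is a minimal Python sketch. It is an illustration only, not code from the linked post, and `Database`, `charge_hidden`, and `charge_explicit` are hypothetical names. The first function's signature hides its dependency on a singleton. The second advertises it.

```python
class Database:
    """Hypothetical stand-in for a shared resource."""
    _instance = None

    def __init__(self):
        self.charges = []

    @classmethod
    def get_instance(cls):
        # Classic singleton accessor: global state reachable from anywhere.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


def charge_hidden(card_number, amount):
    # The signature advertises two inputs, but the function also
    # depends on the Database singleton without saying so.
    Database.get_instance().charges.append((card_number, amount))


def charge_explicit(database, card_number, amount):
    # The dependency is part of the signature, so callers (and tests)
    # can see it and supply their own instance.
    database.charges.append((card_number, amount))


test_db = Database()
charge_explicit(test_db, "4111", 100)   # recorded in test_db only
charge_hidden("4111", 100)              # recorded in the global singleton
```

A test of `charge_explicit` can hand in a throwaway `Database`, while a test of `charge_hidden` has to know about, and clean up, the global instance.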
I agree with Martin Probst’s response, but I’m actually more interested in this question by okoman.
That question is apparently answered in the post Dependency Injection Myth: Reference Passing
What do you think?
December 28, 2011, 1:00 pm
First, the teaser:
Swarm is a framework allowing the creation of web applications which can scale transparently through a novel portable continuation-based approach. Like Map-Reduce, Swarm follows the maxim “move the computation, not the data”. However Swarm takes the concept much further, allowing it to be applied to almost any computation, not just those that can be broken down into map and reduce operations.
This is pretty cool. It is accomplished using Scala delimited continuations, which are confusing.
Delimited Continuations Explained (in Scala)
I found that link to be pretty helpful, but wanted to jot down my notes because otherwise I’ll have to start from the beginning when I try to understand it again. So this blog post is essentially just a journal entry. Read on if you dare!
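As a memory aid, here is a hand-desugared sketch of a commonly cited example, `reset { shift { k: (Int => Int) => k(k(k(7))) } + 1 } * 2`. Python has no delimited continuations, so this is only continuation-passing style written out manually, not what the Scala compiler plugin actually generates. The names `rest_of_reset` and `shift_body` are made up for the illustration.

```python
def rest_of_reset(x):
    # Everything between the shift and the enclosing reset: "+ 1".
    return x + 1


def shift_body(k):
    # The body of shift receives that captured continuation as k
    # and may call it zero, one, or many times.
    return k(k(k(7)))


# The value of the whole reset block is whatever the shift body returns.
# The "* 2" sits outside the reset, so it is applied once, afterwards.
result = shift_body(rest_of_reset) * 2
print(result)
```

The point to remember is that `k` only covers the code up to the enclosing `reset`, which is what makes the continuation "delimited".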
Continue reading ‘Delimited Continuations in Scala’ »
December 28, 2011, 11:36 am
In the comments on my post about part-of-speech tagging, Manu asks
Can you post a legend what the pos tags stand for? At the moment I’m working on a project where I use this and I dont know at the moment how much tags there are and what e.g. “JJ”, “IN” and the rest of them means. This would be very helpful.
Ask and you shall receive!
These are the Penn English Treebank POS tags. Here’s the list that I found in an answer at StackOverflow, but you’re on your own for finding out what each of these really means:
- CC Coordinating conjunction
- CD Cardinal number
- DT Determiner
- EX Existential there
- FW Foreign word
- IN Preposition or subordinating conjunction
- JJ Adjective
- JJR Adjective, comparative
- JJS Adjective, superlative
- LS List item marker
- MD Modal
- NN Noun, singular or mass
- NNS Noun, plural
- NNP Proper noun, singular
- NNPS Proper noun, plural
- PDT Predeterminer
- POS Possessive ending
- PRP Personal pronoun
- PRP$ Possessive pronoun
- RB Adverb
- RBR Adverb, comparative
- RBS Adverb, superlative
- RP Particle
- SYM Symbol
- TO to
- UH Interjection
- VB Verb, base form
- VBD Verb, past tense
- VBG Verb, gerund or present participle
- VBN Verb, past participle
- VBP Verb, non-3rd person singular present
- VBZ Verb, 3rd person singular present
- WDT Wh-determiner
- WP Wh-pronoun
- WP$ Possessive wh-pronoun
- WRB Wh-adverb
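If you want the legend available in code, a lookup table is enough. The sketch below is in Python and assumes the tagger output is whitespace-separated `word_TAG` tokens. That format is an assumption here, so adjust the parsing to whatever your tagger emits. `PENN_TAGS` and `explain` are hypothetical names, and punctuation tags are not in the list above, so they are reported as unknown.

```python
# Legend copied from the list above.
PENN_TAGS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "DT": "Determiner",
    "EX": "Existential there",
    "FW": "Foreign word",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective",
    "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative",
    "LS": "List item marker",
    "MD": "Modal",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "NNP": "Proper noun, singular",
    "NNPS": "Proper noun, plural",
    "PDT": "Predeterminer",
    "POS": "Possessive ending",
    "PRP": "Personal pronoun",
    "PRP$": "Possessive pronoun",
    "RB": "Adverb",
    "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative",
    "RP": "Particle",
    "SYM": "Symbol",
    "TO": "to",
    "UH": "Interjection",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    "WDT": "Wh-determiner",
    "WP": "Wh-pronoun",
    "WP$": "Possessive wh-pronoun",
    "WRB": "Wh-adverb",
}


def explain(tagged_sentence):
    """Return (word, tag, description) for each 'word_TAG' token."""
    rows = []
    for token in tagged_sentence.split():
        # Split on the LAST underscore so words containing "_" survive.
        word, _, tag = token.rpartition("_")
        rows.append((word, tag, PENN_TAGS.get(tag, "(not in this list)")))
    return rows


for row in explain("The_DT quick_JJ fox_NN jumps_VBZ"):
    print(row)
```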
December 19, 2011, 12:01 pm
Are you seeing that nasty FTP page when trying to do an Automatic Update from within WordPress? I create new blogs frequently enough that this is a constant pain, but rare enough that I never remember how to do it.
Updating WordPress tells you how to identify the correct user:
If you do not know which user runs the httpd the output of ps auxw | grep -E 'http|apache|www'
And thanks to Greg in the comments over at Linode’s Manage Web Content with WordPress page for the succinct description:
To make wordpress updates and plugin installs function properly, you need to set the entire public directory structure to www-data ownership, like so:
chown -R www-data:www-data ..../yoursite.com/wordpress
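To double-check that the `chown` reached everything, a small script can list whatever is still owned by someone else. This is a rough Python sketch for Unix-like systems only. `uid_for` and `files_not_owned_by` are hypothetical helper names, and the user name and path you pass in are whatever applies to your server.

```python
import os
import pwd


def uid_for(username):
    # Look up the numeric user id for a name such as "www-data".
    return pwd.getpwnam(username).pw_uid


def files_not_owned_by(root, uid):
    """Return every path at or under root whose owner is not uid."""
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

Calling `files_not_owned_by(your_wordpress_dir, uid_for("www-data"))` should return an empty list once the ownership change has been applied to the whole tree.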
December 4, 2011, 6:00 pm
After a brief (*cough*cough*) delay, I’m back to figure out how in the world to use this OpenNLP Parser. First, a quick refresher:
Getting Started
I’m only going to warn you once: this is a long post. Go grab a beer or a glass of wine or some coffee before starting. It’s long. Now I’ve warned you twice.
Continue reading ‘How to use the OpenNLP 1.5.0 Parser’ »
November 10, 2011, 5:21 pm
I have an existing rails application at mydomain.com and wanted to include a wordpress blog at mydomain.com/blog. I’m using Apache and Passenger.
My <VirtualHost> configuration has the DocumentRoot at /srv/www/mydomain.com/public (which is itself a symbolic link to /srv/www/mydomain.com/railsapp/public). For ease of maintenance, and to avoid any conflicts between rails and wordpress, I placed wordpress outside of the rails app at /srv/www/mydomain.com/wordpress.
However, I needed to tell Apache to redirect access to the wordpress resources which are not located in the DocumentRoot. The solution? Create an Apache Alias. There’s an excellent description in the Linode Library about Managing Resources with Apache mod_alias.
This got me through the wordpress install with the wp-admin/install.php. Unfortunately, I still couldn’t access my blog. The default rails error page still kept rearing its ugly head. Turns out that’s due to Passenger, which kept directing my non-file-specific traffic to my rails application (e.g., when trying to access mydomain.com/blog or mydomain.com/blog/wp-admin without a specific php file in the URL).
I found the fix for this at WordPress Answers. My final working configuration is below; the additional fix is the PassengerEnabled off line:
<VirtualHost x.x.x.x:80>
    ...
    DocumentRoot /srv/www/mydomain.com/public
    Options FollowSymLinks
    # an Alias for the wordpress blog
    Alias /blog /srv/www/mydomain.com/wordpress
    <Directory /srv/www/mydomain.com/wordpress>
        PassengerEnabled off
        # make the WordPress .htaccess file work
        AllowOverride all
        Order allow,deny
        Allow from all
    </Directory>
    ...
</VirtualHost>
September 22, 2011, 2:18 pm
Lately I’ve been having a terrible time keeping any type of motivation or focus to work on my pet projects. It’s not a matter of being unable to come up with ideas (at least, not currently), but rather two problems:
- Finding the time
- Finishing
Ferdy Christant tackles how to manage pet projects, which seems to really be about how to choose a good pet project in the first place. Essentially, pick something challenging and worth learning that doesn’t already have a solution.
While this isn’t exactly my problem area, it did lead me to another post that includes a section on Time management (scroll down…further). Unsurprisingly, it boils down to “make time,” but there is plenty of other good info in there, too.
The real answer for me came in yet another post by Ferdy titled Use mini tasks to keep your pet project moving. It might seem completely obvious, but reading it from someone else helps validate the idea. While time management is huge, you’ve got to be realistic in breaking off tasks that are both large and small.
Mini tasks that you can complete (one or more of) in an hour are essential. I find that I typically spend a lot of time digging through my TODO list getting reacquainted with the tasks/problems, then spend more time just getting “in the zone” before getting anywhere near productive. I plan to separate my list into big tasks and mini tasks, and to categorize each as a research task, a coding task, or a bug. Task management software isn’t necessary; an Excel or Google spreadsheet should do the trick.
One thing I would add to Ferdy’s notes is to have a solid description of each bug or task. Just like at work, the description needs to make as much sense two weeks or months later as it does two days later.
Having this list will (hopefully) also help me prioritize tasks to avoid losing time with “feature creep” that nobody else is there to keep in check. Now, back to the first problem: how to make the time for it?
June 30, 2011, 5:00 pm
Continuing from where I left off, I’m going to quickly touch on part-of-speech tagging before moving on. It’s actually pretty straightforward once you’re set up to run OpenNLP. This all assumes that you’ve already done sentence detection and tokenization. If you haven’t, go back to the beginning. Here are the links to the rest of my posts:
Getting Started
model files
Only one additional model file is needed for part-of-speech tagging.
Continue reading ‘Part-of-Speech (POS) Tagging with OpenNLP 1.5.0’ »
June 13, 2011, 3:15 pm
I was reading Learning Similarity Metrics for Event Identification in Social Media (pdf) and caught a mention of the Lemur Toolkit, which I hadn’t previously heard about.
They used it for indexing the text representation of documents and, apparently, handling stemming, stop-words, and computing tf-idf vectors. I’ll have to look into this in the future when working with term vectors to see how easy it is to use.
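For reference, here is roughly what the stop-word and tf-idf steps look like in plain Python. This is a minimal sketch under stated assumptions, not Lemur's implementation. It uses one common weighting (term frequency times `log(N / df)`), skips stemming entirely, and the stop-word list is a tiny made-up sample.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "to"}


def tokenize(text):
    # Lowercase, split on whitespace, and drop stop words (no stemming).
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tf_idf_vectors(["the concert in the park", "the concert downtown"])
print(vectors)
```

With this weighting, a term that appears in every document (here "concert") gets weight zero, so only the distinguishing terms contribute to a similarity score.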
The toolkit doesn’t appear to be active (final version from June 2010), but can be found at http://www.lemurproject.org/lemur.php.
June 2, 2011, 10:49 am
Unable to locate the Javac Compiler in:
C:\Program Files\Java\jre6\..\lib\tools.jar
Please ensure you are using JDK 1.4 or above and
not a JRE (the com.sun.tools.javac.Main class is required).
In most cases you can change the location of your Java
installation by setting the JAVA_HOME environment variable.
The solution that worked for me (tested on both 32- and 64-bit Eclipse/Java) was not to change the eclipse.ini, but to instead set the Runtime JRE on the JRE tab of the Run/Debug Configuration dialog to use the appropriate JDK, either as the “Workspace default JRE” or the “Alternate JRE”.
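If you are not sure whether a given installation directory is a JDK or only a JRE, one quick heuristic is to look for the `javac` launcher, which a JDK ships in its `bin` directory and a plain JRE does not. The Python sketch below does just that. `looks_like_jdk` is a hypothetical helper name, and the check is a heuristic rather than anything Eclipse itself performs.

```python
import os


def looks_like_jdk(java_home):
    """Return True if java_home/bin contains a javac launcher."""
    if not java_home:
        return False
    bin_dir = os.path.join(java_home, "bin")
    # "javac" on Unix-like systems, "javac.exe" on Windows.
    return any(
        os.path.isfile(os.path.join(bin_dir, name))
        for name in ("javac", "javac.exe")
    )


# Check whatever JAVA_HOME currently points at (may be unset).
print(looks_like_jdk(os.environ.get("JAVA_HOME", "")))
```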