Blogs

Why PostgreSQL?

Quick overview of what PostgreSQL brings to the table that is not available in MySQL:

  • Uses MVCC for all tables providing:
    • Fully transactional including ACID compliance for consistency
    • Nested transactions
  • SQL 2008 compliant
  • Foreign keys for any table
  • Advanced table partitioning
  • Highly sophisticated query planner/optimizer
    • Can split up a query for execution across multiple CPUs simultaneously
    • Collects internal statistics for adaptive query planning
    • Special genetic query optimizer for queries with large numbers of joins
    • Supports multiple indexes per table per query
  • Advanced support for query & results caching
  • Hot/online backup
  • Point-in-time-recovery
  • Write-ahead logs for fault-tolerance
  • Tablespaces for controlling physical disk layout
  • Native asynchronous replication guaranteeing identical results on all machines. Supports both:
    • Streaming replication
    • Hot standby
  • Partial indexes
  • Index creation/removal does not have to lock the table against writes (CREATE INDEX CONCURRENTLY)
  • Full support for constraints
  • Transactional DDL - changes like table modifications can be placed inside a transaction and rolled back

Specific disadvantages to MySQL:

  • Confusion with table types - MyISAM vs InnoDB
  • Designed to scale out not up - does not utilize larger numbers of cores efficiently and cannot spread queries across cores
  • Hot backup is difficult for databases containing both InnoDB and MyISAM
  • Replication is mediocre and error prone
  • InnoDB stores the data with the primary key, so any queries using secondary indices are slower
  • Subqueries not well optimized
  • Only uses a single index per table per query
  • Index creation/removal requires an exclusive write lock
  • MyISAM only offers table level locking which causes severe performance degradation under heavy concurrency
  • Limited support for constraints
  • No transactional DDL - changes like table modifications are automatically committed and cannot be rolled back

MySQL offers the following advantages over PostgreSQL:

  • MyISAM tables can offer better read performance, specifically for simple SELECT queries, but at the cost of no support for transactions, foreign keys or data guarantees
  • COUNT(*) is very fast on MyISAM and slow on PostgreSQL
  • INSERT IGNORE and INSERT...ON DUPLICATE KEY UPDATE

 

Different content in Rails based on UserAgent

I was recently working on a website built using Rails that needed to render different content for certain user agents. Specifically, we needed simpler versions of certain pages for BlackBerry devices. Here's how I accomplished it.

First, I added a new mime-type for BlackBerry by adding the following line to config/initializers/mime_types.rb:

Mime::Type.register_alias "text/html", :blackberry

Next, I added two utility methods to app/controllers/application.rb:

# Checks UserAgent
def is_blackberry?
  ua = request.user_agent
  return false if ua.nil?
  return false if ! ua.downcase.index('blackberry')
 
  # Don't call the BlackBerry 9800 a BlackBerry, since it has a modern browser
  # based on WebKit:
  # Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, Like Gecko) Version/6.0.0.141 Mobile Safari/534.1+
  return false if ua.downcase.index('webkit')
 
  # Must be a BlackBerry!
  true
end
 
# Sets the respond_to format to blackberry if blackberry
def set_blackberry_format
  if !request.xhr? && is_blackberry?
    request.format = :blackberry
  end
end

With that in hand, it's easy to render BlackBerry specific content on specific pages:

set_blackberry_format
respond_to do |format|
  format.blackberry
  format.html
  format.js { render :layout => false }
end
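
The detection logic is easy to try outside of Rails. The following is a minimal sketch, assuming a plain method that takes the User-Agent string directly instead of reading it from the request. The method name blackberry_user_agent? and the "legacy" sample string are made up for illustration; only the BlackBerry 9800 string comes from the comment above.

```ruby
# Standalone sketch of the User-Agent check, with no Rails dependency.
# blackberry_user_agent? is a hypothetical name, not part of the original code.
def blackberry_user_agent?(user_agent)
  return false if user_agent.nil?

  ua = user_agent.downcase
  return false unless ua.include?('blackberry')

  # WebKit-based BlackBerry browsers are treated as regular browsers.
  !ua.include?('webkit')
end

# Illustrative sample in the style of an older device (an assumption).
legacy = 'BlackBerry8330/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1'
# The BlackBerry 9800 string quoted in the controller comment.
modern = 'Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ ' \
         '(KHTML, Like Gecko) Version/6.0.0.141 Mobile Safari/534.1+'

puts blackberry_user_agent?(legacy) # prints true
puts blackberry_user_agent?(modern) # prints false
puts blackberry_user_agent?(nil)    # prints false
```

Keeping the check free of the request object like this also makes it simple to unit test.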

Tomcat init script for Ubuntu

Recently I spent some time working on improving my init scripts for Tomcat 6.x in a production environment running Ubuntu. One of the major problems we had encountered was that occasionally Tomcat refuses to shut down completely and requires a kill -9 to stop it. The standard init scripts I had seen didn't solve this problem at all.

Laliluna has a great article that focuses on RedHat, CentOS and Fedora. Unfortunately, their scripts didn't work correctly under Ubuntu 8.04 LTS. As a result, I spent some time modifying their scripts to get them to work correctly under Ubuntu. Many thanks to Laliluna for doing the heavy work.

Without any further ado, here's what I put together:

#!/bin/bash
#
# Startup script for Jakarta Tomcat
# Script should work on Ubuntu Linux.
# WARNING: The script does not allow to run Tomcat on privileged ports as non root user. 
# For this use case try : http://tomcat.apache.org/tomcat-6.0-doc/setup.html and http://commons.apache.org/daemon/jsvc.html
# 
# Should start normally after the databases and before http server
# chkconfig: 345 80 10
# description: Jakarta Tomcat Java Servlet/JSP Container
# processname: tomcat
# pidfile: /var/run/tomcat/tomcat.pid
 
##### In this area you can find settings which are likely to change frequently ####
 
JAVA=/opt/java/current/bin/java
# unprivileged user running Tomcat server
tomcatuser=tomcat
 
# servicename used as pidfile and lockfile name, must correspond to 'processname:' at the top of this file
# If not, Linux will not detect the running service during a runlevel switch and will not shut it down normally
servicename=tomcat
 
# folder where Tomcat is installed
CATALINA_HOME=/opt/tomcat
 
# Options for the JVM
JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx2048m -XX:MaxPermSize=512m -XX:PermSize=128m"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4"
JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Djavax.servlet.response.encoding=UTF-8 -Dfile.encoding=UTF-8 $JAVA_OPTS"
 
##### End of frequent settings area #####
 
pidfile=/var/run/tomcat/$servicename
lockfile=/var/lock/$servicename
 
#runsecure=1 #starts tomcat with java security
runsecure=0
 
# Optional additional libs you would like to add to the classpath (= JVM Option -classpath)
CLASSPATH=""
# Optional Java Security Socket extension
# CLASSPATH="$CLASSPATH":"$JSSE_HOME"/lib/jcert.jar:"$JSSE_HOME"/lib/jnet.jar:"$JSSE_HOME"/lib/jsse.jar
 
# path to Tomcat lib
CLASSPATH="$CLASSPATH":"$CATALINA_HOME"/bin/bootstrap.jar
 
# Directory holding configuration, defaults to CATALINA_HOME
# In a Tomcat cluster you might reuse the servicename to identify the base directory
 
CATALINA_BASE="$CATALINA_HOME"
# server log during startup / shutdown
logfile=$CATALINA_BASE/logs/catalina.out
# endorsed allows to overwrite JVM libs -> JVM option -Djava.endorsed.dirs 
#JAVA_ENDORSED_DIRS="$CATALINA_BASE"/endorsed
 
# Define the java.io.tmpdir to use for Catalina
CATALINA_TMPDIR="$CATALINA_BASE"/temp
 
# Set juli LogManager if it is present
if [ -r "$CATALINA_BASE"/conf/logging.properties ]; then
  JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
  LOGGING_CONFIG="-Djava.util.logging.config.file=$CATALINA_BASE/conf/logging.properties"
fi
 
#### End of settings #####
 
# build java command to start Tomcat
command="$JAVA $JAVA_OPTS $LOGGING_CONFIG $CATALINA_OPTS \
      -Djava.endorsed.dirs=$JAVA_ENDORSED_DIRS -classpath $CLASSPATH \
      -Dcatalina.base=$CATALINA_BASE \
      -Dcatalina.home=$CATALINA_HOME \
      -Djava.io.tmpdir=$CATALINA_TMPDIR" 
 
if [ "$runsecure" = "1" ]; then
  command="$command -Djava.security.manager -Djava.security.policy=$CATALINA_BASE/conf/catalina.policy"
fi
 
command="$command org.apache.catalina.startup.Bootstrap"
 
 
start()
{
	echo $"Starting $servicename based at $CATALINA_BASE "
 
	daemon --user=$tomcatuser --pidfile=$pidfile --output=$logfile -- $command start
	RETVAL=$?
 
	[ "$RETVAL" = 0 ] && touch $lockfile
	echo
}
 
stop()
{
	echo -n $"Stopping $servicename: "
	if [ ! -r $pidfile ]; then
		echo "Pidfile $pidfile cannot be read"
		RETVAL=1
		return
	fi
	# Sends TERM signal first and kills finally after 10 seconds
	start-stop-daemon --pidfile $pidfile -R 10 --stop
	RETVAL=$?
	[ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
	echo
}
 
version()
{
	$JAVA -classpath $CATALINA_HOME/lib/catalina.jar org.apache.catalina.util.ServerInfo
	RETVAL=$?
}
 
case "$1" in
	start)
		start
		;;
	stop)
		stop
		;;
	restart)
		stop
		start
		;;
	version)
		version
		;;
	status)
		status -p $pidfile $servicename
		RETVAL=$?
		;;
	*)
		echo $"Usage: $0 {start|stop|restart|version|status}"
		RETVAL=1
esac
exit $RETVAL
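
The stop function above leans on start-stop-daemon's retry option: ask politely with TERM, wait, then force the issue with KILL. For anyone who wants to see that escalation spelled out, here is a minimal Ruby sketch of the same idea. It is not part of the init script. The names stop_process and grace are hypothetical, and the sketch only works on child processes of the calling script because it relies on Process.wait.

```ruby
# Hypothetical helper: send TERM, poll for exit, and fall back to KILL
# once `grace` seconds have passed. Returns :term or :kill depending on
# which signal ended the process. Only works for child processes.
def stop_process(pid, grace: 10)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace

  # Process.wait with WNOHANG returns nil while the child is still running.
  until Process.wait(pid, Process::WNOHANG)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill('KILL', pid)
      Process.wait(pid) # reap the killed child
      return :kill
    end
    sleep 0.1
  end
  :term
end

# Demo with a throwaway child process; a plain `sleep` exits on TERM.
pid = spawn('sleep', '30')
puts stop_process(pid, grace: 2)
```

A process that ignores TERM, which is roughly what a hung Tomcat looks like, would instead be killed once the grace period runs out.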

Ruby mixin for the Enum pattern

Sometimes you just want to use an Enum. Unfortunately, if you're a Ruby developer, Ruby does not offer a native enum structure. Here's a simple approach using a mixin module:

module Enum
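  # Resolves FooEnum::APPLE style lookups; unknown names still raise NameError
  def const_missing(key)
    (@enum_hash ||= {}).fetch(key) { super }
  end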
 
  def add_enum(key, value)
    @enum_hash ||= {}
    @enum_hash[key] = NameValuePair.new(value, key.to_s.downcase)
  end
 
  def each
    @enum_hash.each {|key, value| yield(key, value) }
  end
 
  def enums
    @enum_hash.keys
  end
 
  def enum_values
    @enum_hash.values
  end
 
  def get_enum_hash
    @enum_hash
  end
 
  def find_by_key(key)
    @enum_hash[key.upcase.to_sym]
  end
end

The Enum mixin depends on a NameValuePair class to hold the data:

class NameValuePair
  attr_reader :label, :value
 
  def initialize(label, value)
    @label = label
    @value = value
  end
 
  def first
    @label
  end
 
  def last
    @value
  end
end

I included first and last methods to better support the select and options_for_select helper methods in Rails. Here's how you might use it:

class FooEnum
  extend Enum
 
  self.add_enum(:APPLE, "Apple")
  self.add_enum(:PEAR, "Pear")
  self.add_enum(:ALL, "All Fruit")
end
 
FooEnum::APPLE ==> #<NameValuePair @value="apple", @label="Apple">
FooEnum::ALL.value ==> "all"
FooEnum::ALL.label ==> "All Fruit"
FooEnum.find_by_key('apple') ==> #<NameValuePair @value="apple", @label="Apple">
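
To see why first and last matter, here is a small sketch that runs without Rails. It assumes a helper that, like the select helpers, reads a label from first and a value from last on each item. The option_pairs method and the FruitEnum class are hypothetical stand-ins, and the Enum and NameValuePair definitions are trimmed copies of the ones above so the example runs on its own.

```ruby
# Trimmed copy of NameValuePair from above.
class NameValuePair
  attr_reader :label, :value

  def initialize(label, value)
    @label = label
    @value = value
  end

  def first
    @label
  end

  def last
    @value
  end
end

# Trimmed copy of the Enum mixin: only what this example needs.
module Enum
  def add_enum(key, value)
    @enum_hash ||= {}
    @enum_hash[key] = NameValuePair.new(value, key.to_s.downcase)
  end

  def enum_values
    @enum_hash.values
  end
end

# Hypothetical enum used only for this example.
class FruitEnum
  extend Enum

  add_enum(:APPLE, 'Apple')
  add_enum(:PEAR, 'Pear')
end

# Hypothetical stand-in for a select helper: builds [label, value] pairs
# from anything that responds to first and last.
def option_pairs(items)
  items.map { |item| [item.first, item.last] }
end

p option_pairs(FruitEnum.enum_values)
# prints [["Apple", "apple"], ["Pear", "pear"]]
```

In a Rails view, the equivalent would presumably be passing FooEnum.enum_values straight to options_for_select.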

To Rewrite or Not to Rewrite: The Ugly Question

I recently had a discussion about the idea of rewriting software from scratch. I actually played the devil's advocate and argued against ever throwing out and rewriting, which really got me thinking about the whole concept.

The discussion centered around an article by Joel Spolsky (of Joel on Software) titled Things You Should Never Do, Part 1. His points against rewrites include:

  1. The ugly code you throw out has been hardened and tested. It's filled with bug fixes. You're throwing out that knowledge and expertise.
  2. You're throwing out market leadership and "giving a gift of two or three years to your competitors".
  3. You're not going to do a better job writing the code a second time than you did the first time, especially since it's unlikely you have the same team that wrote the earlier version.
  4. You will introduce new bugs.

Joel further argues that there are three major reasons developers want to rewrite code and none of them require rewrites:

  1. Architectural problems. The "you got your gui in my business logic" problem. This can be handled by small but steady code refactorings.
  2. Inefficiency. Again, can be handled by small code refactorings.
  3. The code is fugly. This may be due to complexity and bug fixes, in which case see point #1 above. Or it may be due to poor and changing naming conventions, in which case it can be fixed by a simple Find-Replace.

These are all excellent points. On some level, I agree with this entirely. Even many nasty combinations of all three problems can be solved by steady refactorings. I have worked for places where people pushed for rewrites that weren't necessary. But these were larger businesses with a well established core product. These were not early startups. That's why I believe Joel makes several assumptions which are fatal to his arguments.

First, he assumes the software project is really large and complex. While some of us may have worked on projects of that size and scope, quite a few of us work on much smaller projects. Simply put, it's a matter of scale.

Second, as a corollary of his first assumption, Joel also assumes that a rewrite requires years not months. Again, this is likely true for a product like Excel or Word... but this simply isn't true for many of the sites and products I've worked on. Furthermore, the use of agile or rapid development technologies such as Ruby on Rails can dramatically shrink this window.

Third, and perhaps most importantly, Joel assumes that the time required to cope with a messy code-base and make steady refactorings is significantly less than the time required to rewrite the app. And he assumes that's a worthwhile trade off. This may be clear cut for larger products or companies, but I question whether or not that's accurate for a startup. The more tangled your code, the longer it takes you to make changes. The longer it takes to make changes, the less nimble you are and the longer it takes you to respond to changes in company direction or marketplace demands.

It's that last point that I believe is most important to those of us working for small startups. We tend to be small young companies who are still striving to find our exact place in the wider world. We're often in cutting edge spaces where there is no clear cut path to success. And usually we're steadily seeing greater numbers of competitors in our space. It seems to me that agility is vitally important to people like us. We need to be able to make changes rapidly as our knowledge of the space evolves. Fundamentally, I think it's better to have a decent product/feature/whatever out in the hands of consumers than it is to have a nearly perfect product that's still under development. Don't get me wrong, I'm sure I'm preaching to the choir. :) But, I think it's critical to keep the need for agility and nimbleness in the forefront of our thoughts.

Fourth, Joel assumes any architectural problems can be solved by steady refactoring. Frankly, I disagree. I think there exist serious architectural flaws, especially ones related to scalability, that cannot be easily solved by refactoring. eBay, LinkedIn, Facebook and Yahoo have all had major rewrites in their history that were directly attributed to serious architectural failings.

That is not to say that a full rewrite is necessarily a desirable goal. :) However, it takes careful management and planning to avoid finding yourself in this position. eBay used to employ a strategy they called headroom, which basically set aside 20+% of all development time to refactor code and keep it in top working order. While I think it may be very difficult to employ such a strategy in a startup, it may be worth considering.
