Enforcing design principles in software

Before starting to write a new piece of software, I come up with some simple design principles that I think will do some good.

For smallish projects I just keep the principles in mind and act accordingly, but for bigger stuff I try to actively enforce the policy.

“In theory, theory and practice are the same. In practice, they are not.”
— Lawrence Peter Berra

In theory, if you state your principle and stick to it, there is no need for enforcement; I have a friend who has never written a single bug in his life. It’s true, I swear. For me it’s not quite so: I do some damage sometimes, so I benefit from having some safety checks. Actively checking rules makes it difficult to take shortcuts like “I’ll fix this later” or “this will never be null, right?”. Also, if a violation happens and checks are in place, it is easier to debug the problem and fix the cause. With well-established check policies, each method ends up checking the input it gets from every other method. Well-enforced policy checks can eradicate some classes of bugs, much like a vaccine.

The principle I use the most is “nulls are no good”. It helps to cut the number of null checks and to rule out NullPointerExceptions, which are always a pain because they tend to happen far from where the null was assigned.

To enforce a principle, you have to make up a rule and implement it in your code. At first the best rule to implement this principle may seem to be “always check for nullity”, but I’ll try to show that in this case just half of the rule, “check for nullity before assigning”, is enough to cover your ass and is less verbose.

Let’s begin with an example: everybody likes the bang of a popping balloon, so I have implemented a balloons collection that lets you pop many balloons at once for a bigger, better bang.

class Balloons {
	List<Balloon> balloons = new ArrayList<Balloon>();
	void addBalloon(Balloon balloon) {
		this.balloons.add(balloon);
	}

	void pinchAll() {
		for (Balloon b : balloons)
			b.pinch();
	}

	public static void main(String[] args) {
		Balloons balloons = new Balloons();
		balloons.addBalloon(new Balloon());
		balloons.addBalloon(null); // one of your little helpers has run out of gas
		balloons.pinchAll();
	}
}

Clearly, running this class will cause an NPE and terminate the program. Unfortunately the stack trace will not clarify who added that null:

Exception in thread "main" java.lang.NullPointerException
	at Balloons.pinchAll(Balloons.java:16)
	at Balloons.main(Balloons.java:23)

To get rid of the NPE, you can just add a nullcheck inside pinchAll.

void pinchAll() {
    for (Balloon b : balloons)
        if (b != null)
            b.pinch();
}

This version will avoid the NPE, sure, but if you ignore null values, you’re better off not putting them in your collection in the first place, right?

So let’s try this:

void addBalloon(Balloon balloon) {
	if (balloon == null) throw new IllegalArgumentException("balloon must not be null");
	this.balloons.add(balloon);
}

Way better: this time the output will actually help to pin down the cause of the problem:

Exception in thread "main" java.lang.IllegalArgumentException: balloon must not be null
	at Balloons.addBalloon(Balloons.java:10)
	at Balloons.main(Balloons.java:22)

Additionally, since you have protected your code so that balloons will never contain a null, you can remove the check in pinchAll.
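
If you don’t feel like writing the check by hand, the JDK ships one: java.util.Objects.requireNonNull. Here is a minimal sketch of the same balloons collection using it. Mind that it throws a NullPointerException rather than an IllegalArgumentException, and that SafeBalloons, size() and the stub Balloon class are names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class SafeBalloons {
    private final List<Balloon> balloons = new ArrayList<>();

    void addBalloon(Balloon balloon) {
        // Check before assigning: a null is rejected here, at the call
        // that caused it, and never reaches the list.
        this.balloons.add(Objects.requireNonNull(balloon, "balloon must not be null"));
    }

    void pinchAll() {
        // No null check needed: addBalloon guarantees the list holds no nulls.
        for (Balloon b : balloons)
            b.pinch();
    }

    int size() {
        return balloons.size();
    }

    public static void main(String[] args) {
        SafeBalloons balloons = new SafeBalloons();
        balloons.addBalloon(new Balloon());
        try {
            balloons.addBalloon(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        balloons.pinchAll();
    }
}

// Hypothetical stand-in for the Balloon class used in the post.
class Balloon {
    boolean popped = false;

    void pinch() {
        popped = true;
    }
}
```

The stack trace of the rejected call points at addBalloon and its caller, just like the hand-written version.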

At the bottom of the post I’ve attached snippets of my “Protect” class, which helps reduce the bloat of enforcing the policy.

Some examples:

void addBalloon(Balloon balloon) {
	Protect.valid(balloon);
	this.balloons.add(balloon);
}

void addMany(Balloon one, Balloon two) {
	Protect.valid(one, two);
	//....
}

void addWithLabel(String label, Balloon balloon) {
    // this will also check for empty strings
    Protect.valid(label, balloon);
    //....
}

I have picked up the habit of testing all arguments with Protect.valid, and then thinking about whether I really need the check or whether the code is implicitly protected by its call hierarchy. Yes, it’s bulky and your lean and clean code will look bloated, but mind that you’ll save a lot of != null checks after invoking methods. It is also useful as a form of “active documentation”, as any reader will quickly get that nulls are not accepted.

Null protection is probably at its best when used in the constructor of an immutable class: you know that no instance of that class contains nulls.

// rock solid bunga bunga!!
public class BungaBunga {
	private final String secretLocation;
	public BungaBunga(String secretLocation) {
		Protect.valid(secretLocation);
		this.secretLocation= secretLocation;
	}

	// ....
}

Beware, testing for nullity takes some time: on my laptop about a nanosecond for the simple check (object == null) and maybe 10 for the version with varargs. It’s not much, but it may add up, so you may want to skip the checks in some regions of your code, especially method invocations in loops. As with most things in life, don’t overdo it.

It’s possible to add more complex checks or other kinds. In a recent app that was using some simple spatial geometry I enforced all the ints to be positive, because I knew there was no space for negs in the app.
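
As a rough idea of what such a check might look like, here is a sketch in the style of the Protect class at the bottom of the post. ProtectNumbers and positive are hypothetical names, not the code from that app, and the sketch assumes that zero counts as invalid too.

```java
// Hypothetical companion to the Protect class; not part of the original code.
class ProtectNumbers {

    // Rejects zero and negative values, reporting the offending position.
    static void positive(int... values) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] <= 0)
                throw new IllegalArgumentException(
                        "Int in position " + i + " is not positive: " + values[i]);
        }
    }

    public static void main(String[] args) {
        positive(3, 4); // passes silently
        try {
            positive(3, -4);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Int in position 1 is not positive: -4
        }
    }
}
```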

If you want to get rid of the checks before release to production, it’s quite easy to comment out all the invocations with a script, but this may be brittle and make debugging the code a bit more complex. If you are into this kind of thing, it is possible to enforce rules transparently using AspectJ and some annotations. This has the benefit of making the code cleaner, but you lose the “documentation” side of the checks.

[Note: in C# it’s easy to compile methods only in debug builds, using [Conditional("DEBUG")] ]

My Protect class. As you can see it is a very crude piece of code, but it does the job.


/**
 * Set of methods to enforce not nullity of objects and not emptiness of strings
 * @author Matteo Caprari
 */
public class Protect {

	static void valid(Object object) {
		if (object == null)
			throw new IllegalArgumentException("Object in position 0 is null");
	}		

	static void valid(String string) {
		if (string == null || string.trim().length() == 0)
			throw new IllegalArgumentException("String in position 0 is null or empty");
	}

	static void valid(String a, String b) {
		if (a == null || a.trim().length() == 0)
			throw new IllegalArgumentException("String in position 0 is null or empty");
		if (b == null || b.trim().length() == 0)
			throw new IllegalArgumentException("String in position 1 is null or empty");
	}

	static void valid(String a, String b, String c) {
		if (a == null || a.trim().length() == 0)
			throw new IllegalArgumentException("String in position 0 is null or empty");
		if (b == null || b.trim().length() == 0)
			throw new IllegalArgumentException("String in position 1 is null or empty");
		if (c == null || c.trim().length() == 0)
			throw new IllegalArgumentException("String in position 2 is null or empty");
	}

	static void valid(String... strings) {
		for (int i=0; i<strings.length; i++) {
			if (strings[i] == null || strings[i].trim().length() == 0)
				throw new IllegalArgumentException("String in position " + i + " is null or empty");
		}
	}

	static void valid(Object a, Object b) {
		if (a == null)
			throw new IllegalArgumentException("Object in position 0 is null or empty");
		if (b == null)
			throw new IllegalArgumentException("Object in position 1 is null or empty");

		if (a instanceof String && ((String) a).trim().length() == 0)
			throw new IllegalArgumentException("String in position 0 is null or empty");

		if (b instanceof String && ((String) b).trim().length() == 0)
			throw new IllegalArgumentException("String in position 1 is null or empty");
	}

	static void valid(Object a, Object b, Object c) {
		if (a == null)
			throw new IllegalArgumentException("Object in position 0 is null or empty");
		if (b == null)
			throw new IllegalArgumentException("Object in position 1 is null or empty");
		if (c == null)
			throw new IllegalArgumentException("Object in position 2 is null or empty");

		if (a instanceof String && ((String) a).trim().length() == 0)
			throw new IllegalArgumentException("String in position 0 is null or empty");

		if (b instanceof String && ((String) b).trim().length() == 0)
			throw new IllegalArgumentException("String in position 1 is null or empty");

		if (c instanceof String && ((String) c).trim().length() == 0)
			throw new IllegalArgumentException("String in position 2 is null or empty");
	}

	// beware, this is maybe 10 times slower than the non-varargs version
	static void valid(Object... objects) {
		for (int i=0; i<objects.length; i++) {
			if (objects[i] == null)
				throw new IllegalArgumentException("Object in position " + i + " is null");
			if (objects[i] instanceof String && ((String) objects[i]).trim().length() == 0)
				throw new IllegalArgumentException("String in position " + i + " is null or empty");

		}
	}				

}

Thanks for reading this far


Java bytecode, string concatenation and StringBuilder

In my earlier post I was making a fuss over picking the faster hash algorithm, and then I realised I was using + to concatenate strings.
Should I always use a StringBuilder? Should I care even for small strings? Heck, if I use the StringBuilder I’ll surely create one extra object anyway…

I tried some variations of the test and I did not find any performance difference when comparing simple concatenation to using the string builder. I even tried bigger strings and other combinations. Still no difference.

That got me curious, so I wrote a very simple class and looked at it in the bytecode outline:

This java code:

public static void main(String[] args) {
	String cip = "cip";
	String ciop = "ciop";
	String plus = cip + ciop;
	String build = new StringBuilder(cip).append(ciop).toString();
}

Generates this bytecode (see how the two concatenation styles generate almost the very same code; the + version only adds a String.valueOf call on the first operand):

  L0
    LINENUMBER 23 L0
    LDC "cip"
    ASTORE 1
   L1
    LINENUMBER 24 L1
    LDC "ciop"
    ASTORE 2
// cip + ciop
   L2
    LINENUMBER 25 L2

    NEW java/lang/StringBuilder
    DUP
    ALOAD 1
    INVOKESTATIC java/lang/String.valueOf(Ljava/lang/Object;)Ljava/lang/String;
    INVOKESPECIAL java/lang/StringBuilder.<init>(Ljava/lang/String;)V
    ALOAD 2
    INVOKEVIRTUAL java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    INVOKEVIRTUAL java/lang/StringBuilder.toString()Ljava/lang/String;

    ASTORE 3
// new StringBuilder(cip).append(ciop).toString()
   L3
    LINENUMBER 26 L3

    NEW java/lang/StringBuilder
    DUP
    ALOAD 1
    INVOKESPECIAL java/lang/StringBuilder.<init>(Ljava/lang/String;)V
    ALOAD 2
    INVOKEVIRTUAL java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    INVOKEVIRTUAL java/lang/StringBuilder.toString()Ljava/lang/String;

    ASTORE 4
   L4
    LINENUMBER 27 L4
    RETURN

The compiler has transformed “cip+ciop” into “new StringBuilder(cip).append(ciop).toString()”.
In other words using “+” is a shorthand for the more verbose StringBuilder idiom.

The compiler will do the same trick for cip + "ciop" and "cip" + ciop. (In case you wonder, "cip" + "ciop" will just be compiled as "cipciop").
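
One way to see the folding of literals without opening the bytecode is to compare object identity. This is only a sketch: it uses == on strings on purpose, to tell a compile-time constant (which is interned) from a string built at run time, and the class and method names are made up.

```java
class ConstantFolding {

    // Both operands are literals, so the compiler folds them into the single
    // constant "cipciop", the very same object as the literal on the right.
    static boolean literalsAreFolded() {
        return ("cip" + "ciop") == "cipciop";
    }

    // The operands are plain variables, so the concatenation happens at run
    // time and produces a new String object with the same content.
    static boolean runtimeResultIsSameObject(String cip, String ciop) {
        String joined = cip + ciop;
        return joined == "cipciop";
    }

    public static void main(String[] args) {
        System.out.println(literalsAreFolded());                      // true
        System.out.println(runtimeResultIsSameObject("cip", "ciop")); // false
        System.out.println(("cip" + "ciop").equals("cipciop"));       // true
    }
}
```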

This is great, but beware, the compiler will not always do the best thing for you:

This code

String big = "both";
big += cip;
big += ciop;

Will be compiled into this:

String big = "both";
big = new StringBuilder(big).append(cip).toString();
big = new StringBuilder(big).append(ciop).toString();

While of course the most efficient way is

String big = new StringBuilder("both").append(cip).append(ciop).toString();

Now of course nobody in his right mind would ever write any of the above (or use those variable names), but here is a pattern that you may have seen before:

String boo = "both";
for (int i=1; i<100; i++)
     boo += cip + ciop;

Now the compiler will do the obvious thing and instantiate one new StringBuilder at each iteration:

String boo = "both";
for (int i=1; i<100; i++)
     boo = new StringBuilder(boo).append(cip).append(ciop).toString();

In this case it is best to use this idiom:

StringBuilder foo = new StringBuilder("both");
for (int i=1; i<100; i++)
    foo.append(cip).append(ciop);
String boo = foo.toString();
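
If you want to convince yourself that the two loops build the same string, here is a small self-contained sketch that runs both. The class and method names are made up, and it makes no claim about timings.

```java
class ConcatLoop {

    // The += version: every iteration builds a brand new String.
    static String withPlus(String cip, String ciop, int times) {
        String boo = "both";
        for (int i = 0; i < times; i++)
            boo += cip + ciop;
        return boo;
    }

    // The StringBuilder version: one builder is reused for the whole loop.
    static String withBuilder(String cip, String ciop, int times) {
        StringBuilder foo = new StringBuilder("both");
        for (int i = 0; i < times; i++)
            foo.append(cip).append(ciop);
        return foo.toString();
    }

    public static void main(String[] args) {
        String a = withPlus("cip", "ciop", 99);
        String b = withBuilder("cip", "ciop", 99);
        System.out.println(a.equals(b)); // true
        System.out.println(a.length());  // 697 = 4 + 99 * 7
    }
}
```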

Enjoy :)


Evaluating relative speed of java digest (hashing) algorithms

It’s best practice to hash security tokens such as passwords and session ids before storing them in your database. I was just doing that to the session tokens for a project at work, and I wondered which algorithm to pick if speed was the only consideration.

I cranked up some code that measures the wall-clock time it takes for each algorithm to hash a bunch of strings (10 million). I found that MD5 is the fastest and MD2 is the slowest, taking roughly twice the time. I would have gone for MD5 anyway as it is the standard choice, but it’s good to see that it’s quick too.

The usual disclaimers apply: this test is very unscientific and is only relevant for my specific use case (the tokens I handle are the same length as the random strings in the test), so don’t plan your business on it. It’s all pretty quick anyway (roughly 17 to 34 microseconds per hash in the run below), so you may want to pick an algorithm for its security features rather than for its execution speed. See the docs for more info.

Output:

Creating 10000000 random strings... Created.
Testing algo MD2...	Completed in 339126 milliseconds
Testing algo MD5...	Completed in 169690 milliseconds
Testing algo SHA-1...	Completed in 200398 milliseconds
Testing algo SHA-256...	Completed in 211560 milliseconds
Testing algo SHA-384...	Completed in 303999 milliseconds
Testing algo SHA-512...	Completed in 316265 milliseconds
Test Complete.

And code:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.SecureRandom;

public class CompareHashFunctions {

	private static SecureRandom random = new SecureRandom();

	public static void main(String[] args) throws Exception {

		int runs = 10000000;				

		System.out.print("Creating " + runs + " random strings... ");
		String salt = randomString();
		String[] strings = new String[runs];
		for (int i=0; i<strings.length; i++) {
			strings[i] = randomString();
		}

		System.out.println("Created. ");	

		runTest(salt, strings, "MD2");
		runTest(salt, strings, "MD5");
		runTest(salt, strings, "SHA-1");
		runTest(salt, strings, "SHA-256");
		runTest(salt, strings, "SHA-384");
		runTest(salt, strings, "SHA-512");		

		System.out.println("Test Complete.");
	}

	static void runTest(String salt, String[] strings, String algo) throws Exception {
		System.out.print("Testing algo " + algo + "...\t");
		MessageDigest instance = MessageDigest.getInstance(algo);
		long start = System.nanoTime();
		for (int i=0; i<strings.length; i++) {
			byte[] bytes = (salt + strings[i]).getBytes("UTF-8");
			MessageDigest clone = (MessageDigest)instance.clone();
			new String(clone.digest(bytes));
		}
		long elapsed = (System.nanoTime() - start) / 1000000;
		System.out.println("Completed in " + elapsed +  " milliseconds ");
	}		

	/**
	 * Random string
	 * @return a random string of 25 or 26 chars
	 */
	static String randomString() {
		// 130 bit random integer converted to string in base 32
		return new BigInteger(130, random).toString(32);
	}
}
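
The benchmark throws the digest away. To actually store one you need it as text, and new String(bytes) is not a safe way to get there because arbitrary bytes are not valid characters. A minimal sketch of a salted hash encoded as hex, where TokenHasher and hexDigest are made-up names and not the code from my project:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TokenHasher {

    // Hashes salt + token with the given algorithm and returns lowercase hex.
    static String hexDigest(String algo, String salt, String token) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algo);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algo, e);
        }
        byte[] hash = digest.digest((salt + token).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash)
            hex.append(String.format("%02x", b)); // two hex digits per byte
        return hex.toString();
    }

    public static void main(String[] args) {
        // An empty salt is used here only to make the output easy to check.
        System.out.println(hexDigest("SHA-256", "", "abc"));
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```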

Where are generics stored in java compiled classes?

I was having a go at learning some Java bytecode and started looking at how generics were handled. As expected, the compiler was emitting cast instructions when generic types were used, but nothing where the types were declared: generics in Java are implemented using a technique called “erasure”. Straight from the docs:

When a generic type is instantiated, the compiler translates those types by a technique called type erasure — a process where the compiler removes all information related to type parameters and type arguments within a class or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics.

http://download.oracle.com/javase/tutorial/java/generics/erasure.html

But surely not ALL information about the type parameters is lost. That would mean that once I compile my code, all other developers would have to use it “the old way”, with casts and all, but clearly this is not the case.

Let’s write two simple classes:

public class GenericClass<T> {}
public class StandardClass {}

And decompile them:

$ javap -c GenericClass
public class learn.GenericClass extends java.lang.Object{
public learn.GenericClass();
  Code:
   0:   aload_0
   1:   invokespecial   #8; //Method java/lang/Object."<init>":()V
   4:   return
}
$ javap  -c StandardClass
Compiled from "StandardClass.java"
public class learn.StandardClass extends java.lang.Object{
public learn.StandardClass();
  Code:
   0:   aload_0
   1:   invokespecial   #8; //Method java/lang/Object."<init>":()V
   4:   return
}

As it should, the bytecode looks exactly the same. No type parameters.

Let’s get more data:

$ javap -verbose StandardClass
Compiled from "StandardClass.java"
public class learn.StandardClass extends java.lang.Object
  SourceFile: "StandardClass.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = class        #2;     //  learn/StandardClass
// ... snip
const #15 = Asciz       StandardClass.java;
// ... snip
$ javap -verbose GenericClass
Compiled from "GenericClass.java"
public class learn.GenericClass extends java.lang.Object
  SourceFile: "GenericClass.java"
  Signature: length = 0x2
   00 13
  minor version: 0
  major version: 50
  Constant pool:
const #1 = class        #2;     //  learn/GenericClass
// ... snip
const #19 = Asciz       <T:Ljava/lang/Object;>Ljava/lang/Object;;

// … snip

The two outputs are different at last: the GenericClass file carries an optional Signature attribute. Its two bytes, 00 13, are an index into the constant pool: entry #19 (0x13) holds the full generic signature of the class, including its type parameters. The compiler can then use this information to do the right thing when it compiles other code against the class.
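
The same information is visible at run time through reflection. A minimal sketch, with the two classes redeclared to keep the example self-contained; SignatureDemo and typeParameterNames are made-up names:

```java
import java.lang.reflect.TypeVariable;

class SignatureDemo {

    // Returns the declared type parameter names of a class, comma separated.
    static String typeParameterNames(Class<?> type) {
        StringBuilder names = new StringBuilder();
        for (TypeVariable<?> variable : type.getTypeParameters()) {
            if (names.length() > 0)
                names.append(",");
            names.append(variable.getName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(typeParameterNames(GenericClass.class));  // T
        System.out.println(typeParameterNames(StandardClass.class)); // (empty line)
        System.out.println(typeParameterNames(java.util.Map.class)); // K,V
    }
}

class GenericClass<T> {}

class StandardClass {}
```

Note that this only recovers the declaration (the T in GenericClass<T>); the type argument of a particular instance, such as the String in new GenericClass<String>(), is still erased.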

This stuff is specified in the section "4.8.8 The Signature Attribute" of the updated JVM Specification.


simple webserver and rest with jetty and NO XML

Starting a new web project (maybe with REST?) in Java requires mastery of many complex abstract components. You need a webapp for tomcat (fiddle with some xml files), spring (import plenty of jars and add some more xml files), a rest framework like spring, resteasy or restlet (fiddle with annotations), then deploy to tomcat and go.

I won’t deny all these components add structure, reusability and a lot of good stuff, but sometimes I have simple needs that scream for simple solutions.

A few days ago I needed just that: a simple database with a REST api, and the ability to serve a few static files.

I used embedded jetty and all it took was one maven dependency, 3 classes, 150 lines of java, 15 minutes and NO XML.

I was so satisfied with the result that I served myself a beer and posted the code on github.

The actual project implemented a datastore on neo4j, but for the sake of simplicity I released it on github as a dumb in-memory key-value store.

According to apache bench, with as little as 8m memory (-Xmx8m) the server was able to handle 2k concurrent users at a rate of 1500 requests per second.

The main is as simple as this:

public static void main(String[] args) throws Exception {
	Server server = new Server(8080);
	Context root = new Context(server, "/");

	// configure the default servlet to serve static files from "htdocs"
	root.getInitParams().put("org.mortbay.jetty.servlet.Default.resourceBase", "htdocs");
	root.addServlet(new ServletHolder(new DefaultServlet()), "/");

	// use /uuid to get a fresh id
	root.addServlet(new ServletHolder(new UUIDServlet()), "/uuid");

	// the actual key/value store
	root.addServlet(new ServletHolder(new KeyValueServlet()), "/store/*");
	server.start();
}
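
The servlets are not reproduced in this post. As a rough idea of what sits behind /store/*, a dumb in-memory key-value store can be as small as the sketch below. InMemoryStore and its methods are names made up for illustration, not the code from the repository, and the servlet that would map HTTP verbs onto them is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the code from the repository.
class InMemoryStore {

    // ConcurrentHashMap because a servlet is called from many threads at once.
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Stores a value and returns the previous one, or null if the key is new.
    String put(String key, String value) {
        return values.put(key, value);
    }

    // Returns the stored value, or null if the key is unknown.
    String get(String key) {
        return values.get(key);
    }

    // Returns true if the key existed and was removed.
    boolean delete(String key) {
        return values.remove(key) != null;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("answer", "42");
        System.out.println(store.get("answer"));    // 42
        System.out.println(store.delete("answer")); // true
        System.out.println(store.get("answer"));    // null
    }
}
```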

Get the code: Simple webserver and REST with jetty and NO XML on GitHub


Most Advanced Yet Acceptable

Raymond Loewy believed that

The adult public’s taste is not necessarily ready to accept the logical solutions to their requirements if the solution implies too vast a departure from what they have been conditioned into accepting as the norm.

Quite a statement. He synthesised his observation into a principle and further compressed it into an acronym: MAYA, Most Advanced Yet Acceptable.

He was a radical, forward thinker, so I guess he was constantly refitting his ideas into the Acceptable yet Advanced. As an industrial designer, he was trying to have his designs mass produced and sold to actual people. He succeeded, and his designs reveal the tension between the future rolling in and the people holding back.

Of course, it’s common sense. Had he designed something Unacceptable, his biography would have been different.

But MAYA is also a very real constraint, the sort that makes or kills your proposition. No matter what you are selling to whom: a new song, a new recipe, a new piece of software; to your fans, to your girlfriend, to your boss. Being advanced is not enough: it must be acceptable. And if it is not advanced enough, it is not interesting.

You can ignore the principle and be a scientist. Or never challenge it and be an engineer, or just be boring. Some would say that a sciengineer will balance his act and live up to the MAYA principle. As a software developer and technology enthusiast I find it hard to hit the sweet spot.


ten principles of good design and two books

I’m watching the first episode of The Genius of Design, a BBC documentary series exploring the history of design. It’s not about software design, but one could argue that the process and the craft of design are interesting no matter what is being designed. Anyway. They briefly interview Dieter Rams, a very prolific designer who made the history of industrial design and came up with ten commandments. Read them slowly. The 10th is pure genius.

  1. Good design is innovative
  2. Good design makes a product useful
  3. Good design is aesthetic
  4. Good design makes a product understandable
  5. Good design is unobtrusive
  6. Good design is honest
  7. Good design is long-lasting
  8. Good design is thorough down to the last detail
  9. Good design is environmentally friendly
  10. Good design is as little design as possible

(We are grateful to Mr Rams for such pearls of wisdom, and so is Apple…)

All this talk about design reminds me of The Design of Everyday Things. It’s a brilliant book, inspiring and foundational. I keep suggesting it to programmers and makers alike.

Another title worth mentioning is The Design of Design, from the same folk who wrote “The Mythical Man-Month”. The new book is not as ground-breaking as the old one, but it’s a good read in its own right.



On the subject of exploring and finding a design, watch this video on TED: build a tower, build a team (6 minutes).


Is there something like imdb or wikipedia for books? I keep linking amazon but I’d prefer to link to a website with a more informational angle.



openid authentication handler for couchdb

Over at couchdb-openid, there is my implementation of OpenID version 1.1 for couchdb, based on http://github.com/etnt/eopenid

It seems fairly stable but has only been tested against myopenid.com, so it is definitely not production ready.

I plan to add support for openid 2.0 and to make couchdb work as openid endpoint.


The handler code would love to be reviewed by someone with some erlang and couchdb experience.

Demo

Try the login page at fortytwo.

Quick install:

  • cd couchdb_install_path/lib/couchdb/erlang/lib/
  • git clone git://github.com/mcaprari/couchdb-openid.git
  • cd couchdb-openid
  • make
  • edit local.ini [httpd]/authentication_handlers (or do it from futon) and
    add {couch_httpd_openid_auth, openid_authentication_handler} BEFORE the default handlers
  • restart couchdb

Quick test:

http://caprazzi.net:5984/_session?openid=auth-request&openid-identifier=<your_openid>

What to expect:

Only openid 1.1 is supported and it has only been tested with myopenid.com as openid provider.

When a client hits the initiation url (above), it is redirected to the openid provider
and prompted to authorise the association.

Then it’s redirected back to the couch and

  • if the client is not logged in and supplies a new openid, a new user is created with username=openid and the client is logged in
  • if the client is not logged in and supplies a mapped openid,
    the client is logged in as the mapped user
  • if the client is logged in and supplies a new openid,
    the supplied openid is added to current user, and the client keeps the current login
  • if the client is logged in and supplies a mapped openid
    • if openid is mapped to the same user, the client keeps the current login
    • if openid is mapped to a different user, the operation fails 400
  • if user is logged in AS ADMIN and supplies a new openid the operation fails 500
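
The same rules, restated as a small decision function. This is only a sketch of the list above written in Java, not a translation of the Erlang handler: the names are made up, and it assumes that an admin supplying an already mapped openid follows the general rules, which the list does not spell out.

```java
class OpenIdOutcome {

    enum Outcome {
        CREATE_USER_AND_LOGIN,
        LOGIN_AS_MAPPED_USER,
        ADD_OPENID_TO_CURRENT_USER,
        KEEP_CURRENT_LOGIN,
        FAIL_400,
        FAIL_500
    }

    // currentUser: the logged-in user, or null if the client is not logged in
    // currentIsAdmin: whether the logged-in user is an admin
    // mappedUser: the user the openid is already mapped to, or null if it is new
    static Outcome decide(String currentUser, boolean currentIsAdmin, String mappedUser) {
        if (currentUser == null)
            return mappedUser == null
                    ? Outcome.CREATE_USER_AND_LOGIN
                    : Outcome.LOGIN_AS_MAPPED_USER;
        if (mappedUser == null)
            return currentIsAdmin
                    ? Outcome.FAIL_500
                    : Outcome.ADD_OPENID_TO_CURRENT_USER;
        return mappedUser.equals(currentUser)
                ? Outcome.KEEP_CURRENT_LOGIN
                : Outcome.FAIL_400;
    }

    public static void main(String[] args) {
        System.out.println(decide(null, false, null));     // CREATE_USER_AND_LOGIN
        System.out.println(decide("bob", false, "alice")); // FAIL_400
        System.out.println(decide("root", true, null));    // FAIL_500
    }
}
```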

TODO:

  • try erl_openid for openid 2.0 support
  • decide if it is wise to map openids to admins (if at all possible)
  • cleanup ets table after auth confirm (or maybe find an alternative to ets tables)
  • reduce dependence from eopenid (dict access routines at least)

Couchdb runtime statistics viewer

CouchDB comes with a runtime statistics module that lets you inspect how CouchDB performs.
The statistics module collects metrics like requests per second, request sizes and a multitude of other useful stuff. From Couchdb wiki.

I created a simple app that uses chronoscope to display how some metrics change over time.

Jump to the demo, go to the source or use the docs.

Couchdb Stats Screenshot

Install

couchdb-stats is a simple couchapp plus a python script that hits /_stats and stores the results in a couchdb database

Attention: this app is tested with Couchdb 0.11.x, not yet released at the time of this post.

$ git clone git://github.com/mcaprari/couchdb-stats.git
$ cd couchdb-stats
$ couchapp push app http://localhost:5984/stats
$ python stats_copy.py localhost localhost stats 60

Couchdb reports metrics since server start or over the last 1, 5 or 15 minutes. There is no way to see yesterday’s stats. To do that, we need to keep hitting /_stats and store the results in a couchdb database.

stats_copy.py does just that.

At this point it’s possible to write several views, each focused on a particular metric. See the documentation for more details.

questions, comments and suggestions welcome


generating SVG charts with couchdb

In this article I describe how I got couchdb to produce SVG charts using list functions

This post is long, so I’ll report the results first:


  • group_level=1: yearly averages (svg, png)
  • group_level=2: monthly averages (svg, png)
  • group_level=3: daily values (svg, png)
 

Now go and read how I did it:

  1. generate some test data
  2. upload test data to couchdb
  3. create and manage a design document with couchapp
  4. write a simple view with map/reduce
  5. write a _list function and render the charts!
  6. Conclusions

Apache CouchDB is a document-oriented database server, accessible via a RESTful JSON API. It has some advanced features, such as the ability to write ‘views’ in a map/reduce fashion and to further transform the results using javascript. It’s a young but very promising project.

Try this at home

You can browse or download all code discussed here. All comments and corrections are welcome.

Generate some test data

To get started with this exploration we need some data to render, and a quick way to
visualize it before our application is ready. This Python script generates a series of data points
that simulate the goings of someone’s bank account.

# test_data.py. Usage: python test_data.py <simulation_length>
import sys
import random

days = int(sys.argv[1])
savings = 10000
pay = 2000
for i in range(0, days):
	# payday every 30 days
	if i % 30 == 0:
		savings = savings + pay
	# daily spending: a random amount plus a fixed 2
	savings = savings - random.randint(0, pay // 16) - 2
	print(i, int(savings))

Use the script to generate a sample set with 3000 points:

$ python test_data.py 3000 > test_data.txt
$ cat test_data.txt
0 11947
1 11882
2 11813
...

Our final output will be similar to a line chart made with some bash and gnuplot:

#!/bin/sh
# gnuplot.sh generates a plot of a series piped in stdin
(echo -e "set terminal png size 750, 500\nplot \"-\" using 1:2 with lines notitle"
cat -
echo -e "end") | gnuplot

$ cat test_data.txt | sh gnuplot.sh > test_data.png

Upload test data to couchdb

We need our data in json format so that it can be uploaded to couchdb. This Python script converts
each input line to a json object. Each object will become a document in couchdb. All lines are collected in the ‘docs’ array, to make the output compatible with couchdb bulk document api. It also adds a tag to each document, so it’s easier to upload and manage multiple datasets.

# data_to_json.py. builds json output suitable for couchdb bulk operations
import sys
import datetime

date = datetime.datetime(2000, 1, 1)
tag = sys.argv[1]
print('{"docs":[')
for line in sys.stdin:
	day, value = line.strip().split(' ')
	# day is an offset from 2000-01-01
	datestr = (date + datetime.timedelta(int(day))).strftime("%Y-%m-%d")
	# separate documents with a comma (none before the first one)
	if day != "0":
		print(",")
	sys.stdout.write('{"tag":"%s", "date":"%s", "amount":%s}' % (tag, datestr, value))
print('\n]}')

$ cat test_data.txt | python data_to_json.py test-data > test_data.json
$ cat test_data.json
{"docs":[
{"tag":"test-data", "date":"2000-01-01", "amount":11896},
{"tag":"test-data", "date":"2000-01-02", "amount":11876},
....
{"tag":"test-data", "date":"2008-03-17", "amount":18703},
{"tag":"test-data", "date":"2008-03-18", "amount":18643}
]}

Create a new database with name svg-charts-demo

$ curl -i -X PUT http://localhost:5984/svg-charts-demo/
HTTP/1.1 201 Created
...
{"ok":true}

Upload the test data

$ curl -i -d @test_data.json -X POST http://localhost:5984/svg-charts-demo/_bulk_docs
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
....

Verify that 3000 documents are in the database.

$ curl http://localhost:5984/svg-charts-demo/_all_docs?limit=0
{"total_rows":3000,"offset":3000,"rows":[]}

Create and manage a design document with couchapp

Design documents are special couchdb documents that contain application code such as views and lists.
CouchApp is a set of scripts that makes it easy to create and manage design documents.

In most cases installing couchapp is a matter of one command. If you have any problems or want to know more, visit Managing Design Documents on the Definitive Guide.

$ easy_install -U couchapp

These commands create a new couchapp called svg-charts and install it in couchdb

$ couchapp generate svg-charts

$ ls svg-charts/
_attachments  _id  couchapp.json  lists  shows  updates  vendor  views

$ couchapp push svg-charts http://localhost:5984/svg-charts-demo/
[INFO] Visit your CouchApp here:

http://localhost:5984/svg-charts-demo/_design/svg-charts/index.html

Write a simple view with map/reduce

This view will enable us to group the test data by year, month or day and see the average
for each group.

// map.js
// key is array representing a date [year][month][day]
// value is each doc amount field (a number)
function(doc) {
	// dates are stored in the doc as 'yyyy-mm-dd'
	emit(doc.date.split('-'), doc.amount);
}
// reduce.js
// this reduce function returns an array of objects
// {tot:total_value_for_group, count:elements_in_the_group}
// clients can then do tot/count to get the average for the group
// Keys are arrays [year][month][day], so count will always be 1 when group_level=3
function(keys, values, rereduce) {
	if (rereduce) {
		var result = {tot:0, count:0};
		for (var idx in values) {
			result.tot += values[idx].tot;
			result.count += values[idx].count;
		}
		return result;
	}
	else {
		var result = {tot:sum(values), count:values.length};
		return result;
	}
}
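
To convince myself that the rereduce branch is right, the reduce function can be run outside couchdb. This is only a sketch: the sum() below is my stand-in for couchdb's built-in helper, reduceFn is just a name I gave the function so it can be called, and the four amounts are borrowed from the sample output in this post. Reducing everything in one pass and reducing two chunks followed by a rereduce should give the same totals.

```javascript
// Stand-in for couchdb's built-in sum() helper (assumption: a plain numeric sum).
function sum(values) {
	return values.reduce(function (a, b) { return a + b; }, 0);
}

// Same logic as reduce.js above, given a name so it can be called locally.
function reduceFn(keys, values, rereduce) {
	if (rereduce) {
		// values are partial results: {tot, count} objects
		var result = {tot: 0, count: 0};
		for (var i = 0; i < values.length; i++) {
			result.tot += values[i].tot;
			result.count += values[i].count;
		}
		return result;
	}
	// values are the raw amounts emitted by the map function
	return {tot: sum(values), count: values.length};
}

var amounts = [20050, 20019, 19974, 19878];

// one pass over all the values
var direct = reduceFn(null, amounts, false);

// two partial reductions, combined by a rereduce
var partA = reduceFn(null, amounts.slice(0, 2), false);
var partB = reduceFn(null, amounts.slice(2), false);
var combined = reduceFn(null, [partA, partB], true);

console.log(direct, combined, combined.tot / combined.count);
```

Returning the total and the count instead of the average itself is what makes this work: averages of averages would be wrong whenever the chunks have different sizes.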

Update the design document and test the different groupings

$ couchapp push svg-charts http://localhost:5984/svg-charts-demo/

Call the view with group_level=1 to get the data grouped by year

$ curl http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=1
{"rows":[
{"key":["2000"],"value":{"tot":4247068,"count":366}},
...
{"key":["2008"],"value":{"tot":1529286,"count":78}}
]}
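
The view returns totals and counts, so the division is left to the client. A minimal sketch of that step, using only the two rows visible in the response above; the averages() helper is a name I made up for illustration.

```javascript
// The two rows shown in the group_level=1 response above.
var response = {rows: [
	{key: ["2000"], value: {tot: 4247068, count: 366}},
	{key: ["2008"], value: {tot: 1529286, count: 78}}
]};

// Hypothetical helper: turn each row into a label and an average.
function averages(rows) {
	return rows.map(function (row) {
		return {
			label: row.key.join('-'),
			average: row.value.tot / row.value.count
		};
	});
}

averages(response.rows).forEach(function (a) {
	console.log(a.label + ': ' + a.average.toFixed(2));
});
```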

Call the view with group_level=2 to get the data grouped by month

$ curl http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=2
{"rows":[
{"key":["2000","01"],"value":{"tot":343578,"count":31}},
{"key":["2000","06"],"value":{"tot":345282,"count":30}},
...

Call the view with group_level=3 to get the data grouped by day. As all the keys are different at the third level, this returns a single row for each document.

$ curl -s http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=3
{"rows":[
{"key":["2000","01","01"],"value":{"tot":11896,"count":1}},
{"key":["2000","01","04"],"value":{"tot":11747,"count":1}},
...

Same as above but limiting the response to a range of days

$ curl -s 'http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date?group_level=3&startkey=\["2008","01","01"\]&endkey=\["2008","01","04"\]'
{"rows":[
{"key":["2008","01","01"],"value":{"tot":20050,"count":1}},
{"key":["2008","01","02"],"value":{"tot":20019,"count":1}},
{"key":["2008","01","03"],"value":{"tot":19974,"count":1}},
{"key":["2008","01","04"],"value":{"tot":19878,"count":1}}
]}
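
Escaping brackets and quotes by hand on the command line gets old quickly. When the request is built from a script, the same URL can be assembled with JSON.stringify and encodeURIComponent. This is a sketch under one assumption: every parameter value is sent as JSON, which holds for group_level, startkey and endkey but not for every couchdb option. The viewUrl helper is hypothetical.

```javascript
// Hypothetical helper: build a view URL, JSON-encoding every parameter value.
function viewUrl(base, params) {
	var parts = [];
	for (var name in params) {
		// startkey and endkey must be valid JSON; a number like 3 encodes to "3"
		var json = JSON.stringify(params[name]);
		parts.push(encodeURIComponent(name) + '=' + encodeURIComponent(json));
	}
	return base + '?' + parts.join('&');
}

var url = viewUrl(
	'http://localhost:5984/svg-charts-demo/_design/svg-charts/_view/by_date',
	{
		group_level: 3,
		startkey: ["2008", "01", "01"],
		endkey: ["2008", "01", "04"]
	}
);

console.log(url);
```

The keys stay strings with a leading zero ("01", not 1) because that is what the map function emits after splitting the date.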

Write a _list function and render the charts!

function(head, req) {
	start({"headers":{"Content-Type" : "image/svg+xml"}});

	// some utility functions that print svg elements
	function svg(width, height) {
		return '<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"'+
		' style="fill:black"'+
		' width="'+width+'" height="'+height+'">\n';
	}
	function line(x1, y1, x2, y2, color) {
		return '<line x1="'+x1+'" y1="'+y1+'" x2="'+x2+'" y2="'+y2+'"'+
			' style="stroke-width: 0.2; stroke:'+color+'"/>\n';
	}
	function rect(x, y, width, height, color, fill) {
		return '<rect x="'+x+'" y="'+y+'" width="'+width+'" height="'+height+'"'+
			' style="fill:'+fill+'; stroke:'+color+'"/>\n';
	}
	function text(x,y, text) {
		return '<text x="'+x+'" y="'+y+'" font-size="11"'+
			' font-family="sans-serif">'+text+'</text>\n';
	}

	// import query parameters
	var x_size = req.query.width || 750;
	var y_size = req.query.height || 500;
	var level = parseInt(req.query.group_level);

	// find max and min values
	// collect values and labels
	var y_max = null;
	var y_min = null;
	var values = [];
	var labels = [];
	var count = 0;
	var row;
	while((row = getRow())) {
		var value = Math.ceil(row.value.tot/row.value.count);
		if (y_max==null || value>y_max) { y_max=value; }
		if (y_min==null || value<y_min) { y_min=value; }
		values[count] = value;
		labels[count] = row.key.join('-');
		count++;
	}
	// free space surrounding the actual chart
	// (must be defined before the scaling factors that use it)
	var pad = Math.round(y_size/12);

	// calculate scaling factors
	var in_width = x_size-(2*pad);
	var in_height = y_size-(2*pad);
	var in_x_scale = in_width/count;
	var in_y_scale = in_height/(y_max-y_min);

	send('<?xml version="1.0"?>');
	send(svg(x_size, y_size));

	// background box
	send(rect(1,1, x_size, y_size, '#C6F1C7', '#C6F1C7'));

	// chart container box
	send(rect(pad,pad, x_size-(2*pad), y_size-(2*pad), 'black','white'));

	// draw labels and grid
	var y_base = y_size - pad;
	var lastx = 0;
	var lasty = 0;
	for(var i=0; i<count; i++) {
		var x = pad+Math.round(i*in_x_scale);
		if (i==0 || x-lastx > (30+12*level)) {
			send(line(x, y_base+(pad/2), x, pad,'gray'));
			send(text(x+3, y_base + (pad/2), labels[i]));
			lastx = x;
		}
		var y = Math.round(y_base - ( (values[i]-y_min) * in_y_scale));
		if (i==0 || lasty-y > 15) {
			send(line(5, y, pad+in_width, y,'gray'));
			send(text(5, y-2, values[i]));
			lasty = y;
		}
	}
	// draw the actual chart
	send('<polyline style="stroke:black; stroke-width: '+ (4-level) +'; fill: none;" points="');
	for(var i=0; i<count; i++) {
		if (i>0) send(',\n');
		var x = pad+Math.round(i*in_x_scale);
		var y = Math.round(y_base - ( (values[i]-y_min) * in_y_scale));
		send( x + ' ' + y);
	}
	send('"/>');

	send('</svg>');
}
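
The only real arithmetic in the list function is the mapping from values to pixel coordinates, and it is easier to check outside couchdb. Here is a sketch of that mapping as a standalone function; toPoints is a name I made up, and the guard for a flat series (all values equal) is my addition, not something the list function above does.

```javascript
// Hypothetical helper: map values to [x, y] pixel pairs inside a padded box,
// using the same padding and scaling rules as the list function.
function toPoints(values, x_size, y_size) {
	var pad = Math.round(y_size / 12);
	var y_min = Math.min.apply(null, values);
	var y_max = Math.max.apply(null, values);
	var in_width = x_size - 2 * pad;
	var in_height = y_size - 2 * pad;
	var in_x_scale = in_width / values.length;
	// assumption: a flat series is drawn on the baseline instead of dividing by zero
	var in_y_scale = (y_max > y_min) ? in_height / (y_max - y_min) : 0;
	var y_base = y_size - pad;
	return values.map(function (value, i) {
		return [
			pad + Math.round(i * in_x_scale),
			// svg y grows downwards, so bigger values get smaller y
			Math.round(y_base - (value - y_min) * in_y_scale)
		];
	});
}

var points = toPoints([20050, 20019, 19974, 19878], 750, 500);
console.log(points);
```

With the default 750x500 size the padding is 42 pixels, so the largest value lands on the top edge of the chart box (y=42) and the smallest on the bottom edge (y=458).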


Update couchapp, and execute the list function ‘chart-line’ against the view ‘by_date’.
Use different group_level settings, to obtain different charts:

curl http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=3 > chart-line_level-3.svg

curl http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=2 > chart-line_level-2.svg

curl http://localhost:5984/svg-charts-demo/_design/svg-charts/\
_list/chart-line/by_date?group_level=1 > chart-line_level-1.svg

group_level=1: yearly averages [chart: svg | png]

group_level=2: monthly averages [chart: svg | png]

group_level=3: daily values [chart: svg | png]

Conclusions

It worked.

I didn’t expect to use a single list function for all grouping levels. I’m particularly happy with how it worked out, even more so considering
that the whole thing is about 100 lines of code.

The output isn’t too nice, but I think it can be made presentable with under 500 lines of code and some effort.

Couchdb is always a pleasure to work with and it goes a long way in minimizing “Time To something Done”.
