HAML to PHP compiler


Ever since last year’s Ruby on Rails project, I have loved the simplicity and beauty of Haml.
So a couple of days ago, I decided to see whether there are any implementations for PHP, and maybe even a WordPress plugin for theming in Haml.
Guess what: there are.

Of course, I did what any sane CS person would do and subjected all the implementations I could find to some rigorous testing.
This post is about the results of that testing.
I was most interested in two things:

  1. Completeness: in layman’s terms, how good the parser is at its job, i.e. whether it parses a template and whether the output passes PHP’s syntax check, and
  2. Speed: how fast the parsers are (both startup and parsing speed).

I found and tested 6 implementations:

array('haml2php', 'mthaml', 'fammel', 'connec_phphaml', 'phamlp', 'phphaml', 

For testing I used all the different .haml files that the parsers themselves came with (124 in all). I also added a couple myself to test some specific things.
I had to rewrite some of them a little, because they contained Ruby code where these tests need PHP code.

I also made some changes to the parsers themselves, mostly cosmetic ones: putting each into its own namespace (so they’d play well together, since they tend to use the same class names, think Parser or Haml) and changing some access modifiers (so I could call the parse functions from the outside).

There are two main schools of compiler-writing present in this field.
Fammel is the only one going the traditional way of having a grammar. It uses lime to turn that grammar into a parser and lexer. One problem with that approach, as far as Sass is concerned, is that Sass is not completely context-free. Happily, fammel only compiles Haml. It is still far from complete, though. Fammel’s decent success rate of about 73% is largely due to the fact that half of the test .haml files came with fammel (so it knows how to parse them).
Every other compiler uses regular expressions in some fashion or other.

I also found quite a few of the .haml files not to be correct Haml (read: syntax errors), so I threw them into a special folder called ‘invalid’, which signals to the testing code that those should not parse. Some of the parsers do, however, parse the erroneous templates, and I don’t count those towards the overall result.
Haml itself is very good in that it even parses the Ruby code contained in the template and throws an error if it’s not syntactically correct, which sadly also prevents us from running the same templates through Haml for comparison. None of the PHP compilers parse the embedded PHP, so if we want to test the compiled .php files for syntactic correctness, we have to save them somewhere and run them through ‘php -l’. Trying to include or eval files with parse errors in them throws a fatal error, which cannot be caught and terminates the script.
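
To illustrate the ‘php -l’ step, here is a minimal sketch in Python rather than the PHP of my actual testing code. It assumes the `php` command-line binary is on the PATH and that it exits with a nonzero status when the file has a syntax error; the function name `lint_ok` and the `command` parameter are made up for this example.

```python
import subprocess
from typing import Sequence


def lint_ok(path: str, command: Sequence[str] = ("php", "-l")) -> bool:
    """Return True if the lint command exits with status 0 for the given file.

    `command` defaults to PHP's syntax check; it is a parameter only so the
    sketch can be tried with any other checker (an assumption of this example).
    """
    # The check runs in a child process, so a parse error in a compiled
    # template cannot terminate the testing script itself.
    result = subprocess.run(
        [*command, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

Running the check out of process is the whole point: a fatal error stays in the child process, and the test run simply records a failure and moves on to the next template.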

Without further ado, here are the results for all 124 templates:

Name              #compiles / #parses / #count        %
haml2php               87   /    89   /   124      70.16%
fammel                 90   /    94   /   124      72.58%
mthaml                 92   /   100   /   124      74.19%
connec_phphaml         93   /    97   /   124      75.00%
phamlp                108   /   113   /   124      87.10%
phphaml               116   /   122   /   124      93.55%
baldrs_phphaml        116   /   122   /   124      93.55%
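
The percentage column appears to be the number of templates whose output passes the PHP syntax check divided by the total number of templates. A small sketch of that arithmetic (the function name `success_rate` is hypothetical, and this is Python, not the PHP of the testing code):

```python
def success_rate(compiled: int, total: int) -> float:
    """Percentage of templates that compiled, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= compiled <= total:
        raise ValueError("compiled must be between 0 and total")
    return round(100.0 * compiled / total, 2)


# Counts taken from the haml2php and phphaml rows of the table.
haml2php_rate = success_rate(87, 124)
phphaml_rate = success_rate(116, 124)
```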

Of course, the fact that a template parses and compiles with PHP says nothing about correctness: the output might be syntactically correct but still logically incorrect.
phphaml is by far the best parser in the field, producing syntactically valid PHP for almost 94% of the templates.
It does have its faults, though. Haml templates can be indented with tabs or with any number of spaces, as long as the indentation remains consistent; phphaml only allows 2 spaces, nothing more. Furthermore, code insertion (following an equals sign) does not need a space in Haml, i.e. the code can follow immediately after the =. In phphaml that breaks. Filter handling broke due to the subclassing that was done downstream, but that is easily fixable. Speaking of filters: calling a nonexistent filter in Haml throws an error; in phphaml it does not. And lastly, whitespace handling (e.g. in :preserve) as well as variable interpolation within filters does not work correctly in phphaml.
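
The indentation rule can be sketched as follows. This is a simplified illustration in Python, not how Haml or any of the tested compilers implement it: it takes the first indented line as the indentation unit and requires every other line to be indented by a whole multiple of it, ignoring special cases such as filter bodies. All function names are invented for the example.

```python
from typing import Iterable, List, Optional


def leading_whitespace(line: str) -> str:
    """Return the run of spaces and tabs at the start of the line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def detect_indent_unit(lines: Iterable[str]) -> Optional[str]:
    """Return the indentation unit, or None if no line is indented.

    Raises ValueError if the unit mixes tabs and spaces or if a later line
    is not indented by a whole multiple of the unit.
    """
    unit: Optional[str] = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        indent = leading_whitespace(line)
        if not indent:
            continue  # top-level line
        if unit is None:
            if " " in indent and "\t" in indent:
                raise ValueError(f"line {number}: mixed tabs and spaces")
            unit = indent  # the first indented line defines the unit
        elif indent.replace(unit, "") != "":
            raise ValueError(f"line {number}: not a multiple of {unit!r}")
    return unit


def is_consistently_indented(lines: List[str]) -> bool:
    """Convenience wrapper that turns the ValueError into a boolean."""
    try:
        detect_indent_unit(lines)
    except ValueError:
        return False
    return True
```

Under this rule a template indented with four spaces or with tabs is fine, whereas a parser that hard-codes two spaces rejects it.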

Now for the timing tests:

Results (μs)
Name              startup (cold / hot)     min        avg   (of #)     max
haml2php               25 / 15            2421   10784.18    (80)    91853
mthaml                351 / 24             819    6682.53    (92)    87258
fammel                253 / 56             553    6015.93    (86)    65379
phamlp                 76 / 33             189    2503.01   (109)    33780
connec_phphaml         26 / 20             260    1617.51    (89)    26822
phphaml              5360 / 62             692    3929.44   (119)    61683
baldrs_phphaml       3761 / 57             724    4514.69   (119)    69345

All times are in microseconds (10^-6 s).
We can see why phphaml is so good: a cold startup takes an order of magnitude longer than for everyone else, and hot startup, while much better, is still the slowest. Parsing itself places it squarely in the middle of the field: going by the table, it takes fourth place in minimum parsing time and third place in both average and maximum.
As an update, I added Baldrs’ fork of phphaml as baldrs_phphaml. It includes some more patches, which make HTML-style attributes work now, but it doesn’t increase the compile count, because even though phphaml did compile those templates, the output wasn’t logically correct.
As a downside, the changes make parsing slightly slower, sadly!
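
For reference, numbers like the ones in the timing table can be gathered along these lines. This is a rough Python sketch under my own naming (`time_us`, `summarize`), not the PHP testing code itself: one helper times a single call in microseconds, the other reduces a list of measurements to min, average and max.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def time_us(func: Callable[[], object]) -> float:
    """Run func once and return the elapsed wall-clock time in microseconds."""
    start = time.perf_counter_ns()
    func()
    return (time.perf_counter_ns() - start) / 1000.0


def summarize(samples: Iterable[float]) -> Dict[str, float]:
    """Reduce per-template timings to min, average, max and sample count."""
    values = list(samples)
    if not values:
        raise ValueError("need at least one sample")
    return {
        "min": min(values),
        "avg": mean(values),
        "max": max(values),
        "count": len(values),
    }
```

In this scheme a cold startup would be the first timed call that loads a compiler, and a hot startup the same call repeated once everything is already loaded; the parsing columns come from `summarize` over one measurement per successfully compiled template.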

For anyone interested, here is the complete source, which includes all tested compilers, templates and my testing code. Enjoy!


Issues with my Web 4.0 design


So, I have been thinking some more about my design and reading some interesting papers that the nice people at UCI gave me when I visited the other day to seed my literature review. I realized there is a huge issue which I have overlooked so far: delay and asynchronicity.

On a side note, it seems that neither Links nor Luna concerns itself much with this particular problem, but it was brought to the attention of one of the developers of Links after a talk he recently gave, cf. here.




The other day I attended a talk about the Netflix competition by UCSD Professor Charles Elkan, who incidentally was one of its two external judges. The talk was really insightful and got me interested in data mining. So after thinking about storage requirements for my pet ‘Web 4.0 monolithic web-application’, I decided to dig deeper into DBMS technology vs. filesystems. I learned about OODBMS along the way, so we have to cover those briefly as well.

Let’s start with DBMS. They were created to handle large sets of data efficiently in the face of lots of concurrent reads and writes (read: users) as well as to allow swift querying of that data. They evolved from a mathematically sound theory, first-order predicate logic, into what is known as relational algebra. To summarize, DBMS provide the following things:

  • Correctness under concurrent access, known as the ACID principles,
  • Complete indices for the data, which together with
  • a query language based on a mathematically closed model allow for efficient searching, a.k.a. a powerful query execution engine.
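
As a small illustration of the first point, here is a sketch using Python’s built-in sqlite3 module. The table, the account names and the helper functions are invented for the example, and it only demonstrates atomicity (the A in ACID), not the full set of guarantees: either both updates of a transfer happen or neither does.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"  # the primary key is backed by an index
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 0)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; roll everything back on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go below zero.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def balances(conn: sqlite3.Connection) -> dict:
    """Return all balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A transfer that would overdraw the source account fails on the second update, and the already executed credit to the destination is rolled back with it.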


Future of the web happening now!


As it turns out, I am (obviously) not the only one to notice the apparent complexity of writing large-scale applications for the web today. Moreover, I am also not the only one trying to do something about it. There are at least 3 different projects that I became aware of recently.

  1. The first is an Eclipse project called Rich Ajax Platform (RAP) that was born out of RCP and uses the same model (OSGi etc.): http://eclipsesource.com/en/eclipse/eclipse-rap/. From the same source, it can build both a stand-alone desktop application and a web application that look similar. I think it looks promising and is a very good first step.
  2. Both the second and the third take a completely different approach in that the idea is to write everything in a functional language. I learned about Luna a couple of days ago, when I got a link on Twitter about it: http://asana.com/luna. It has a C-style syntax but allows you to write everything in a monolithic way and pretend you have full access to, say, the DB in a template. It supports JS-escaping similar to asm{} blocks in C, as well as XML and CSS as top-level constructs in the language syntax. I have to say that it looks very cool, and it seems almost too easy to write a web application that way (look at the example on their page to see what I mean).
  3. The third is almost the same idea but from a research lab in the UK, started about 5 years ago, called Links: http://groups.inf.ed.ac.uk/links/. I found that link in the comments on the Luna blog 😉 The syntax is a little more functional but the underlying idea is the same, and it has some of the same features, i.e. XML as a top-level construct, monolithic single-sourcing etc. It is written on top of Caml.

The next-generation Web or Web 4.0


Last night I was thinking about the future of the web (again) and which points I might have missed during my SOFEA series. In this post I am going to fill those gaps.

In the traditional client/server paradigm, the standard most of the time only defines a ‘protocol’ and how the software (both client and server) behaves externally, also called the side effects, but not how it works internally or how it should look while doing so. This applies to most if not all Internet standards so far, including HTTP, though HTML does define the look of the static content (not of the browser).

The current browser model is almost 20 years old by now and based on the traditional client/server paradigm with only static content. This was fine and dandy back then, since machines were not very powerful (even servers) and runtime compilation/interpretation wasn’t even invented yet. But after 20 years of Moore’s law, today’s cell phones are more powerful than a room full of hardware in those days, and I think it is time to rethink that model.