HAML to PHP compiler


Ever since last year’s Ruby on Rails project, I have loved the simplicity and beauty of Haml.
So a couple of days ago, I decided to see whether there are any implementations for PHP, and maybe even a WordPress plugin for theming in Haml.
Guess what, there are.

Of course, I did what any sane CS person would do and subjected all the implementations I could find to some rigorous testing.
This post is about the results of that testing.
I was most interested in two things:

  1. Completeness: in layman’s terms, how good the parser is at its job. I tested whether each template parses and whether the compiled output passes PHP’s syntax check, and
  2. Speed: how fast the compilers are (both startup and parsing speed).

I found and tested 6 implementations:

array('haml2php', 'mthaml', 'fammel', 'connec_phphaml', 'phamlp', 'phphaml', 

For testing I used all the different .haml files that the parsers themselves came with (124 in all), and I added a couple of my own to test some specific things.
I had to rewrite some of them a little, because they contained Ruby code and the compilers under test need PHP code.

I also made some changes to the parsers themselves, though mostly cosmetic ones: putting each one into its own namespace (so they would play well together, since they tend to share class names such as Parser or Haml) and changing some access modifiers (so I could call the parse functions from the outside).
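
The namespace change can be sketched as follows. This is a minimal illustration, not the actual source of any tested compiler: the namespace names Fammel and PhpHaml and the body of parse() are assumptions made up for the example.

```php
<?php
// Two hypothetical parsers that both define a class called Parser.
// Giving each its own namespace lets them be loaded side by side.
namespace Fammel {
    class Parser
    {
        public function parse($haml)
        {
            return '<!-- fammel -->' . $haml; // placeholder, not real output
        }
    }
}

namespace PhpHaml {
    class Parser
    {
        // Assumed to have been protected originally; made public so
        // that a test harness can call it from the outside.
        public function parse($haml)
        {
            return '<!-- phphaml -->' . $haml; // placeholder, not real output
        }
    }
}

namespace {
    // The harness lives in the global namespace and addresses each
    // parser by its fully qualified class name.
    $parsers = array(
        'fammel'  => new \Fammel\Parser(),
        'phphaml' => new \PhpHaml\Parser(),
    );
    foreach ($parsers as $name => $parser) {
        echo $name, ': ', $parser->parse('%p hello'), "\n";
    }
}
```

Namespaces require PHP 5.3 or later. In a real setup each parser would stay in its own files, with a namespace declaration added at the top of each.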

There are two main schools of compiler-writing present in this field.
Fammel is the only one going the traditional way of having a grammar. It uses lime to turn that grammar into a parser and lexer. One problem with that approach would show up with Sass, which is not completely context-free. Happily, fammel only compiles Haml. It is still far from complete, though. Fammel’s decent success rate of roughly 73% is due to the fact that half of the test .haml files came with fammel (so it knows how to parse them).
Every other compiler uses regular expressions in some fashion or other.

I also found quite a few of the .haml files not to be correct Haml files (read: syntax errors), so I put them into a special folder called ‘invalid’, which signals to the testing code that those should not parse. Some of the parsers do, however, parse the erroneous templates, and I don’t count those towards the overall result.
Haml itself is very good in that it even parses the Ruby code contained in the template and throws an error if it’s not syntactically correct, which sadly also prevents us from running the same templates through Haml for comparison. None of the PHP compilers parse PHP, so if we want to test the compiled .php files for syntactical correctness we have to save them somewhere and run them through ‘php -l’. Trying to include or eval files with parse errors in them throws a fatal error, which cannot be caught and terminates the script.
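
The ‘php -l’ step might look roughly like the sketch below. The function name compilesWithPhp is made up for illustration and is not taken from my testing code. It assumes the script runs from the PHP command line, so that PHP_BINARY points at a CLI interpreter.

```php
<?php
// Lint a string of compiled PHP: write it to a temporary file and
// run `php -l` on it in a separate process, so that a parse error
// cannot take the calling script down with it.
function compilesWithPhp($compiledSource)
{
    $file = tempnam(sys_get_temp_dir(), 'haml');
    file_put_contents($file, $compiledSource);

    // PHP_BINARY is the path of the running interpreter (PHP >= 5.4).
    $command = escapeshellarg(PHP_BINARY) . ' -l '
             . escapeshellarg($file) . ' 2>&1';
    exec($command, $output, $exitCode);

    unlink($file);

    // `php -l` exits with status 0 when it finds no syntax errors.
    return $exitCode === 0;
}

// A well-formed template and one with an unfinished if statement.
var_dump(compilesWithPhp('<p><?php echo $title; ?></p>'));
var_dump(compilesWithPhp('<p><?php if ( ?></p>'));
```

Starting one process per template is slow, but it is a simple way to survive parse errors in the compiled output.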

Without any further ado, here are the results of all 124 templates:

Name              #compiles / #parses / #count          %
haml2php:              87   /    89   /   124        70.16%
fammel:                90   /    94   /   124        72.58%
mthaml:                92   /   100   /   124        74.19%
connec_phphaml:        93   /    97   /   124        75.00%
phamlp:               108   /   113   /   124        87.10%
phphaml:              116   /   122   /   124        93.55%
baldrs_phphaml:       116   /   122   /   124        93.55%
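
The percentage column is the number of templates that compile divided by the total number of templates. As a small sketch of that arithmetic (the function name successRate is made up for this example, and the counts are copied from the table):

```php
<?php
// Share of templates whose compiled output passes the PHP syntax
// check, as a percentage rounded to two decimals.
function successRate($compiles, $count)
{
    return round(100 * $compiles / $count, 2);
}

// name => array(#compiles, #parses, #count)
$results = array(
    'haml2php'       => array(87, 89, 124),
    'fammel'         => array(90, 94, 124),
    'mthaml'         => array(92, 100, 124),
    'connec_phphaml' => array(93, 97, 124),
    'phamlp'         => array(108, 113, 124),
    'phphaml'        => array(116, 122, 124),
    'baldrs_phphaml' => array(116, 122, 124),
);

foreach ($results as $name => $row) {
    list($compiles, $parses, $count) = $row;
    printf("%-16s %6.2f%%\n", $name . ':', successRate($compiles, $count));
}
```

For phphaml, for example, this is 116 / 124, or about 93.55%.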

Of course, the fact that a template parses and compiles with PHP says nothing about correctness: the output might be syntactically correct but still logically incorrect.
phphaml is by far the best parser in the field, being able to translate almost 94% of the templates into PHP that compiles.
It does have its faults, though. Haml templates can be indented by tabs or by any number of spaces, as long as the indentation remains consistent; phphaml only allows 2 spaces, nothing else. Furthermore, code insertion (following an equals sign) does not need a space in Haml, i.e. the code can follow immediately after the =. In phphaml that breaks. Filter handling broke due to the subclassing that was done downstream, but that is easily fixable. Speaking of filters: calling a nonexistent filter throws an error in Haml, but not in phphaml. And lastly, whitespace handling (e.g. in :preserve) as well as variable interpolation within filters does not work correctly in phphaml.

Now for the timing tests:

--------------------- Results (in μs) ---------------------
Name              startup (cold / hot)     min        avg (of #)       max
haml2php:               25 / 15           2421   10784.18 (80)       91853
mthaml:                351 / 24            819    6682.53 (92)       87258
fammel:                253 / 56            553    6015.93 (86)       65379
phamlp:                 76 / 33            189    2503.01 (109)      33780
connec_phphaml:         26 / 20            260    1617.51 (89)       26822
phphaml:              5360 / 62            692    3929.44 (119)      61683
baldrs_phphaml:       3761 / 57            724    4514.69 (119)      69345

All times are in microseconds (10^-6 seconds).
We can see the price phphaml pays for being so good: a cold startup takes an order of magnitude longer than anyone else’s, and its hot startup is still the slowest, though by a much smaller margin. Parsing itself places it squarely in the middle of the field: leaving the fork aside, it comes fourth in minimum parsing time and third in both average and maximum parsing time.
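
Numbers like these can be collected with a harness along the lines of the sketch below. It is a simplified illustration under assumptions, not my actual testing code. The two closures are hypothetical stand-ins for loading a parser and parsing one template, and “cold” is taken to mean the first call (which has to load the source files) while “hot” is a repeated call.

```php
<?php
// Run a callable once and return the elapsed time in microseconds.
function timeMicros($fn)
{
    $start = microtime(true);
    call_user_func($fn);
    return (microtime(true) - $start) * 1e6;
}

// Hypothetical stand-in for "include the parser files and create
// a parser object".
$startup = function () {
    usleep(100);
};

// Hypothetical stand-in for "parse one template".
$parse = function () {
    usleep(50);
};

$cold = timeMicros($startup); // first call
$hot  = timeMicros($startup); // repeated call

// One timing per template; five runs stand in for the template set.
$times = array();
for ($i = 0; $i < 5; $i++) {
    $times[] = timeMicros($parse);
}

printf("startup cold/hot: %d / %d\n", $cold, $hot);
printf(
    "min %d  avg %.2f (of %d)  max %d\n",
    min($times),
    array_sum($times) / count($times),
    count($times),
    max($times)
);
```

With the usleep() stand-ins the cold and hot values come out about the same. The gap only shows up once the startup closure really includes files on its first call.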
As an update, I added Baldrs’ fork of phphaml as baldrs_phphaml. It includes some more patches that make HTML-style attributes work, but that doesn’t increase the compile count: phphaml already compiled those templates, the result just wasn’t logically correct.
As a downside, the changes sadly make it slightly slower.

For anyone interested, here is the complete source, which includes all tested compilers, templates and my testing code. Enjoy!


How to write a decent pref-manager for mozilla extensions


Today I want to share my experiences in writing code that uses Mozilla’s pref system, which is relevant if you write an add-on for any of Mozilla’s apps (Firefox, Thunderbird…).
After I got my feet wet with the Thundersomething add-on a while back, I recently gained some more experience rewriting TBTracer’s pref system from scratch. Thundersomething started as an adaptation of Firesomething to Thunderbird (TB), which came with a pretty decent pref system, meaning I didn’t need to change a lot.
TBT, on the other hand, started out with flashtracer as a template, and without wanting to belittle flashtracer or its author, the pref code sucked.
That became more and more apparent as TBT grew and gained more options, up to the point where I had to rethink the pref manager code and rewrite it. Read on to see my implementation details…

Mozilla’s add-on review process and evil eval


A little while ago, I submitted a new version of TBTracer (0.5.1) to addons.mozilla.org. To get through the review process I had to change my JavaScript date formatting and parsing code, because it uses eval, which is apparently evil 😉

As I had already mentioned in the last post, I found three different scripts for date formatting and profiled them (if you haven’t read it, do that first). In the end I went with the one that uses eval, because it is the fastest. Here’s why:


Javascript Date formatting


Today I was working on my TBTracer plugin, which is humming along nicely, by the way. Some of the recently added features are:

  • full response for HTTP requests in log,
  • CSS rules for both HTTP head and body,
  • better organization of high-resolution timestamps in conjunction with HTTP lines (timestamps for both request and response are recorded at the exact moment the notification first hits my code),
  • selectable columns for the log,
  • custom date format string for timestamps.

While working on that, I needed date formatting for the last bullet point, which is not standard in JavaScript. A quick Google search turned up lots of hits, and I quickly settled on 3 promising-looking candidates (meaning the code looked clean :). So let’s take a look.

Processing.js Seismometer Dashboard widget


I just finished the promised 2D Seismometer Dashboard widget that uses processing.js. It is based on the excellent, albeit simple, Seismometer widget from Matt Haynes and uses its Unimotion plugin library as well.
Somebody in the comments has been asking for a version that graphs all 3 axes, so here it is!
One could run 3 instances of that seismometer widget and choose a different axis for each one, but sadly it does not save its prefs, so after a restart all 3 go back to graphing the same axis.

My widget is called P5Seismo (from Processing’s early name, P5) and comes in 3 flavors: one without a frame and two with different frame borders, a simple one and a larger one.

Here is a screenshot of the frameless version, showing 3 parallel lines:




About a year and a half ago I was toying with Processing (for those of you who don’t know it, it is sort of like a scripting language for graphical applications) and wrote some plugins for it. I started tinkering with it because I wanted to do something interesting with the SuddenMotionSensor (SMS) that’s built into every Apple laptop, and I thought that Processing would make drawing cool graphs of the readings easier, which it did.

I found that the SMS input plugin that was on the site did not work according to my standards, so I wrote one that uses SMSlib from Seismac rather than Unimotion. Using that, I built the sketch I had wanted to build from the beginning, which graphs the readings in 3D: SMS sketch


On scripting


About one and a half years ago (wow, it’s been that long?) I was working on a semester project about scripting languages for the institute of signal communication at the university of Braunschweig, Germany (http://www.ifn.ing.tu-bs.de/). While the details of the work do not matter, the more interesting part of it is my evaluation of high-level versus scripting (dynamic) languages.

My basic point is that dynamic languages are easier and faster to program in than non-dynamic languages like C or Java, because they don’t concern the developer as much with syntax (variable types, for example) or debugging. In essence, dynamic languages move effort (computation time) from the developer towards the compiler, because it has to spend more time inferring what the developer meant. In dynamic languages, the whole program runs inside a sandbox called a virtual machine. That creates even more work on the CPU’s side, but it also lets the developer spend less time worrying about mundane things like memory management. Virtual machines also help a lot with debugging, because they give a better idea of what went wrong and generally provide better insight into the running program.

In essence, I think that dynamic languages are the future of programming: as CPUs get faster and faster, this shift of effort means that development time in scripting languages goes down more rapidly than in other languages.

The paper and the defense can both be found among my slideshare slides here (beware, they are both in German): http://www.slideshare.net/derDoc.

Update: I also uploaded my Master’s thesis, which is at the same time my Diploma thesis, as well as the defenses for it in both German and English to my slideshare account, along with a poster for NSDI. All of these were made at UTEP, Texas, USA.
