HAML to PHP compiler


Ever since last year’s Ruby on Rails project, I have loved the simplicity and beauty of Haml.
So a couple of days ago, I decided to see whether there are any implementations for PHP, and maybe even a WordPress plugin for theming in Haml.
Guess what, there are.

Of course, I did what any sane CS person would do and submitted all the implementations I could find to some rigorous testing.
This post is about the results of that testing.
Two things I was most interested in:

  1. Completeness: in layman’s terms, how good the parser is at its job. I tested whether each template parses and whether the output compiles as PHP, and
  2. Speed: how fast they are (both startup and parsing speed).

I found and tested 6 implementations:

array('haml2php', 'mthaml', 'fammel', 'connec_phphaml', 'phamlp', 'phphaml')

For testing I used all the different .haml files that the parsers themselves came with (124 in all); I also added a couple myself to test some specific things.
I had to rewrite some of them a little, because they used Ruby code, and here we need PHP code.

I also made some changes to the parsers themselves, mostly cosmetic though, like putting each into its own namespace (so they’d play well together, since they tend to use the same class names – think Parser, Haml) or changing some access modifiers (so I could call the parse functions from the outside).
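The namespacing change can be sketched like this (the class bodies below are placeholder stubs, not the real compilers):

```php
<?php
// Illustrative sketch only: two stub classes standing in for the real
// compilers, wrapped in their own namespaces so that their identical
// class names (Parser, Haml, ...) no longer collide in one process.
namespace Phamlp {
    class Parser {
        public function parse(string $haml): string { return '/* phamlp output */'; }
    }
}
namespace PhpHaml {
    class Parser {
        public function parse(string $haml): string { return '/* phphaml output */'; }
    }
}
namespace {
    // Both can now be loaded side by side in the same test harness:
    $a = new \Phamlp\Parser();
    $b = new \PhpHaml\Parser();
    echo get_class($a), ' / ', get_class($b), "\n";
}
```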

There are two main schools of compiler-writing present in this field.
Fammel is the only one going the traditional way of having a grammar; it uses lime to turn that into a parser and lexer. One problem with that approach in connection with Sass is that Sass is not completely context-free. Happily, fammel only compiles Haml. It is still far from complete, though. The fact that fammel sports a decent success rate of almost 73% is because half of the test .haml files came with fammel (so it knows how to parse them).
Every other compiler uses RegExes in some fashion or other.

I also found quite a few of the .haml files not to be correct Haml files (read: syntax errors), so I threw them into a special folder called ‘invalid’, which signals to the testing code that those should not parse. Some of the parsers do, however, parse the erroneous scripts; those I don’t count towards the overall result.
Haml itself is very good in that it even parses the Ruby code contained in the template and throws an error if it’s not syntactically correct – which sadly also prevents us from running the same templates through Haml for comparison. The PHP compilers do not parse PHP, so if we want to test the compiled .php files for syntactical correctness, we have to save them somewhere and run them through ‘php -l’. Trying to include or eval files with parse errors in them throws a fatal error, which cannot be caught and terminates the script.
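The lint step can be sketched like this (function and file names are illustrative, not from my actual harness):

```php
<?php
// Hedged sketch: run `php -l` in a subprocess, so that a parse error in
// a compiled template cannot fatally terminate the test runner itself.
function compilesCleanly(string $phpFile): bool
{
    $out = [];
    $status = 1;
    // `php -l` exits with status 0 only if the file is free of syntax errors.
    exec('php -l ' . escapeshellarg($phpFile) . ' 2>&1', $out, $status);
    return $status === 0;
}

file_put_contents('/tmp/good.php', "<?php echo 'hi';\n");
file_put_contents('/tmp/bad.php',  "<?php if ( {\n");   // deliberate syntax error
var_dump(compilesCleanly('/tmp/good.php'));  // bool(true)
var_dump(compilesCleanly('/tmp/bad.php'));   // bool(false)
```

This assumes the `php` binary is on the PATH of the machine running the tests.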

Without any further ado, here are the results of all 124 templates:

Name			#compiles/#parses/#count	        	% 
--->haml2php:		    87   /  89   /  124	              70.16%
--->fammel:		    90   /  94   /  124	              72.58% 
--->mthaml:		    92   /  100  /  124	              74.19% 
--->connec_phphaml:	    93   /  97   /  124	              75% 
--->phamlp:		    108  /  113  /  124	              87.1% 
--->phphaml:		    116  /  122  /  124	              93.55% 
--->baldrs_phphaml:	    116  /  122  /  124	              93.55%

Of course, the fact that a template parses and compiles with PHP says nothing about correctness: it might be syntactically correct but still logically wrong.
PHPHaml is by far the best parser in the field, being able to correctly translate almost 94% of the templates.
It does have its faults, though: Haml templates can be indented by any number of spaces as long as it stays consistent, or by tabs; PhpHaml only allows exactly 2 spaces, nothing more. Furthermore, code insertion (following an equals sign) does not need a space in Haml, i.e. the code can follow immediately after the =; in PhpHaml that breaks. Filter handling broke due to the subclassing that was done downstream, but that is easily fixable. Speaking of filters: calling a nonexistent filter in Haml throws an error; in PhpHaml it does not. And lastly, whitespace handling (e.g. in :preserve) as well as variable interpolation inside filters do not work correctly in PhpHaml.

Now for the timing tests:

--------------------Results (/μs)---------------------
Name			startup(cold/hot) min	  avg    (of#)	max
--->haml2php:	        25   / 15	  2421	10784.18 (80)	91853
--->mthaml:		351  / 24	  819	6682.53  (92)	87258
--->fammel:		253  / 56	  553	6015.93  (86)	65379
--->phamlp:		76   / 33	  189	2503.01  (109)	33780
--->connec_phphaml:	26   / 20	  260	1617.51  (89)	26822
--->phphaml:		5360 / 62	  692	3929.44  (119)	61683
--->baldrs_phphaml:	3761 / 57	  724	4514.69  (119)	69345

All times are in micro- (10^-6) seconds.
We can see why PhpHaml is so good: its cold startup takes an order of magnitude longer than everyone else’s, and its hot startup is still the slowest, though better. Parsing itself places it squarely in the middle of the field on minimum, average and maximum parse times alike.
As an update, I added Baldrs’ fork of phphaml as baldrs_phphaml. It includes some more patches which make HTML-style attributes work now, but that doesn’t increase the compile count, because even though phphaml did compile those templates, the output wasn’t logically correct.
As a downside, the changes make it slightly slower, sadly!
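For reference, per-template numbers like the ones above can be collected with microtime(); this is only a simplified sketch of the idea, not my actual harness:

```php
<?php
// Illustrative sketch: time a callable in microseconds with microtime().
// The usleep() call stands in for something like $parser->parse($template).
function timeUs(callable $fn): float
{
    $t0 = microtime(true);
    $fn();
    return (microtime(true) - $t0) * 1e6; // seconds -> microseconds
}

$samples = [];
for ($i = 0; $i < 5; $i++) {
    $samples[] = timeUs(function () {
        usleep(1000); // stand-in for one parse run
    });
}
printf("min %.0f  avg %.0f  max %.0f (us)\n",
       min($samples), array_sum($samples) / count($samples), max($samples));
```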

For anyone interested, here is the complete source, which includes all tested compilers, templates and my testing code. Enjoy!

How to write a decent pref manager for Mozilla extensions


Today, I want to share with you my experiences in writing code for Mozilla’s pref system – relevant if you write an add-on for any of Mozilla’s apps (Firefox, Thunderbird…).
After I got my feet wet with the Thundersomething add-on a while back, I recently gained some more experience rewriting TBTracer’s pref system from scratch. Thundersomething started as an adaptation of Firesomething to Thunderbird (TB), which came with a pretty decent pref system, meaning I didn’t need to change a lot.
TBT, on the other hand, started out with flashtracer as a template, and without wanting to belittle flashtracer or its author, the pref code sucked.
That became more and more apparent the bigger TBT became and the more options it consequently had, up to the point where I had to rethink the pref manager code and rewrite it. Read on to see my implementation details…

Mozilla’s add-on review process and evil eval


A little while ago, I submitted a new version of TBTracer (0.5.1) to addons.mozilla.org. To get through the review process I had to change my JavaScript date formatting and parsing code, because it uses eval – which is apparently evil 😉

As I had already mentioned in the last post, I found three different scripts for date formatting and profiled them (if you haven’t read it, do that first). Finally I went with the one that uses eval, because it is the fastest. Here’s why:


Javascript Date formatting


Today I was working on my TBTracer plugin, which is humming along nicely, BTW. Some of the new features recently included are:

  • full response for HTTP requests in log,
  • CSS rules for both HTTP head and body,
  • better organization of high-resolution timestamps in conjunction with HTTP lines (timestamps for both request and response are recorded at the exact moment the notification first hits my code),
  • able to select which columns are shown in log,
  • custom date format string for timestamps.

While working on that, I needed date formatting for the last bullet point, which is not standard in JavaScript. A quick Google search reveals lots of hits, and I quickly settled on 3 promising-looking candidates (meaning the code looked clean :). So let’s take a look.

Issues with my Web 4.0 design


So, I have been thinking some more about my design and reading some interesting papers the nice people at UCI gave me when I visited the other day to seed my literature review. I realized there is a huge issue which I have overlooked so far: delay and asynchronicity.

On a side note, it seems that neither Links nor Luna concerns itself much with this particular problem, but it has been brought to the attention of one of the developers of Links after a talk he recently gave, cf. here.




The other day I attended a really insightful talk about the Netflix competition by UCSD Professor Charles Elkan, who was incidentally one of its two external judges; it got me interested in Data Mining. So after thinking about storage requirements for my pet ‘Web 4.0 monolithic web-application’, I decided to dig deeper into DBMS technology vs. filesystems. I learned about OODBMSs along the way, so we have to cover those briefly as well.

Let’s start with DBMSs. They were created to handle large sets of data efficiently in the face of lots of concurrent reads and writes – read: users – as well as to allow swift querying of that data. They evolved from a mathematically sound theory – first-order predicate logic – into what is known as relational algebra. To summarize, DBMSs provide the following things:

  • Correctness under concurrent access – known as the ACID properties,
  • complete indices over the data, which, together with
  • a query language based on a mathematically closed model, allow for efficient searching, aka a powerful query execution engine.
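The first point is easy to see in action. Here is a hedged sketch of atomicity – the ‘A’ in ACID – using PDO with an in-memory SQLite database (this assumes the pdo_sqlite extension is available; the table and data are invented):

```php
<?php
// Sketch: a rolled-back transaction leaves no trace, no matter how
// many statements it contained - atomicity in practice.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE t (v INTEGER)');

$db->beginTransaction();
$db->exec('INSERT INTO t VALUES (1)');
$db->exec('INSERT INTO t VALUES (2)');
$db->rollBack();                 // neither row is persisted

$count = (int) $db->query('SELECT COUNT(*) FROM t')->fetchColumn();
echo $count, "\n";               // 0
```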


Future of the web happening now!


As it turns out, I am not the only one to notice (obviously) the apparent complexity of writing large-scale applications for the web today. Moreover, I am also not the only one trying to do something about it. There are at least 3 different projects that I became aware of recently.

  1. The first is an Eclipse project called Rich Ajax Platform (RAP) that was born out of RCP and uses the same model (OSGi etc.): http://eclipsesource.com/en/eclipse/eclipse-rap/. From the same source it can build both a stand-alone desktop application and a web application that look similar. I think it looks promising and is a very good first step.
  2. Both the second and the third take a completely different approach, in that the idea is to write everything in a functional language. I learned about Luna a couple of days ago, when I got a link about it on Twitter: http://asana.com/luna. It has a C-style syntax but allows you to write everything in a monolithic way and pretend you have full access to, say, the DB in a template. It supports JS escaping similar to asm{} blocks in C, as well as XML and CSS as top-level constructs in the language syntax. I have to say that it looks very cool, and almost too easy to write a web application that way (look at the example on their page to see what I mean).
  3. The third is almost the same idea but from a research lab in the UK, started about 5 years ago, called Links: http://groups.inf.ed.ac.uk/links/. I found that link in the comments on the Luna blog 😉 The syntax is a little more functional, but the underlying idea is the same, and it has some of the same features, i.e. XML as a top-level construct, monolithic single-sourcing etc. It is written on top of Caml.

Processing.js Seismometer Dashboard widget


I just finished the promised 2D Seismometer Dashboard widget that uses processing.js. It is based on the excellent, albeit simple, Seismometer widget by Matt Haynes and uses its Unimotion plugin library as well.
Somebody in the comments had been asking for a version that graphs all 3 axes, so here it is!
One could use 3 copies of that seismometer widget and choose a different axis for each one, but sadly it does not save the prefs, and so forgets the setting after a restart, resetting all 3 to graphing the same axis.

My widget is called P5Seismo (from Processing’s early name P5) and comes in 3 flavors: without a frame border, and with two different ones – a simple and a larger one.

Here is a screenshot of the frameless version, showing 3 parallel lines:




About a year and a half ago I was toying with Processing – for those of you who don’t know, it is sort of like a scripting language for graphical applications – and wrote some plugins for it. I started tinkering with it because I was interested in doing something interesting with the SuddenMotionSensor (SMS) that’s built into every Apple laptop, and thought that Processing would make cool graphs of the readings easier – which it did.

I found that the SMS input plugin that was on the site did not work to my standards, so I wrote one that uses SMSlib from Seismac rather than Unimotion. Using that, I built the sketch that I had wanted to from the beginning, which graphs the readings in 3D: SMS sketch


The next-generation Web or Web 4.0


Last night I was thinking about the future of the web (again) and what points I might have missed during my SOFEA series. In this post I am going to fill those gaps.

In the traditional client/server paradigm, the standard usually only defines a ‘protocol’ – how the software (both client and server) behaves externally, also called its side effects – but not how it works internally, or how it looks while doing it. This applies to most if not all Internet standards so far, including HTTP, though HTML does define the look of the static content (not of the browser).

The current browser model is almost 20 years old by now and based on the traditional client/server paradigm with only static content. This was fine and dandy back then, since machines were not very powerful (even servers) and runtime compilation/interpretation hadn’t even been invented yet; but after 20 years of Moore’s law, today’s cell phones are more powerful than a room full of hardware from those days, and I think it is time to rethink that model.

StudiVZ plugin for Adium issues

[Image: studiVZ screenshot, by momentimedia via Flickr]

So I just finished the first version of my StudiVZ plugin for AdiumX. It uses the XMPP<->SVZ gateway run by nimbuzz.com, so a Nimbuzz account is needed as well to use the plugin. It can be found on the Adium Xtras page under plugins once it is cleared/reviewed.

While programming that I ran into a couple of interesting issues.

Conclusion of the SOFEA series: Future of web-development/RIA


To summarize this series, I will shine a light on some of its more important points.

I really believe that at some point in the future there will be a unified web-development environment (UWE) for the web which allows developing a whole, single application in one language, which then gets broken up and compiled into the client and server parts by the compiler. So essentially the compiler decides where to draw the line between client and server – possibly with some help from the developer in the form of annotations, a description file or specific classes.

With this it is entirely possible that the program can be compiled into different representations, i.e. different languages for different deployments – for example a part for deployment on a server running struts and another one running ruby on rails, and client parts for a very thin – read restricted – mobile client as well as a very rich/powerful client. This naturally moves the break-up line between client and server which is why the compiler needs to be able to decide that at compile time.

What language the unified application is written in ultimately does not matter, since all Turing-complete languages can be cross-compiled. But since it is more complicated to compile from a non-VM language into a program that can run in a VM, it should be a VM language, even one with strong typing like Java. The GWT is a very good step in this direction, and it shows that it is possible to write the client-side part in a different language (in this case Java, which is close enough to JS to make that easy) for easier development and debugging, and then cross-compile it for running on the client.

I think that fusing GWT with a similar toolkit for the server side, maybe based on Struts or some other Java web server, and implementing a generator for the marshalling, could be an important first implementation of this unified web-development environment or toolkit. If somebody is interested in this, maybe it can be suggested as a project for the Google Summer of Code (SoC).

On scripting


About one and a half years ago (wow – it’s been that long?) I was working on a semester project about scripting languages for the institute of signal communication at the University of Braunschweig, Germany (http://www.ifn.ing.tu-bs.de/). While the details of the work do not matter, the more interesting part is my evaluation of high-level versus scripting (dynamic) languages.

My basic point is that dynamic languages are easier and faster to program in than non-dynamic languages like C or Java, because they don’t concern the developer as much with syntax (variable types, for example) or debugging. In essence, dynamic languages move effort (computation time) from the developer towards the compiler, because it has to spend more time inferring what the developer meant. In dynamic languages, the whole program runs inside a sandbox called a virtual machine, creating even more effort on the CPU’s side, which also helps the developer spend less time worrying about mundane things like memory management. Virtual machines also help a lot with debugging, because they provide a better idea of what went wrong and generally better insight into the running program.

In essence, I think that dynamic languages are the future of programming, because as CPUs get faster and faster, this shift of effort will mean that development time in scripting languages goes down more rapidly than in other languages.

The paper and the defense can both be found in my slideshare slides here, beware they are both in German: http://www.slideshare.net/derDoc.

Update: I also uploaded my Master’s (and at the same time Diploma) thesis, as well as the defenses for it in both German and English, to my slideshare account – plus a poster for NSDI, all of which were made at UTEP, Texas, USA.


Adium/Pidgin skype plugin


Some time ago I installed Nimbuzz on my iPhone and found it very cool. It allows connection to the chat systems of MySpace and Facebook and even StudiVZ – the German FB copycat that is being sued for stealing from FB – and I didn’t even know they had a chat system. Turns out they do, and lots of people are requesting some kind of plugin for it; but despite talk all over the net about them opening up their API, they have not done it yet.

So naturally, I got curious how they do it and started digging. Turns out that they have a deal with SVZ which gives them access to their API. It also turns out that Nimbuzz works by running an OpenFire gateway from Jabber/XMPP to all the other IMs on their server. OpenFire is free and comes with plugins for all major IMs except Skype and SVZ, which they must have added. This makes sense, since they only have to implement the XMPP protocol in their mobile client instead of all of them, and then use the gateway for all accounts.

So by using a Jabber client and connecting to the Nimbuzz server (at snow.nimbuzz.com) with your account, you can use their gateway. So I tried connecting with my Adium. Using the SVZ gateway works like it should; the only drawback is that Adium does not understand the message to rename the contact. The Skype gateway also works, but only gives you your buddies’ presence notifications when you send a special message that pretends to be the iPhone client. Not even the web or Windows Nimbuzz client can use the Skype gateway.
Still, that was not enough for me, so I looked again at the Pidgin Skype plugin that I had found a while ago and not found interesting, because it does not silence Skype. You have to have Skype running in the background to use its API and access its p2p net, but if you get a message from a buddy, Skype still opens a chat window and notifies you, alongside Adium. That was very annoying, so I stopped using it. The newer version, however, does succeed in silencing Skype in the background so that it won’t notify you anymore, so it is very useful. You can find it here: http://eion.robbmob.com/. It even gets status messages and profile pics from your buddies, which the Nimbuzz gateway doesn’t. Along the way I also found a very interesting presentation about dissecting Skype and its protocol here: http://tinyurl.com/zkuww.

I won’t bother with Skype anymore, but I will attempt to write an Adium plugin for the Nimbuzz SVZ gateway that sets the real nicks and also gets their pics.


MVC applied to web-apps


The MVC pattern also fits a client-server application. The only difference is that there are now two pieces that both have their own MVC parts.

Let’s start by defining the two extreme ends of the spectrum (the corner cases):

  • Everything runs on the server, and only a static rendering of the model is transmitted to the client and, ultimately, the user. This is the model of Web 1.0. Technically there is still a tiny fraction of the V running on the client (the browser) that renders the HTML, as well as of the C, which reacts to links. Note that this is commonly referred to as the fat server/thin client or ‘terminal server’ approach. Here an update of the GUI always involves a whole round trip to the server. This is depicted in figure 1.



The MVC model is a very handy and very well accepted model for traditional desktop applications that let the user manipulate some kind of data. Even web-applications (mostly) fit this model nicely.

In a desktop app this normally involves three distinct parts which behave like services: they need to work together, but (should) have a well-defined contract/interface. They are all written in the same main language (whichever the app is developed in), but the model needs to interface with some kind of permanent storage (persistence), which sometimes involves translating to another language/representation of the model – be it XML, SQL, CSV, LaTeX or whatever. The View, on the other hand, will mostly want to create some kind of visual representation of the model, which is sometimes described in some language other than the main one. That can be HTML/XUL, or XML for Qt-based GUIs – you name it. VMs like Java or .NET have their own way of describing – the more appropriate term is ‘creating’ – the interface from within the program, using the same main language.

Now think of MVC as applied to web-apps. Do we get by with 3 languages? No, we need more, and that makes it so much more complicated.

Server-side frameworks implement the MVC and can thus already comprise 3 languages (as in desktop apps). If a part of the V is to be executed on the client, we need JS in the mix. Since that JS code will mostly also need some data (part of the M), we need to translate that part into a form suitable for transfer to the client (commonly known as ‘marshalling’), which throws another language into the mix – JSON, XML or something else – for a total of 5.
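The marshalling step itself is tiny; here is a minimal sketch of what that fifth language actually does, using JSON (the model data is invented):

```php
<?php
// Minimal illustration of 'marshalling': the slice of the model (M)
// that the client-side JS needs is serialized into a transfer format.
$model = ['user' => 'alice', 'cart' => [3, 7, 11]];  // invented data

$wire = json_encode($model);        // what travels over HTTP to the JS
$back = json_decode($wire, true);   // what the receiving side recovers

assert($back === $model);           // the round trip is lossless here
echo $wire, "\n";                   // {"user":"alice","cart":[3,7,11]}
```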

What’s more is that this encompasses two distinct systems, each with their own cycles, states and debugging environments. In one word: a nightmare.



The two most common types of frameworks are server-side and client-side.

Server-side frameworks (SSFs) run, as their name suggests, on the server, and are there to help with the business logic and data of your web application. They do database abstraction – i.e. persistence – help with presentation – i.e. templating – and also with caching, authentication etc. They come in every flavor, aka dynamic language, imaginable. Some come with their own web server, like RoR, or build on existing web servers in the same language (as the Java ones do), and some just come as a CGI which runs in a “normal” web server like Apache or IIS. Speaking in terms of the MVC pattern, SSFs help with all 3 – the M, the V and the C.



It seems to me that CS people like to invent new names for the same concept, doesn’t it? Don’t they all mean the same thing anyway – Service-Oriented Architecture?

They do. The idea behind those terms is to decouple the business logic (BL) – the actual application code providing a service – from the front-end consuming that service. Basically, what web services do: they provide a service and return the data in a standard format without adding any presentation information to it (e.g. HTML). That leaves the job of interpreting and rendering that data to the front-end application logic, which can very well be a piece of JS code running inside a browser.

This is really good, because it decouples the application (BL) logic – or Controller, if you will – from the presentation logic, the View; meaning that you can develop them independently of each other – or change one without changing the other. As long as they both adhere to the contract laid out for the service, they work.

The downside of that approach is that most SOAs are inherently stateless, because they assume that front-ends come by, request some kind of simple service and then go about their business – what theory nerds call ‘loose coupling’. Yahoo’s web services are a good example – like the keyword extractor: send a long text to it, and you get extracted keywords back; it’s as simple as that. This might be good for simple things, but as soon as services get more complex, require logging in, or involve extensive BL, that becomes a drawback. Then there are only two options: either forget the statelessness of the service, or transfer that state with each request anew. The first breaks the model of SOA, which is not good, whereas the second might end up transferring lots of data and – you guessed it – creating the same problem that HTTP has all over again (grafting state onto a by-definition stateless protocol).

But remember, SOA does not demand statelessness; it merely states that it is beneficial if the services are loosely coupled. This can very well be achieved by breaking your BL into small, distinct services that might each preserve a small amount of state – an auth service, a basket service, a pref service etc. – all communicating with each other. The downside, again, is that this generates more traffic/latency between the services, thus degrading overall performance. But – oh well – there is always a trade-off, right?

So, what’s there to do? Well, not much: either accept one of the two solutions, find a middle way depending on your needs and environment, or implement only simple services using SOA principles.

Thoughts about RIAs


In response to the non-homogeneous landscape of implementations during the time of the browser wars, developers started implementing client-side virtual machines to deliver “richer” interfaces to the customer. Among these VMs, also known as RIAs, are such notables as Adobe Flex/Flash/AIR, MS Silverlight and Java applets.

I do not see the point of running another VM inside the VM that is the browser – because if you think about it, the browser is nothing else (some people lovingly call them application platforms, which is the same thing). Well, more precisely, the browser is a content renderer for static HTML content, which is then loaded as data into a VM that can manipulate it. This VM’s language happens to be JavaScript, but that does not really matter. Even scarier is that the code for manipulation and the model description itself can be intermixed in the same file.

Because of this obvious security risk, browsers employ (as already mentioned) a rigorous security model to disallow the code from doing anything besides altering the content of its current site (tab). RIAs, on the other hand, incorporate a lighter security model, which is why we’ve seen a spike in Flash-based attacks on browsers recently (keyword: drive-by Flash attack).

Another drawback is the application downloading phase (termed ‘DA’ in the paper “Life above the service tier”). In a RIA this has to occur all at once at the beginning for the entire application, and the VM has to be started as well, which takes a considerable amount of time. In the early days that was one reason no one liked Flash, and it still holds true for Java applets. In plain HTML, by contrast, you can do it incrementally by loading each file consecutively, which even lets you take advantage of caching or CDNs on the way from the server to the client – a huge advantage.

The only advantage that I can see in a RIA is that they obviously obfuscate the code, because applications for them are normally compiled – well, to a bytecode, but still – thus preventing people from stealing your work or looking into your business logic (if you absolutely need to execute some of it on the client). While obfuscation is not as simple (automatic) in HTML/JS, it is still possible.

So, again, why run another VM inside what amounts to basically two VMs already – the OS and the browser – when current JS and browser implementations are now compatible enough (maybe with some help from tools such as GWT) that you can realize anything with them? Even more so considering that the internal scripting language of Flash (ActionScript) is actually Ecma-262, aka EcmaScript, which also happens to be what JS is based on – in effect, even the languages are the same.


Useful addition to newsreaders


The other day, while sifting through thousands of new messages and getting annoyed that everybody seems to be blogging about the exact same thing, my buddy had a pretty nifty idea.

Wouldn’t it be great if your newsreader could somehow read and “understand” the news item you are currently reading and then mark all the other items related to it as already read, so you don’t have to read ’em twice? With a little bit of NLP, some Yahoo web-servicing and some more black magic, that should be possible, right?

I was thinking of doing that as a Flock add-on to their already excellent news reader (kudos, guys!).

I can think of a couple ways to match two entries:

  • Simplest way: use Google (blogs, news) with the title as search terms. Whichever other entries show up within the first [30,50,100] results get marked.
  • Harder way: use a document-similarity tool to check the current entry against every other one (might also be slower).
  • Hardest: extract keywords from the entry, or use the title, and use an online MSR like this one: http://cwl-projects.cogsci.rpi.edu/msr/
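Just to illustrate the idea behind the ‘harder way’ (sketched here in PHP with its built-in similar_text(); the actual add-on would of course be JS, and a real implementation would use a proper NLP similarity measure):

```php
<?php
// Toy illustration of pairwise document similarity: score two entry
// titles and mark the second as read if the score passes a threshold.
// similar_text() is a crude stand-in for a real similarity measure.
function similarity(string $a, string $b): float
{
    similar_text(strtolower($a), strtolower($b), $percent);
    return $percent;   // 0..100
}

$entryA = 'Google releases new browser Chrome';   // invented entries
$entryB = 'Google has released its new Chrome browser';
$score  = similarity($entryA, $entryB);
echo round($score), "\n";   // compare against a tuned threshold
```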

Anyone up for this?
