The other day I was attending a talk by UCSD Professor Charles Elkan about the Netflix competition, who was incidentally one of two external judges of it, which was really insightful and started getting me interested in Data Mining. So after thinking about storage requirements for my pet-‘Web 4.0 monolithic web-application’ I decided to dig deeper into DBMS technology vs. filesystems. I learned about OODBMS along the way, so we have to cover those briefly as well.

Let’s start with DBMS. They were created to handle large sets of data efficiently in the face of lots of concurrent reads and writes – read users – as well as allow swift querying of that data. They evolved from a mathematical sound theory – first order predicate logic – into what is known as relational algebra. To summarize, DBMS provide the following things:

  • Correctness under concurrent access – known as the ACID principles,
  • Complete indices for the data which together with
  • a querying language based on a mathematically closed model allows for efficient searching aka. a powerful query execution engine.

Read the rest of this entry »