Tuesday, February 14, 2012

1. The story behind structr

This is the first post of a planned series about structr. Sorry that it took so long to get this out, but we were (and still are) quite busy with projects.

Most of you have probably never heard of structr, so let's start at the beginning:

History and Background

Back in 2010, I decided to leave my former company and start something new. For more than eight years, we had built closed-source ECM solutions based on the Oracle database and application server. After years of enterprise customers, enterprise problems and enterprise pricing, I felt like a weightlifter too muscle-bound to move. Something had to change, so I decided to leave, take a break, question all the tools, frameworks, languages and processes I was used to, and look around with an open mind for new things, new tools and, most importantly, new people.

In the beginning, I had no exact idea what to do next, but it seemed reasonable to do something with content management, a field in which I had gained some experience. So I started thinking: “What would a perfect CMS look like if it could be built today, from scratch?” Another thing I had in mind was to create an open-source project, because I just love the open-source idea, having been a Linux/GNU fellow since 1994.

After trying out some languages and frameworks like Ruby on Rails, Objective-C and Scala, I decided to stick with Java for the core programming. Not because Java is a particularly beautiful or easy-to-handle language, but because it felt right to have some constants to rely on. Other advantages, like the large developer base, the performance of the JVM and the great IDE support, outweighed the drawbacks.

On the other hand, one thing I was convinced I had to do was to move away from Oracle, not only in the technical sense, but also as an ex-Oracle employee and long-term Oracle software partner. There were good times and bad times as an Oracle partner. In summary, the experience was: the bigger they became, the worse it got for us as a small German software company (except for some individuals in Oracle’s partner management and development teams who did their best to support us).

Wanted: A New Database

The most common approach to a CMS database back-end was to put an ORM layer on top of a relational database and map content entities to database tables. But the more specific the requirements become, the more this static mapping gets in the way. To work around this inflexibility, some had written meta layers to decouple the data entities from the database tables. But that always felt like shifting the complexity from one layer to the next, with no real benefit.

So I wanted, and needed, a new database. There had to be a better, more natural way of mapping content structures to a database, so I started researching open-source NoSQL databases, in particular object databases. After a very short time, I found graph databases to be a natural, even trivial, fit, leaving open only the question of which one to choose. One of the players in this field was Neo4j (version 1.0 had just been released at the time), and I was attracted by the possibility of embedding the database while still getting the promise of enterprise scalability and full ACID transactions (something I considered necessary for content management).

In August 2010, I attended a NoSQL workshop in Frankfurt where I met Peter Neubauer of Neo Technology. After listening to his talk and having a chat over some beers, I knew that I didn’t have to look any further. My reasoning was: “If the other guys are even roughly as open-minded, competent and helpful as Peter, you can’t go wrong.” And it has worked out, to this very day.

Then, one day in 2010, I told my brother Christian (coincidentally a software developer and architect with a Java background :-) ) about my new project, and he was enthusiastic about the idea of creating something new. Luckily, he did not hesitate to join me, and we started brainstorming and coding together. Among many other useful things, Christian added a services infrastructure layer and redesigned large parts of the code base, which became more structured and elaborate through his constant work. Meanwhile, we work together on a regular basis and have founded a company to put the project on a professional footing.

Requirements, Ideas and Design

Our goal was to build a better CMS: Easy to use for authors, editors, site admins and design people; easy data integration; modular, extensible, fast and secure, easy to install, run and maintain, in a self-contained package with a small resource footprint.

Sketching structr ...
When we started, structr was mainly driven by the idea of getting rid of the limitations of a traditional relational database back-end by leveraging the flexibility of a schema-less NoSQL (graph) database. So we designed some basic classes for domains, sites, folders and content objects like files, pages, templates, text, scripts and HTML, mapped them to nodes and properties, and linked them together with relationships. We decided to put binary content into the file system to keep the graph database storage lean. In addition to the folder hierarchy, we added a security system with users, groups and ACLs, which allows different access levels to be defined for elements of the content graph.
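To make this mapping a bit more concrete, here is a minimal sketch in Python, assuming a toy in-memory graph instead of Neo4j. The class and method names (`ContentGraph`, `store_file`, `grant`, `allowed`) and the relationship types `CONTAINS` and `MEMBER_OF` are hypothetical and not structr's actual API. It shows content objects as typed nodes with properties, binary data kept in the file system with only a path reference on the node, and a simple ACL check that also considers group membership.

```python
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Node:
    type: str
    props: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Rel:
    start: Node
    type: str
    end: Node


class ContentGraph:
    """Toy in-memory content graph (hypothetical; not the structr or Neo4j API)."""

    def __init__(self, file_root: str):
        self.nodes: dict[str, Node] = {}
        self.rels: list[Rel] = []
        self.file_root = Path(file_root)
        # ACL: (principal id, node id) -> granted permissions
        self.acl: dict[tuple[str, str], set[str]] = {}

    def create(self, type_: str, **props) -> Node:
        node = Node(type_, dict(props))
        self.nodes[node.id] = node
        return node

    def link(self, start: Node, type_: str, end: Node) -> Rel:
        rel = Rel(start, type_, end)
        self.rels.append(rel)
        return rel

    def children(self, node: Node, type_: str = "CONTAINS") -> list[Node]:
        return [r.end for r in self.rels if r.start is node and r.type == type_]

    def store_file(self, folder: Node, name: str, data: bytes) -> Node:
        # The binary payload goes to the file system; the node only keeps a path reference.
        node = self.create("File", name=name)
        path = self.file_root / node.id
        path.write_bytes(data)
        node.props["path"] = str(path)
        self.link(folder, "CONTAINS", node)
        return node

    def grant(self, principal: Node, node: Node, *perms: str) -> None:
        self.acl.setdefault((principal.id, node.id), set()).update(perms)

    def allowed(self, user: Node, node: Node, perm: str) -> bool:
        # Check grants to the user itself and to every group the user is a MEMBER_OF.
        principals = [user] + self.children(user, "MEMBER_OF")
        return any(perm in self.acl.get((p.id, node.id), set()) for p in principals)


# Usage: a site with one folder and one file, readable and writable by a group.
with tempfile.TemporaryDirectory() as root:
    g = ContentGraph(root)
    site = g.create("Site", name="example")
    folder = g.create("Folder", name="images")
    g.link(site, "CONTAINS", folder)
    logo = g.store_file(folder, "logo.png", b"\x89PNG")
    editors = g.create("Group", name="editors")
    alice = g.create("User", name="alice")
    g.link(alice, "MEMBER_OF", editors)
    g.grant(editors, logo, "read", "write")
    print(g.allowed(alice, logo, "write"))   # True (granted via the group)
    print(g.allowed(alice, folder, "read"))  # False (no grant on the folder)
```

Since group membership is just another relationship, permissions granted to a group apply to its members without copying them onto each user.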

When we started designing and hacking the front-end, we implemented a rendering engine using a rather classic approach: we had template nodes containing HTML code, enriched with FTL (FreeMarker Template Language) code to integrate content, data and (in an experimental attempt to add more dynamic features) application nodes. With this toolset at hand, the back-end user could set up and edit a content tree containing template, content and data nodes, which was rendered top-down into the output streamed to the browser.
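The top-down rendering of such a content tree can be sketched roughly like this. It is a simplified illustration only: Python's `string.Template` stands in for FreeMarker, and the `TreeNode` class with its "template" and "content" kinds is hypothetical. Each template node injects the concatenated output of its children at a `${body}` placeholder.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class TreeNode:
    kind: str  # "template" or "content" (hypothetical node kinds)
    text: str = ""
    children: list["TreeNode"] = field(default_factory=list)


def render(node: TreeNode) -> str:
    if node.kind == "content":
        # Content nodes contribute their text verbatim (escaping omitted for brevity).
        return node.text
    # Template nodes render their children in order and inject the result at ${body}.
    body = "".join(render(child) for child in node.children)
    return Template(node.text).safe_substitute(body=body)


page = TreeNode("template", "<html><body>${body}</body></html>", [
    TreeNode("template", '<div class="news">${body}</div>', [
        TreeNode("content", "<p>Hello</p>"),
        TreeNode("content", "<p>World</p>"),
    ]),
])
print(render(page))
# <html><body><div class="news"><p>Hello</p><p>World</p></div></body></html>
```

Every node has exactly one parent and one place in the tree, which is essentially why this approach felt so similar to a file system.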

Back-End UI of First Release
The back-end UI was built with Apache Click, a well-documented, stable and straightforward Java component framework. At that time, building a perfect UI for every target group was not our primary focus. After using it ourselves for a while and absorbing some feedback from others, we knew that we could do better.

Re-Thinking and Re-Design

While we were quite happy with the database mapping and the basic infrastructure, the rendering and back-end UI parts of the first releases (0.3 to 0.4) were not satisfying. The first thing we didn’t like was the template logic: the template and content files of classic CMS logic were simply replaced by template and content nodes. That didn’t feel “graphy” enough; it was too similar to a file system. The second point was that the back-end UI was too technical for non-technical people.

So in late 2011, we decided to drop the existing rendering engine and back-end UI and create something completely new, based on REST and Websocket server components, client-side Javascript and some nice HTML5 features.

Unfortunately, building structr’s new UI engine took much more time than expected, and we could only work on it in our spare time. But it was (and still is) a great experience, because it's fun to create something with previously unseen features like the collaborative page editor, which lets users work together on CMS pages simultaneously and in real time, much like Google Docs. Every edit action results in a client-server-client round-trip, making changes visible to all connected clients in real time.

Some Details

Although this first blog post is not intended to cover technical details, a short description of the concept may be of interest to some of you.

The overall idea of the content data model is to put a complete web site into a graph and to render pages (page trees) by traversing certain relationships, starting at the resource addressed by a URL. The relationships used for the rendering traversal are marked with the ID of the resource and a position index, so that content is aggregated depending on the resource ID and assembled in the right order. That way, specific nodes, at any level and of any type, can be reused in different page trees without necessarily rendering all of their sub-nodes, too, and we can avoid the classic template-based approach.
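A minimal sketch of this traversal, assuming an in-memory stand-in for the graph: the class `PageGraph`, its methods and the relationship fields `resource_id` and `position` are hypothetical names for illustration, not structr's actual schema. Starting at the page node mapped to a URL, the renderer follows only those relationships tagged with that page's resource ID and sorts siblings by position, so a shared node (here, `sidebar`) shows different children on different pages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    start: str        # parent node id
    end: str          # child node id
    resource_id: str  # the page (resource) this relationship belongs to
    position: int     # order among siblings within that page


class PageGraph:
    """Toy page graph (hypothetical; not structr's schema or Neo4j's API)."""

    def __init__(self):
        self.nodes = {}               # node id -> ("element", tag) or ("text", text)
        self.out = defaultdict(list)  # node id -> outgoing relationships
        self.urls = {}                # URL path -> resource (page) node id

    def add_element(self, node_id, tag):
        self.nodes[node_id] = ("element", tag)

    def add_text(self, node_id, text):
        self.nodes[node_id] = ("text", text)

    def map_url(self, url, resource_id):
        self.urls[url] = resource_id

    def link(self, parent, child, resource_id, position):
        self.out[parent].append(Rel(parent, child, resource_id, position))

    def render_url(self, url):
        resource_id = self.urls[url]
        return self._render(resource_id, resource_id)

    def _render(self, node_id, resource_id):
        kind, value = self.nodes[node_id]
        if kind == "text":
            return value  # HTML escaping omitted for brevity
        # Follow only relationships tagged with this page's resource ID, in position order.
        rels = sorted((r for r in self.out[node_id] if r.resource_id == resource_id),
                      key=lambda r: r.position)
        inner = "".join(self._render(r.end, resource_id) for r in rels)
        return f"<{value}>{inner}</{value}>"


g = PageGraph()
g.add_element("home", "body")
g.add_element("about", "body")
g.add_element("sidebar", "aside")  # one node, reused by both pages
g.add_text("welcome", "Welcome!")
g.add_text("news", "Latest news")
g.add_text("contact", "Contact us")
g.map_url("/", "home")
g.map_url("/about", "about")

g.link("home", "sidebar", "home", 1)  # inserted out of order on purpose
g.link("home", "welcome", "home", 0)
g.link("sidebar", "news", "home", 0)
g.link("about", "sidebar", "about", 0)
g.link("sidebar", "contact", "about", 0)

print(g.render_url("/"))       # <body>Welcome!<aside>Latest news</aside></body>
print(g.render_url("/about"))  # <body><aside>Contact us</aside></body>
```

The `sidebar` node exists only once in the graph; which of its children appear on a page is decided by the tagged relationships, not by a template.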

Technically, structr consists of four main components:

(I) The content graph repository. It stores all the content and meta information needed to render every possible resource. It has a configurable, DSL-like schema for mapping specific use cases (data binding, lookups, constraints etc.) to a graph. The details of the data binding will be addressed in another blog post.

(II) The server-side REST/JSON and Websocket interface. It listens for client requests and processes commands, which result in changes within the graph repository. Via the Websocket interface, these changes are broadcast back to all connected clients.
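The command/broadcast cycle can be sketched as follows, with the network layer left out: each connected client is represented by a plain send function, and the command names (`CREATE`, `UPDATE`, `DELETE`) and the JSON message format are assumptions for illustration, not structr's actual protocol. The server applies a command to its toy graph and then sends the resulting change to every connected client, including the one that issued it.

```python
import json
from typing import Callable


class CommandServer:
    """Hypothetical sketch: apply a client command to a toy graph, then broadcast it."""

    def __init__(self):
        self.graph: dict[str, dict] = {}                # node id -> properties
        self.clients: list[Callable[[str], None]] = []  # one send function per connection

    def connect(self, send: Callable[[str], None]) -> None:
        self.clients.append(send)

    def disconnect(self, send: Callable[[str], None]) -> None:
        self.clients.remove(send)

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        command, node_id = msg["command"], msg["id"]
        if command == "CREATE":
            self.graph[node_id] = dict(msg.get("data", {}))
        elif command == "UPDATE":
            self.graph[node_id].update(msg["data"])
        elif command == "DELETE":
            self.graph.pop(node_id, None)
        else:
            raise ValueError(f"unknown command: {command}")
        # Broadcast the applied change to every client, including the sender,
        # so all editors see the same state (client-server-client round-trip).
        event = json.dumps({"command": command, "id": node_id,
                            "data": self.graph.get(node_id)})
        for send in list(self.clients):
            send(event)


# Usage: two "clients" that simply collect the messages they receive.
inbox_a, inbox_b = [], []
server = CommandServer()
server.connect(inbox_a.append)
server.connect(inbox_b.append)
server.on_message('{"command": "CREATE", "id": "p1", "data": {"text": "Hello"}}')
server.on_message('{"command": "UPDATE", "id": "p1", "data": {"text": "Hello, world"}}')
print(inbox_b[-1])  # {"command": "UPDATE", "id": "p1", "data": {"text": "Hello, world"}}
```

Echoing the change back to the sender as well keeps the server as the single source of truth: a client only shows what the server has actually applied.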

(III) The back-end web client. It serves as a page/content/graph editor. It uses a Javascript library that connects to the structr Websocket server, issues commands corresponding to view and edit actions, and displays the data rendered by the server.

(IV) The front-end servlet, which renders the HTML and file output resulting from a traversal.

In the next blog post, we'll cover some of the above in more detail.

Current Status of the Project

Currently, we’re working hard to get release 0.5 out, which contains a basic version of the new back-end UI. The next steps are packaging the Websocket and client-side Javascript code into libraries, documenting them and releasing them as a foundation for interactive structr web applications.

If you have questions, please use the comments.

You can follow the project on Twitter here: @structr
The structr blog is also available here: http://structr.org/blog

8 comments:

  1. Interesting story. Would you be interested in having it featured on the NoSQL Zone (part of the DZone.com network)? ping me at egenesky [at] dzone dot com

  2. great stuff! Been thinking of a graph-based CMS for a while :)
    Have you guys considered using something MVVM-like for the front-end? I'm super impressed by knockoutjs' flexibility. Combined with the flexibility of neo4j, it could create wonders, I think :)

    Great stuff nonetheless, will be keeping an eye on this project for sure.

    1. Thanks Oskar!

      Yes, we're actually thinking about switching from plain jQuery to an MVVM-like framework. But we have the requirement to update multiple DOM nodes at once. Not sure if that's possible with knockoutjs.

  3. Very interesting path, guys. Congrats. We too are thinking of a "graph-based CMS".

    I suggest you look at the "Node+Graph+Stories Architecture" developed by the Anahita guys... http://bit.ly/zwn7sK

  4. Though Neo4j takes care of storing the CMS structure, where is the actual content stored? What properties are stored in the nodes and in the relationships?

    1. Sorry for the late answer!

      We store path or URL references (UUID, partly split into a path) in Neo4j.

      The content actually lies in the file system.
