Problems Yet to Solve
Perhaps you're wondering what stage I'm at, and why there is not much in the way of documentation, tutorials, etc. Well, truth is, there is still a lot I haven't architected yet, let alone written.
I'll summarise some of the biggest issues left to tackle here...
Node Storage and Retrieval
The framework will be built to support the EAV model. It'll need to be able to store nodes, attributes and a value for each node-attribute relationship.
The biggest problem here is that I don't want to set all attribute value columns to a text or varchar data type (as some frameworks do as a cop-out). I need to identify a schema that allows values to be stored in a column with the correct data type (integer, float, string or binary).
So, how would the framework retrieve all this information without generating multiple queries per node? There are a couple of options I'm weighing up at the moment:
- Query Solr and rely on that to do all the complicated internal stuff and return me the hydrated node. Can Solr handle this much responsibility? How good is it at storing data for potentially >500,000 nodes? Could it be relied on by a high-traffic web site?
- Make use of the MySQL COALESCE function (thanks to John Wright). The schema would be very straightforward, as would the queries. What's the performance like though? I'd still need to think about how to avoid a huge node_attribute_value table. Perhaps split it up by content type, similar to the CCK module in Drupal?
Right now I'm favouring option two.
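To make option two a little more concrete, here is a minimal sketch of the idea. It is not the framework's actual schema: the table layout, the column names (value_int, value_float, value_string, value_binary) and the helper functions build_demo_db and load_node are all assumptions made up for illustration. It also uses SQLite from Python's standard library rather than MySQL so that it can be run on its own. One value column per data type is kept, and COALESCE picks whichever one is filled in, so a node is hydrated with a single query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory EAV schema with one typed value column per data type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, content_type TEXT NOT NULL);
        CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE node_attribute_value (
            node_id      INTEGER NOT NULL REFERENCES node(id),
            attribute_id INTEGER NOT NULL REFERENCES attribute(id),
            value_int    INTEGER,
            value_float  REAL,
            value_string TEXT,
            value_binary BLOB,
            PRIMARY KEY (node_id, attribute_id)
        );
    """)
    conn.execute("INSERT INTO node (id, content_type) VALUES (1, 'article')")
    conn.executemany(
        "INSERT INTO attribute (id, name) VALUES (?, ?)",
        [(1, "title"), (2, "views"), (3, "rating")],
    )
    # Each row fills exactly one of the typed columns and leaves the rest NULL.
    conn.executemany(
        "INSERT INTO node_attribute_value "
        "(node_id, attribute_id, value_int, value_float, value_string, value_binary) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, None, None, "Hello", None),
            (1, 2, 42, None, None, None),
            (1, 3, None, 4.5, None, None),
        ],
    )
    return conn


def load_node(conn, node_id):
    """Hydrate every attribute of a node with one query."""
    rows = conn.execute(
        """
        SELECT a.name,
               COALESCE(v.value_int, v.value_float, v.value_string, v.value_binary)
        FROM node_attribute_value AS v
        JOIN attribute AS a ON a.id = v.attribute_id
        WHERE v.node_id = ?
        """,
        (node_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(load_node(build_demo_db(), 1))
```

In SQLite, COALESCE hands back the first non-NULL argument with its own type, so the integer and float come back as numbers. MySQL may coerce the result of a mixed-type COALESCE to one common type, so that would need checking before relying on it; selecting the typed columns individually and picking the non-NULL one in application code would be a fallback.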
URL Generation
The framework will have a category structure which will in turn provide a dynamic menu structure. This allows content publishers to freely create new channels, microsites, etc. and for a link to be placed automatically into the navigation of the site. Easy enough.
However, each node can be assigned to many menu placements (e.g. Featured and News > Sports). One assignment must be made the primary. Each node would have its own URL slug, and the full URL to the node would be prefixed with the category slug(s), e.g. /featured/my-node.html and /news/sports/my-node.html. This system would allow the same node to appear within several sections of the site, with different URLs. Bad for SEO? Yes. So we'd need to use canonical links to point to the primary URL. In addition, we'd need to keep track of previous node URLs to generate 301s if/when a slug changes.
So, how do we store all the URLs for a node? We could simply put all URLs into a table, each with a relation to the node it belongs to. Would those URLs contain the category slug, or just the node slug? If they contain the category slug, how would we update... I'll stop there. I've just answered my own question: yes, store the category slug as well as the node slug, insert a new row if a slug changes, and set the existing row to 301 to the new row. Sorry if that's not clear, it's there for my own benefit so I don't forget!
Looks like that question may now be answered.
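So I don't forget, here is a rough sketch of that answer in code. Everything in it is hypothetical: the node_url table, its columns (path, is_primary, redirect_to) and the functions add_url, change_path and resolve are names invented for this note, and SQLite stands in for the real database. Each row stores the full path (category slugs plus node slug). When a slug changes, a new row is inserted and the old row is pointed at it, so the old URL answers with a 301. A live, non-primary URL reports the primary path for its canonical link.

```python
import sqlite3


def open_db():
    """Create an in-memory table holding every URL a node has ever had."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE node_url (
            id          INTEGER PRIMARY KEY,
            node_id     INTEGER NOT NULL,
            path        TEXT NOT NULL UNIQUE,
            is_primary  INTEGER NOT NULL DEFAULT 0,
            redirect_to INTEGER REFERENCES node_url(id)
        )
    """)
    return conn


def add_url(conn, node_id, path, is_primary=False):
    """Register a live URL for a node and return the new row id."""
    cur = conn.execute(
        "INSERT INTO node_url (node_id, path, is_primary) VALUES (?, ?, ?)",
        (node_id, path, int(is_primary)),
    )
    return cur.lastrowid


def change_path(conn, old_path, new_path):
    """Insert a row for the new path and turn the old row into a redirect."""
    old_id, node_id, is_primary = conn.execute(
        "SELECT id, node_id, is_primary FROM node_url WHERE path = ?",
        (old_path,),
    ).fetchone()
    new_id = add_url(conn, node_id, new_path, bool(is_primary))
    # Re-point rows that already redirected to the old row too,
    # so a visitor never follows a chain of 301s.
    conn.execute(
        "UPDATE node_url SET redirect_to = ?, is_primary = 0 "
        "WHERE id = ? OR redirect_to = ?",
        (new_id, old_id, old_id),
    )
    return new_id


def resolve(conn, path):
    """Return (status, path): 301 plus target, 200 plus canonical path, or 404."""
    row = conn.execute(
        "SELECT u.node_id, t.path FROM node_url AS u "
        "LEFT JOIN node_url AS t ON t.id = u.redirect_to "
        "WHERE u.path = ?",
        (path,),
    ).fetchone()
    if row is None:
        return (404, None)
    node_id, target = row
    if target is not None:
        return (301, target)
    primary = conn.execute(
        "SELECT path FROM node_url "
        "WHERE node_id = ? AND is_primary = 1 AND redirect_to IS NULL",
        (node_id,),
    ).fetchone()
    return (200, primary[0] if primary else path)
```

A category slug change would go through the same change_path call, once for every node URL under that category. Whether that is cheap enough for a large category is something I'd still need to measure.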
Configuration
There are many types of configuration that the framework will require, from database connection settings to routing and how many nodes to show on a listing page. Only some of this would be wise to make configurable through the database. I need to do some thinking and define a policy here.
There are lots more things to think about too, including exactly how the template system would work. Although these are on my mind, and I consider them when thinking about any of the above issues, they are a relatively low priority right now. I'll keep you updated with progress.
