Topics & Assemblies

Your Weapons in the War of Blobs vs Chunks

“So I really believe, guys, that we are in a war of Blobs versus Chunks. We are in a war between giant, unstructured blobs of content, and clean, well-structured fields of content that have metadata attached. We are in a war of Blobs versus Chunks. You all are on Team Chunk. We cannot let the blobs win.” — Karen McGrane

All the content managers sat there in shock.

A war? In content management? Aren’t wars something that happen only between users of Unix text editors?

Why would anybody wage war in the world of CMS? After all, content managers had long been united in the belief that all CMS platforms sucked. This kept the peace, even if it was an uneasy peace.

Why was Karen McGrane calling them to war?

Traditional content management systems are descendants of "Desktop Publishing Systems," in which editors designed pages, typically for print, using "WYSIWYG" features.

For print publishing, the output format was fixed. Before digital, reusability was not a priority. Republishing the same content usually also meant redesigning it.

This is a problem for digital platforms. Content managers have always known this; they just accepted that it was their cruel fate to hand-design every single page on the internet.

However, as new devices came on the market, the number of font sets, resolutions, pixel depths, and device capabilities multiplied. It became painfully obvious that digital platforms couldn’t be treated like a trillion different PDFs.

As platforms proliferated, WYSIWYG quickly became WYSIWTF.

Karen McGrane explains:

“The era of ‘desktop publishing’ is over. Same goes for the era where we privilege the desktop web interface above all others. The tools we create to manage our content are vestiges of the desktop publishing revolution, where we tried to enable as much direct manipulation of content as possible. In a world where we have infinite possible outputs for our content, it’s time to move beyond tools that rely on visual styling to convey semantic meaning. If we want true separation of content from form, it has to start in the CMS.”

We need true separation of the content from the context in which it appears. And we need this separation in the CMS.

What, exactly, is a blob? Karen explains:

“We have CMSes that allow people to say, ‘I want it to work just like Microsoft Word.’ I want to be able to embed tables and fonts and images, and I want to be able to style and design my content so it’s perfect for the way that I’m imagining it’s going to look and work. I want to have a tool that works just like Microsoft Word so that I can imagine how my content is going to look and work in the one and only one context that I’m imagining it, and that is the desktop website. And so you have people who have the ability to create these giant unstructured blobs of content with formatting and images and whatever else embedded.”

A blob is what WYSIWYG-type editors produce when they let editors embed fonts, tables, and images within the content.

The problem is that virtually all existing CMS platforms work that way.

StackOverflow user Andrew, asking which CMS supports Adaptive Content, describes a Blob-based CMS:

“With a traditional web publishing tool, I would probably have had to create a new page under News, and then type in and format the news article in a blank WYSIWYG text editor. — i.e. pages of content”

Whereas the CMS he is looking for would work differently:

“I would tell the CMS the type of content I'm creating, and be asked to fill in a form with individual fields tailored to news articles (e.g. headline, subtitle, full text, short snippet, and images). — i.e. pieces of content”

The war of chunks vs. blobs was immediately put to a stop on StackOverflow, where the question was closed as “not constructive.”

Contentful is such a CMS. It allows you to define your content as pieces of content, using content types made up of fields and references, and then deliver this content over an API, so that it can be presented in any context, on any platform, using any technology stack.

It’s important to note that the API is a necessary ingredient here, but not sufficient on its own. Many desktop-publishing-style CMS platforms also have an API. However, as these are blob-based, what they deliver over the API is not content; it amounts to “Blobs as a Service.”

We don’t need APIs that deliver blobs. We need APIs that deliver chunks.
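To make the difference concrete, here is a minimal sketch in Python. The payload shapes and field names are hypothetical, chosen only for illustration; they are not the response format of Contentful or of any other CMS.

```python
# A "blob": one field of pre-formatted markup, styled for a single context.
blob_payload = {
    "body": "<h1 style='font-size:32px'>Launch day</h1><p>Doors open at <b>9am</b>.</p>",
}

# A "chunk": the same news item as separate, named fields.
chunk_payload = {
    "headline": "Launch day",
    "snippet": "Doors open at 9am.",
    "images": [],
}


def watch_notification(item: dict) -> str:
    """Build a short notification straight from structured fields."""
    return f"{item['headline']}: {item['snippet']}"


print(watch_notification(chunk_payload))
```

With the chunk, a watch notification is one line of code. With the blob, the same job would start with parsing HTML and guessing which tag was meant to be the headline.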

Karen McGrane is our sensei. She shows us why traditional CMSs, descended from desktop publishing and organized around pages, are the wrong path.

A page is not content, young grasshopper; a page is a context in which content appears.

Bruce Lee helps us understand this on a deeper level.

“Empty your mind, be formless, shapeless - like water. Now you put water into a cup, it becomes the cup, you put water into a bottle, it becomes the bottle, you put it in a teapot, it becomes the teapot. Now water can flow or it can crash. Be water, my friend.”

Content is like water.

The wide variety of digital platforms requires that the substantive content be defined with validation and reusability in mind, and be kept separate from the context and structure.

This requires a topic-centric approach, rather than a page-centric approach.

Ann Rockley, a foremost expert in organizing and presenting information online, and a veteran of the war against blobs, has been developing her concept of “Intelligent Content” for a long time. She explains:

“Intelligent content is content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable, and adaptable.”

Ann Rockley has been a major influence on the Darwin Information Typing Architecture (DITA), an XML data model for authoring and publishing.

XML is a data format that is designed to be both human and machine-readable. It manages to accomplish neither well, and like mosquito bites, it tends to cause an allergic reaction in developers.

There is nothing inherently itchy about mosquito bites. An allergic reaction causes the itching; it just happens that almost everybody has this allergy.

One theory is that, because mosquitoes carry many forms of disease, people who were allergic to their bites tended to leave areas with many mosquitoes and take measures to avoid bites, thus also avoiding the diseases.

The reason that almost everybody alive has this allergy is that our ancestors did. Those that were not allergic were not so much bothered by the bites, and did not leave or take these measures, and did not survive to become ancestors.

There must have, at one time, been developers who were not allergic to XML. We do not know what became of them.

However, DITA gives us a key weapon in our war against blobs, the “topic.”

Your content model should include well-defined topics. Topics are what your app or site is about. Maybe you have a watch screen listing upcoming events; the events would be the topics. Maybe your app has a carousel component with related products; the products would be the topics. Maybe you have a web page presenting articles; the articles would be the topics.

Your content model should include content types that model your topics. Events, articles, products, and whatever other topics you have should be defined as content types, broken down into the fields needed, with validations to ensure they are correctly entered.
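As a rough illustration, a topic modeled as a content type might look like the Python sketch below. The `Event` fields and both validation rules (a required title, a 140-character summary limit) are assumptions made up for this example, not rules from any particular CMS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """A topic: one event, described only by its own fields."""
    title: str
    start: date
    venue: str
    summary: str

    def __post_init__(self) -> None:
        # Validations keep incomplete or malformed entries out of the system.
        if not self.title.strip():
            raise ValueError("title is required")
        if len(self.summary) > 140:
            raise ValueError("summary must be 140 characters or fewer")


launch = Event(
    title="Launch party",
    start=date(2030, 1, 15),  # arbitrary example date
    venue="Main hall",
    summary="Doors open at nine.",
)
```

Note what is missing: no fonts, no layout, no page. The entry says what the event is and nothing about where it will appear.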

Topics should be Individually authorable. Each topic can be created on its own, and each topic is about just one thing.

Topics should be Always reusable. Each topic may appear in multiple places and across different channels, so it should be free of context and layout, and be usable anywhere.

Topics should be Independently understandable. Each topic should be sufficiently complete, so that it can be independently presented.

An event, for example, is the kind of thing you would want to show on a web page, on a watch screen, and in any number of other contexts.

This is the idea: editors should not have to duplicate their efforts by duplicating topics. The term “Create Once, Publish Everywhere,” made famous by NPR, describes this idea.

Instead of having one version of the event, for instance, for web and a second copy-and-pasted version for a watch screen, editors should be able to reuse the same event everywhere.
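A small Python sketch of that reuse follows. The `Event` type and the two render functions are hypothetical stand-ins for whatever your front ends actually do, and HTML escaping is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    title: str
    venue: str
    summary: str


def render_web(event: Event) -> str:
    """The web page has room for everything (escaping omitted for brevity)."""
    return (
        f"<article><h2>{event.title}</h2>"
        f"<p>{event.summary}</p><p>{event.venue}</p></article>"
    )


def render_watch(event: Event) -> str:
    """The watch screen shows only the essentials."""
    return f"{event.title} @ {event.venue}"


# One entry, created once, published to two channels.
event = Event(title="Launch party", venue="Main hall", summary="Doors open at nine.")
print(render_web(event))
print(render_watch(event))
```

The editor touches the event once. Each channel decides for itself which fields to use and how to present them.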

When creating your topics, it’s important to give each topic its own distinct content type. Even when topics share the same fields as other content types, they often have different governance models, thus requiring different permissions, different internationalization plans, different functional requirements on the front end, and different organization requirements. Rachel Lovinger gives the following example:

“A Press Release may be very similar to a general Content Page, but only the Press Release is going to appear in an automatically aggregated Newsroom. It’s easier for these to be filtered out if they’re a unique type of content.”

To deliver topics across multiple channels, they should be free of layout and context, and composed of pieces of content, in the form of fields and references.

You can think of your collection of topics as the domain model for your content.

Once you have your topics, your authors can create content.

In most cases, editors will still require a way to manage the contexts in which these topics appear. Rachel Lovinger calls this the Assembly model, which she describes as “The way content creators will put individual content items together to make web pages, campaigns, documents, or other content products.”

To build your assembly model, you will create assemblies in addition to your topics. Assemblies are also content types, but generally with less diversity of fields, as they don’t describe what your site is about, but rather how it is put together.

Assemblies have fields for layout and site navigation data. They sometimes also have additional metadata for SEO or analytics, but they don’t have content in them. Assemblies use reference fields to refer to topics.

Assemblies can be nested inside other assemblies, so they can also have reference fields that refer to other assemblies.

Assemblies structure your content for presentation to the end user.

Examples of assemblies are a carousel, a set of featured events, a modular page layout, a basic page, etc.

You can start to plan your assembly model by breaking down the design or wireframe for your app or site.

Each area of the page, and the page itself, is an entry of an assembly content type.

For example, a page assembly might have a URL slug, a Google Analytics code, and a set of references allowing an editor to select page modules, including the components in the wireframe, e.g., a banner, a carousel of products, a featured video, and a featured article.
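Sketched in Python, such a page assembly could look something like this. All the type names here (`Product`, `Banner`, `Carousel`, `Page`) and the placeholder analytics code are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Product:
    """A topic."""
    name: str


@dataclass
class Carousel:
    """An assembly: no content of its own, only references to topics."""
    items: List[Product] = field(default_factory=list)


@dataclass
class Banner:
    """An assembly that points at a single featured product."""
    featured: Product


@dataclass
class Page:
    """An assembly of assemblies, plus navigation and analytics data."""
    slug: str
    analytics_code: str
    modules: List[Union[Banner, Carousel]] = field(default_factory=list)


watch = Product(name="Field watch")
page = Page(
    slug="/spring-sale",
    analytics_code="GA-PLACEHOLDER",  # placeholder, not a real tracking ID
    modules=[Banner(featured=watch), Carousel(items=[watch])],
)
```

The same `Product` entry is referenced from both the banner and the carousel. Nothing was copied, and the page itself holds no content, only a slug, a tracking code, and references.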

Assemblies are not Individually authorable; they are Individually maintainable. Unlike topics, assemblies cannot be authored individually, as they refer to topics and other assemblies. To author a topic, one simply has to complete some fields, but completing an assembly requires creating all its constituent topics and collecting them into the assembly. New topics can be added at the same time as the assembly; when reusing topics, the assembly simply references existing topic entries.

Whether or not a single content creator authors all the topics that constitute a complete assembly, it should be feasible for a single content creator to maintain all the parts of an assembly. Individually maintainable means that a single individual is capable of wrangling all an assembly's components.

Assemblies are Sometimes reusable. Topics are always reusable (one event can appear in many different contexts), but assemblies are not. As assemblies structure how topics are presented, they connect topics to the contexts in which those topics appear. Some assemblies are tied to a context and, accordingly, are not reusable. Other assemblies are not tied to a context and can be reused.

For example, a carousel of events is indeed reusable: perhaps it is a good idea for the same carousel to appear both on an app and a web page.

On the other hand, a modular page layout may be useful for web pages but not for a watch screen. Perhaps the same topics will be presented differently when separately shown on a watch screen compared with on a web page. A modular page layout would then work great for the latter but not for the former.

While topics are independently understandable (looking at a topic on its own is enough to find out what it is), assemblies are independently incomprehensible. The assembly is meaningless without the topics it refers to. Assemblies are a means of composing topics for display.

Assemblies allow you to create validations ensuring that valid and complete content is composed and presented to the user.

For example, in a strict, or fixed, assembly, every field allows only a single value or reference. Say you had a three-block page assembly: the assembly content type could have three fields, “hero banner,” “product carousel,” and “special offers.” Each of these fields would be validated to only allow the appropriate content type. This would allow editors to select the page modules to be included in the page, but would not allow them to position them.

Changing the position of these modules, for instance moving the product carousel to the bottom and the special offers to the middle, would require a code change and a redeploy of the software.
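Here is one way a strict assembly could be sketched in Python. The class names mirror the three fields above but are otherwise hypothetical, and the type check stands in for whatever reference validation your CMS provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeroBanner:
    headline: str


@dataclass
class ProductCarousel:
    product_names: List[str]


@dataclass
class SpecialOffers:
    offer_codes: List[str]


@dataclass
class StrictPage:
    """Three single-reference fields, each locked to one content type."""
    hero_banner: HeroBanner
    product_carousel: ProductCarousel
    special_offers: SpecialOffers

    def __post_init__(self) -> None:
        # Dataclass annotations are not enforced at runtime, so check explicitly.
        expected = {
            "hero_banner": HeroBanner,
            "product_carousel": ProductCarousel,
            "special_offers": SpecialOffers,
        }
        for name, content_type in expected.items():
            if not isinstance(getattr(self, name), content_type):
                raise TypeError(f"{name} must reference a {content_type.__name__}")


def render_order(page: StrictPage) -> list:
    """Position lives in code: changing this order means a redeploy."""
    return [page.hero_banner, page.product_carousel, page.special_offers]


page = StrictPage(
    hero_banner=HeroBanner("Spring sale"),
    product_carousel=ProductCarousel(["Field watch"]),
    special_offers=SpecialOffers(["SPRING10"]),
)
```

Editors pick which banner, carousel, and offers appear. The order they appear in is decided by `render_order`, which only a developer can change.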

An alternative is to use a flexible assembly, where instead of using single reference fields validated to only one specific content type, a multiple reference field is used that is validated to all three of the valid content types.

Using flexible assemblies allows editors to select any valid content types and sort them, to organize and merchandise content on their own without developer intervention.
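The flexible variant, again as a hedged Python sketch with invented names: one list field replaces the three single-reference fields, and the validation only checks that each entry is one of the allowed content types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HeroBanner:
    headline: str


@dataclass
class ProductCarousel:
    product_names: List[str]


@dataclass
class SpecialOffers:
    offer_codes: List[str]


# The content types the multiple reference field accepts.
ALLOWED_MODULES = (HeroBanner, ProductCarousel, SpecialOffers)


@dataclass
class FlexiblePage:
    """One multi-reference field; editors choose the modules and their order."""
    modules: List[object] = field(default_factory=list)

    def __post_init__(self) -> None:
        for module in self.modules:
            if not isinstance(module, ALLOWED_MODULES):
                raise TypeError(f"{type(module).__name__} is not an allowed module")


# The editor has moved the special offers above the carousel; no code changed.
page = FlexiblePage(modules=[
    HeroBanner(headline="Spring sale"),
    SpecialOffers(offer_codes=["SPRING10"]),
    ProductCarousel(product_names=["Field watch"]),
])
print([type(m).__name__ for m in page.modules])
```

The front end simply renders `page.modules` in list order, so reordering is an editorial action rather than a deployment.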

Fixed assemblies are great for developer-oriented teams that wish to control layout exclusively in code; flexible assemblies allow more editorially focused teams to give flexibility to their editors.

You can think of your collection of assemblies as the bridge between your domain model, as expressed in your topics, and your view model, which is ultimately rendered by your front-end templates and components.

The properties of your components and the placeholders in your templates will match the fields of your content types.
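A tiny Python sketch of that correspondence, using the standard library's `string.Template` as a stand-in for a real templating system. The `Article` type and the card markup are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from string import Template


@dataclass
class Article:
    """A topic from the domain model."""
    headline: str
    snippet: str


# A front-end template whose placeholders match the content type's fields.
CARD_TEMPLATE = Template("<section><h2>$headline</h2><p>$snippet</p></section>")


def render_card(article: Article) -> str:
    """Turn the entry into a view model (a plain dict) and fill the template."""
    view_model = asdict(article)
    return CARD_TEMPLATE.substitute(view_model)


print(render_card(Article(headline="Team Chunk wins", snippet="Blobs retreat.")))
```

Because the placeholder names and the field names line up, adding a field to the content type and a placeholder to the template is all it takes to surface new content.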

Topics and assemblies are your weapons in the war of chunks vs. blobs. By breaking down the things your site or app is about into clear, reusable, validatable topics and using nested assemblies to compose topics for display across different channels, we can win this war for Team Chunk. Contentful is on your side, and we’re here to help.

By Dmytri Kleiner and Stephen Pastan
