“So I really believe, guys, that we are in a war of Blobs versus Chunks. We are in a war between giant, unstructured blobs of content, and clean, well-structured fields of content that have metadata attached. We are in a war of Blobs versus Chunks. You all are on Team Chunk. We cannot let the blobs win.” — Karen McGrane
All the content managers sat there in shock. A war? In content management? Weren’t wars something that happened only between users of Unix text editors? Why would anybody wage war in the world of Content Management Systems (CMS)? After all, content managers were united by the fact that all CMS platforms sucked. This had kept the peace, even if it was an uneasy one.
So why was Karen McGrane calling them to war? Traditional CMSs are descendants of desktop publishing systems, in which editors designed pages, typically for print, using WYSIWYG (What You See Is What You Get) features.
For print publishing, the output format was fixed. Before digital, reusability was not a priority: republishing the same content usually meant redesigning it anyway.
This page-by-page approach is a problem for digital platforms. Content managers have always known this; they just accepted that it was their cruel fate to hand-design every single page on the Internet.
However, as devices of many shapes, sizes, and form factors came onto the market, the number of font sets, resolutions, pixel depths, and device capabilities multiplied. It became painfully obvious that digital platforms couldn’t serve content using the same methods and workflows created for print publishing.
As platforms proliferated, WYSIWYG quickly became WYSIWTF.
Karen McGrane explains:
“The era of ‘desktop publishing’ is over. Same goes for the era where we privilege the desktop web interface above all others. The tools we create to manage our content are vestiges of the desktop publishing revolution, where we tried to enable as much direct manipulation of content as possible. In a world where we have infinite possible outputs for our content, it’s time to move beyond tools that rely on visual styling to convey semantic meaning. If we want true separation of content from form, it has to start in the CMS.”
We needed true separation of content from the contexts in which it appears. And we needed this separation in the CMS.
What, exactly, is a blob? Karen explains:
“We have CMSes that allow people to say, ‘I want it to work just like Microsoft Word.’ I want to be able to embed tables and fonts and images, and I want to be able to style and design my content so it’s perfect for the way that I’m imagining it’s going to look and work. I want to have a tool that works just like Microsoft Word so that I can imagine how my content is going to look and work in the one and only one context that I’m imagining it, and that is the desktop website. And so you have people who have the ability to create these giant unstructured blobs of content with formatting and images and whatever else embedded.”
A blob is the output of WYSIWYG-style editors that let editors embed fonts, tables, images, and anything else within their content, all mixed together in one field.
The problem is that this is far from an ideal way of working, yet virtually all existing CMS platforms work this way.
StackOverflow user Andrew, while trying to find a CMS that supported Adaptive Content, described the content workflow when using a Blob-based CMS:
“With a traditional web publishing tool, I would probably have had to create a new page under News, and then type in and format the news article in a blank WYSIWYG text editor. — i.e. pages of content”
On the other hand, the content platform he was looking for would work differently:
“I would tell the CMS the type of content I'm creating, and be asked to fill in a form with individual fields tailored to news articles (e.g. headline, subtitle, full text, short snippet, and images). — i.e. pieces of content”
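To make Andrew's contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `NewsArticle` class simply mirrors the fields Andrew lists, and the two render functions stand in for a web template and a watch app. The blob keeps its meaning locked inside markup, while the chunked article lets each channel pick only the fields it needs.

```python
from dataclasses import dataclass, field

# A blob: one opaque string of formatted markup. A consumer that wants only
# the headline has to parse HTML and guess which tag holds it.
news_blob = (
    "<h1 style='font-family:Georgia'>City Opens New Library</h1>"
    "<p><i>Doors open Saturday</i></p>"
    "<p>The new central library opens this weekend...</p>"
    "<img src='library.jpg' width='600'>"
)

@dataclass
class NewsArticle:
    """A chunked news article; field names are hypothetical."""
    headline: str
    subtitle: str
    full_text: str
    short_snippet: str
    images: list[str] = field(default_factory=list)

article = NewsArticle(
    headline="City Opens New Library",
    subtitle="Doors open Saturday",
    full_text="The new central library opens this weekend...",
    short_snippet="New library opens Saturday.",
    images=["library.jpg"],
)

def render_web(a: NewsArticle) -> str:
    # The web template decides the markup; the content carries none.
    imgs = "".join(f"<img src='{src}'>" for src in a.images)
    return f"<h1>{a.headline}</h1><h2>{a.subtitle}</h2><p>{a.full_text}</p>{imgs}"

def render_watch(a: NewsArticle) -> str:
    # A small screen only needs the headline and the snippet.
    return f"{a.headline}\n{a.short_snippet}"

print(render_watch(article))
```

The watch renderer never has to strip fonts or images out of anything; it simply ignores the fields it has no room for.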
The war of chunks vs. blobs was promptly halted on that StackOverflow thread, which was closed on the pretext that the question was “not constructive.”
Contentful is a content infrastructure that satisfies needs like Andrew's for a content platform, and more. It allows you to define your content as individual pieces, using content types made up of fields and references, and then deliver that content via an API, allowing it to be presented in any context, on any platform, using any technology stack.
The API is an essential ingredient here, especially because of the way it works. Many desktop-publishing-style CMS platforms also have an API; however, these APIs are blob-based, and they amount to delivering content as “Blobs as a Service” rather than delivering structured content directly over the API.
We don’t need APIs that deliver blobs. We need APIs that deliver chunks.
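The difference shows up as soon as a client consumes the API. The two JSON payloads below are invented for illustration and do not reproduce the response format of Contentful or any other product; they only contrast a structured response that still wraps a blob with one that delivers chunks.

```python
import json

# Hypothetical "Blobs as a Service" response: the envelope is structured,
# but the content inside it is still one formatted string.
blob_response = json.dumps({
    "id": "page-42",
    "body": "<h2>Jazz Night</h2><p><b>Fri 20:00</b> at the Riverside Hall</p>",
})

# Hypothetical chunk response: each piece of content is its own field.
chunk_response = json.dumps({
    "id": "event-7",
    "contentType": "event",
    "fields": {
        "title": "Jazz Night",
        "start": "2024-05-17T20:00:00",
        "venue": "Riverside Hall",
    },
})

def event_title_from_chunk(payload: str) -> str:
    # Reading a field is a plain dictionary lookup.
    return json.loads(payload)["fields"]["title"]

def event_title_from_blob(payload: str) -> str:
    # Reading the same fact from a blob means scraping markup and hoping the
    # editor always used an <h2> for the title (str.index raises otherwise).
    body = json.loads(payload)["body"]
    start = body.index("<h2>") + len("<h2>")
    return body[start:body.index("</h2>")]

print(event_title_from_chunk(chunk_response))
print(event_title_from_blob(blob_response))
```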
Karen McGrane is our sensei. She shows us why traditional CMSs, which descend from desktop publishing and work in terms of pages, are the wrong solution for delivering content digitally.
A page is not content, young grasshopper; a page is a context in which content appears.
Bruce Lee helps us understand this on a deeper level:
“Empty your mind, be formless, shapeless - like water. Now you put water into a cup, it becomes the cup, you put water into a bottle, it becomes the bottle, you put it in a teapot, it becomes the teapot. Now water can flow or it can crash. Be water, my friend.”
Content is like water. The wide variety of digital platforms requires that substantive content be defined with validation and reusability in mind, and be kept separate from the context and page structure in which it appears.
This requires a topic-centric approach, rather than a page-centric approach.
Ann Rockley, a leading expert in organizing and presenting information online and a veteran of the war against blobs, has been developing her concept of “Intelligent Content” for a long time. She explains:
“Intelligent content is content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable, and adaptable.”
Ann Rockley has been a major influence on the Darwin Information Typing Architecture (DITA), an XML data model for authoring and publishing. XML is a data format designed to be both human- and machine-readable. It manages to be neither particularly well and, like mosquito bites, tends to cause an allergic reaction in developers.
There is nothing about mosquito bites that makes them inherently itchy; an allergic reaction causes the itch, and it just happens that almost everybody has this allergy. One theory is that because mosquitoes carry many forms of disease, people who were allergic to their bites tended to leave areas with many mosquitoes and take measures to avoid being bitten, thus also avoiding the associated diseases.
The reason that almost everybody alive today has this allergy is that our ancestors did just that. Those who were not allergic were not much bothered by the bites, did not leave or take these measures, and so did not survive to become ancestors. There must have been, at one time, developers who were not allergic to XML. We do not know what became of them.
However, DITA gives us a key weapon in our war against blobs: the “topic.”
Your content model should include well-defined topics—topics are what your app or site is about. Maybe you have a watch screen listing upcoming events; the events would be the topics. Maybe your app has a carousel component with related products; the products would be the topics. Maybe you have a web page presenting articles; the articles would be the topics.
Your content model should include content types that form your topics. Events, articles, products, and anything else you have should be defined as content types, broken down into the fields they need, with validations to ensure the fields are entered correctly.
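As a rough illustration of a content type with validations, the sketch below models a hypothetical `event` type as a list of field specifications and checks an entry against it. The field names, length limits, and the `validate` helper are assumptions for the example, not any CMS's actual schema format.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: type                        # expected Python type of the value
    required: bool = True
    max_length: Optional[int] = None

# A hypothetical "event" content type: the topic broken into fields.
EVENT_TYPE = [
    FieldSpec("title", str, max_length=80),
    FieldSpec("start", str),          # ISO 8601 date-time, kept as text here
    FieldSpec("venue", str),
    FieldSpec("summary", str, required=False, max_length=200),
]

def validate(entry: dict[str, Any], spec: list[FieldSpec]) -> list[str]:
    """Return a list of problems with the entry; an empty list means valid."""
    errors = []
    for fs in spec:
        value = entry.get(fs.name)
        if value is None:
            if fs.required:
                errors.append(f"{fs.name}: required")
            continue
        if not isinstance(value, fs.kind):
            errors.append(f"{fs.name}: expected {fs.kind.__name__}")
            continue
        if fs.max_length is not None and len(value) > fs.max_length:
            errors.append(f"{fs.name}: longer than {fs.max_length} characters")
    # Anything outside the content type (stray styling, say) is rejected.
    known = {fs.name for fs in spec}
    errors.extend(f"{name}: not a field of this type" for name in sorted(set(entry) - known))
    return errors

print(validate({"title": "Jazz Night", "start": "2024-05-17T20:00"}, EVENT_TYPE))
```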
Topics should be:
Individually authorable. Each topic is about one area and can be created on its own.
Always reusable. A single topic can appear in multiple places and across different channels, so it should not be locked to a specific context or layout, and should be usable anywhere.
Independently understandable. Each topic should be complete enough to be presented on its own.
An event, for example, is the kind of thing you would want shown on a web page, on a watch screen, and in a number of other contexts.
That's the concept: editors should not have to multiply their efforts by duplicating topics. The saying “Create Once, Publish Everywhere,” made famous by NPR, describes this idea.
For instance, instead of having an initial version of the event for the web and a copy-and-pasted version for a watch screen, editors should be able to reuse the same event everywhere.
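Here is a minimal sketch of Create Once, Publish Everywhere, assuming a simple in-memory store of entries keyed by hypothetical IDs. Both channels read the same event entry, so a single edit reaches both, and there is no pasted copy to drift out of date.

```python
# Hypothetical store of topic entries, keyed by entry ID.
entries: dict[str, dict[str, str]] = {
    "event-7": {"title": "Jazz Night", "start": "20:00", "venue": "Riverside Hall"},
}

def render_web(event_id: str) -> str:
    e = entries[event_id]
    return f"<article><h2>{e['title']}</h2><p>{e['start']} at {e['venue']}</p></article>"

def render_watch(event_id: str) -> str:
    # A watch face has room for little more than a time and a title.
    e = entries[event_id]
    return f"{e['start']} {e['title']}"

# The editor changes the venue once...
entries["event-7"]["venue"] = "Harbour Stage"
# ...and every channel that references the entry picks it up.
print(render_web("event-7"))
print(render_watch("event-7"))
```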
When creating your topics, it’s important to give each kind of topic its own distinct content type. Even when two content types share the same fields, they often have different governance models, and so require different permissions, internationalization plans, front-end functional requirements, and organizational requirements. Rachel Lovinger gives the following example:
“A Press Release may be very similar to a general Content Page, but only the Press Release is going to appear in an automatically aggregated Newsroom. It’s easier for these to be filtered out if they’re a unique type of content.”
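Lovinger's point can be sketched in a few lines of Python, assuming hypothetical `pressRelease` and `contentPage` content types that share identical fields. Because the type is recorded on each entry, building the Newsroom is a simple filter rather than a guess based on titles or folders.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    content_type: str   # hypothetical type IDs: "pressRelease", "contentPage"
    title: str
    body: str
    published: str     # ISO 8601 date, so string order is chronological

entries = [
    Entry("pressRelease", "Q2 results", "...", "2024-07-10"),
    Entry("contentPage", "About us", "...", "2023-01-05"),
    Entry("pressRelease", "New CEO appointed", "...", "2024-03-02"),
]

def newsroom(items: list[Entry]) -> list[Entry]:
    """Automatically aggregate press releases, newest first."""
    releases = [e for e in items if e.content_type == "pressRelease"]
    return sorted(releases, key=lambda e: e.published, reverse=True)

print([e.title for e in newsroom(entries)])
```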
To deliver topics across multiple channels, they should be free of any predefined layout and context, and composed of pieces of content in the form of fields and references.
You can think of your collection of topics as the domain model for your content. Once you have your topics, your authors can create content.
In most cases, editors will still need a way to manage the contexts in which these topics appear. Rachel Lovinger calls this the assembly model, which she describes as “The way content creators will put individual content items together to make web pages, campaigns, documents, or other content products.”
To build your assembly model, you will create assemblies in addition to your topics. Assemblies are also content types, but generally with less diverse fields, as they don’t describe what your site is about, but rather how it is put together.
Assemblies have fields for layout and site navigation data, and sometimes additional metadata for SEO or analytics, but they don’t contain content themselves. Assemblies use reference fields to refer to topics.
Assemblies can be nested inside other assemblies, so they can also have reference fields that refer to other assemblies.
Assemblies structure your content for presentation to the end user.
Examples of assemblies are a carousel, a set of featured events, a modular page layout, a basic page, etc.
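One way to picture topics, assemblies, and nesting is as a small tree. In this sketch (class and field names are hypothetical), an assembly holds layout data plus references, and a reference can point either to a topic or to another assembly; walking the tree recovers every topic the page ultimately displays.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Topic:
    """What the site is about, e.g. an event. Holds the actual content."""
    id: str
    content_type: str
    fields: dict[str, str]

@dataclass
class Assembly:
    """How the site is put together: layout data plus references, no content."""
    id: str
    content_type: str
    layout: str = "default"
    refs: list[Topic | Assembly] = field(default_factory=list)

def topics_in(node: Topic | Assembly) -> list[Topic]:
    """Depth-first walk collecting every topic an assembly ultimately refers to."""
    if isinstance(node, Topic):
        return [node]
    found: list[Topic] = []
    for child in node.refs:
        found.extend(topics_in(child))
    return found

jazz = Topic("event-7", "event", {"title": "Jazz Night"})
poetry = Topic("event-8", "event", {"title": "Poetry Slam"})

# A carousel assembly nested inside a page assembly, plus a direct reference.
carousel = Assembly("carousel-1", "eventCarousel", refs=[jazz, poetry])
home = Assembly("page-home", "page", layout="two-column", refs=[carousel, jazz])

print([t.fields["title"] for t in topics_in(home)])
```

The Jazz Night topic is reached twice, once through the carousel and once directly from the page, but it is the same entry: referenced, not copied.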
You can start to plan your assembly model by breaking down the design or wireframe for your app or site. Each area of the page and the page itself is an entry of an assembly content type.
For example, a page assembly might have a URL slug, a Google Analytics code, and a set of references that let an editor select page modules, such as the components in the wireframe (e.g., a banner, a carousel of products, a featured video, and a featured article).
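The page assembly just described might look something like this. The `PageAssembly` class, its `analytics_code` field, and the module type names are all hypothetical, and the modules are inlined as plain dicts for brevity where a real content model would reference separate entries. The page holds only metadata and an ordered list of modules, and each module's content type selects the front-end component that renders it.

```python
from dataclasses import dataclass, field

@dataclass
class PageAssembly:
    slug: str
    analytics_code: str                                  # hypothetical tracking-code field
    modules: list[dict] = field(default_factory=list)   # ordered page modules

# Hypothetical mapping from module content type to front-end component.
RENDERERS = {
    "banner": lambda m: f"<section class='banner'>{m['headline']}</section>",
    "productCarousel": lambda m: "<ul>" + "".join(f"<li>{p}</li>" for p in m["products"]) + "</ul>",
    "featuredVideo": lambda m: f"<video src='{m['url']}'></video>",
    "featuredArticle": lambda m: f"<article>{m['title']}</article>",
}

def render_page(page: PageAssembly) -> str:
    # The page contributes metadata and order; the modules supply the content.
    body = "".join(RENDERERS[m["type"]](m) for m in page.modules)
    return f"<main data-slug='{page.slug}' data-analytics='{page.analytics_code}'>{body}</main>"

home = PageAssembly(
    slug="summer",
    analytics_code="G-EXAMPLE",
    modules=[
        {"type": "banner", "headline": "Summer sale"},
        {"type": "productCarousel", "products": ["Hat", "Towel"]},
    ],
)
print(render_page(home))
```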
Assemblies are not individually authorable, but they are individually maintainable. Unlike topics, assemblies cannot be authored on their own, because they refer to topics and other assemblies. To author a topic, one simply fills in its fields; completing an assembly, however, requires creating all its constituent topics and collecting them into it. New topics can be created at the same time as the assembly; when topics are reused, the assembly simply references the existing topic entries.
Whether or not a single content creator authors all the topics that constitute a complete assembly, it should be feasible for a single content creator to maintain all the parts of an assembly. Individually maintainable means that a single individual is capable of wrangling all an assembly's components.
Assemblies are sometimes reusable. While topics are always reusable (for example, one event can appear in many different contexts), that isn't always the case with assemblies. Because assemblies structure how topics are presented and connect topics to the contexts in which they appear, some assemblies are tied to a particular context and, accordingly, are not reusable.
For example, a carousel of events is indeed reusable: it may well make sense for the same carousel to appear both in an app and on a web page. A modular page layout, on the other hand, may be useful for web pages but not for a watch screen. If the same topics are presented differently on a watch screen than on a web page, a modular page layout would work well for the latter but not for the former.
While topics are independently understandable (looking at a topic alone is enough to know what it is), assemblies are independently incomprehensible: an assembly is meaningless without the topics it refers to, because it is only a means of composing those topics for display.
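In practice, a stored assembly usually holds references (such as entry IDs) rather than the topics themselves. The sketch below, with hypothetical IDs and field names, shows that without the topic store the assembly resolves to nothing at all.

```python
# Hypothetical topic store, keyed by entry ID.
topics: dict[str, dict[str, str]] = {
    "event-7": {"title": "Jazz Night"},
    "event-8": {"title": "Poetry Slam"},
}

# A hypothetical "featured events" assembly: layout data plus links, no content.
featured_events = {
    "contentType": "featuredEvents",
    "layout": "grid",
    "items": ["event-7", "event-8"],
}

def resolve(assembly: dict, store: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Swap each linked ID for the topic it points to; unknown IDs are skipped."""
    return [store[ref] for ref in assembly["items"] if ref in store]

print(resolve(featured_events, {}))       # the assembly alone yields nothing
print(resolve(featured_events, topics))
```

Silently skipping unknown IDs is one choice among several; a stricter version might raise an error so that broken references are caught before publishing.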
Assemblies allow you to create validations ensuring that valid and complete content is composed and presented to the user.
For example, in a strict assembly, every field allows only a single value or reference. If you had a three-block page assembly, the assembly content type could have three fields: “hero banner,” “product carousel,” and “special offers.” Each of these fields would be validated to allow only the appropriate content type, letting editors select the page modules to include in the page, but not position them.
Changing the position of these modules is less straightforward. For instance, to move the product carousel to the bottom and the special offers to the middle would require a code change and redeployment of the software.
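A minimal sketch of the strict three-block page, assuming each module is represented as a dict with a `type` key (the slot and type names are hypothetical). Each slot accepts exactly one content type, and the slot order lives in code, which is why moving a module means changing and redeploying that code.

```python
# Hypothetical strict three-block page: one single-reference field per slot,
# each validated to exactly one content type.
STRICT_PAGE_SLOTS = {
    "hero_banner": "heroBanner",
    "product_carousel": "productCarousel",
    "special_offers": "specialOffers",
}

def validate_strict(page: dict[str, dict]) -> list[str]:
    """Check that every slot is filled with the one content type it allows."""
    errors = []
    for slot, allowed in STRICT_PAGE_SLOTS.items():
        ref = page.get(slot)
        if ref is None:
            errors.append(f"{slot}: required")
        elif ref["type"] != allowed:
            errors.append(f"{slot}: must reference a {allowed}, got {ref['type']}")
    return errors

def render_order(page: dict[str, dict]) -> list[str]:
    # Order comes from STRICT_PAGE_SLOTS in code, not from the entry,
    # so moving a module means editing this file and redeploying.
    return [page[slot]["type"] for slot in STRICT_PAGE_SLOTS]

page = {
    "hero_banner": {"type": "heroBanner"},
    "product_carousel": {"type": "productCarousel"},
    "special_offers": {"type": "specialOffers"},
}
print(validate_strict(page), render_order(page))
```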
An alternative is a flexible assembly: instead of single reference fields, each validated to one specific content type, it uses a multiple-reference field validated to accept all three content types.
Flexible assemblies let editors select entries of any valid content type and sort them, so they can organize and merchandise content on their own without developer intervention.
Strict assemblies are great for developer-oriented teams that wish to control layout exclusively in code, while flexible assemblies allow more editorially focused teams to give their editors flexibility.
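The flexible variant, sketched under similar assumptions (modules as dicts with a hypothetical `type` key), replaces the fixed slots with a single ordered list validated against a set of allowed types. The order now comes from the entry itself, so editors can rearrange modules without a deploy.

```python
# Hypothetical flexible page: one multiple-reference field whose validation
# accepts any of the three module content types, in any order.
ALLOWED_MODULE_TYPES = {"heroBanner", "productCarousel", "specialOffers"}

def validate_flexible(modules: list[dict]) -> list[str]:
    """Flag any referenced entry whose content type is not allowed."""
    return [
        f"modules[{i}]: {m['type']} is not an allowed module type"
        for i, m in enumerate(modules)
        if m["type"] not in ALLOWED_MODULE_TYPES
    ]

def render_order(modules: list[dict]) -> list[str]:
    # Order comes from the entry, so editors can rearrange without a deploy.
    return [m["type"] for m in modules]

# An editor has moved the special offers to the top.
modules = [{"type": "specialOffers"}, {"type": "heroBanner"}, {"type": "productCarousel"}]
print(validate_flexible(modules), render_order(modules))
```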
You can think of your collection of assemblies as the bridge between your domain model, as expressed in your topics, and your view model, which is ultimately rendered by your front-end templates and components.
The properties of your components and the placeholders in your templates will match the fields of your content types.
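A minimal Python sketch of that correspondence, assuming a hypothetical `EventCardProps` view model whose properties are named after the event content type's fields. When the names line up, building the view model is a direct unpacking of the entry's fields, and a mismatch fails loudly with a `TypeError`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventCardProps:
    """Hypothetical component props, named to match the event type's fields."""
    title: str
    start: str
    venue: str

def event_card(props: EventCardProps) -> str:
    # The template's placeholders are the props, which are the fields.
    return (
        f"<div class='event-card'><h3>{props.title}</h3>"
        f"<p>{props.start} at {props.venue}</p></div>"
    )

# Fields as delivered for one event entry.
entry_fields = {"title": "Jazz Night", "start": "20:00", "venue": "Riverside Hall"}

# Because the names match, mapping the domain model to the view model is direct.
print(event_card(EventCardProps(**entry_fields)))
```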
Topics and assemblies are your weapons in the war of chunks vs. blobs. By breaking down what your site or app is about into clear, reusable, validatable topics, and using nested assemblies to compose those topics for display across different channels, we can win this war for Team Chunk. Contentful is on your side, and we’re here to help.
By Dmytri Kleiner and Stephen Pastan