One Million Content Fragments – Lessons Learned from a Real Project

One of our recent projects involved managing 1,000,000 content fragments based on more than 100 models, all stored within AEM. Dealing with this massive dataset, we encountered several challenges that are worth sharing. For instance, the sheer volume of properties and references in these content fragments had a significant impact on query performance, causing AEM to struggle.

The presentation will delve into the limitations of the GraphQL interface in such scenarios and offer practical solutions for working around them. Finally, it will conclude with best practices for effectively managing massive datasets like this one within an AEM environment.

Yegor Kozlov

What are your recommendations for sharding content fragments?

RolandGruber

Try to put them in a folder structure so that not too many of them are in one directory
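
A minimal sketch of the folder idea above, assuming fragments can be bucketed by a hash of their name. The function name `sharded_path`, the root `/content/dam/fragments`, and the two-level, two-character bucket layout are illustrative assumptions, not details from the project.

```python
import hashlib


def sharded_path(root: str, name: str, levels: int = 2, width: int = 2) -> str:
    """Return a folder path that spreads fragments over hash-based buckets.

    `levels` is the number of nested bucket folders and `width` is the
    number of hex characters used for each bucket name (both hypothetical
    tuning knobs).
    """
    # A stable hash keeps the path deterministic for a given fragment name.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join([root.rstrip("/"), *buckets, name])


# Hypothetical root folder and fragment name.
print(sharded_path("/content/dam/fragments", "car-123"))
```

With two levels of two hex characters there are 65,536 leaf folders, so one million fragments would average roughly 15 per folder. A business-driven structure (for example by brand or year) may be easier for authors to browse; the hash layout is only one way to keep directories small.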

shruti.damle

How did you handle multi-site, multi-language use cases, if you had any?

Tad

i.e. are the 1 million CFs actually 10 million CFs after they finish turning the translation vendor into a smoking crater? :)

RolandGruber

We used folders and variants for the languages

Henry Kuijpers

Wouldn't it have been possible to create a composite multifield instead of an "old school" multifield? Then you don't end up with JSON, but instead with sub nodes with properties, right? (And that will still not fix all the other issues of course)

RolandGruber

The problem is still that the GraphQL implementation does not recognise it and you cannot access the fields in the query

Helge

Any recommendations on how to handle versioning of CFM models, especially when there are also referenced nested CFs?

RolandGruber

The changes should be backward compatible. But we also had a case where a v2 version of a model needed to be introduced.

Michal

Would you recommend to use AEM for this use case?

(see answer in talk video)

Jörg

Given the severe problems you have encountered with AEM GraphQL (and AEM delivering "incorrect" data), have you ever checked with Adobe regarding this? Was it accepted as an issue?

Tomasz Sobczyk

I have a question for Adobe here, I guess: are these APIs ever going to be improved? Should we even be trying to use them? When you compare this to GROQ from Sanity CMS, it looks a bit like a small school project rather than an enterprise-grade solution for headless content delivery

sabdouni

Did you consider dropping GraphQL and going back to REST APIs? If not, what made you keep using GraphQL?

Jan Stettler

Thing is: with AEM you have an admin interface, roles and rights, and no special infrastructure

Rogier

Would you use AEM for such a solution again? Why (not)?

Tad

This was my question as well, given that AEM being forced into the role of an RDBMS is going to result in involved solutions. Curious what requirements pushed one to really WANT to use AEM as the storage mechanism for such data topology.

RikVB

What happens if you remove a brand which is referenced in different car models?

RolandGruber

That is the additional complexity, you need to make sure the data stays consistent
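
One way to picture the consistency check: before removing a fragment, look up everything that still points at it. This is a minimal in-memory sketch, assuming the references have already been collected into a mapping from fragment path to the set of paths it references. The function `find_referrers` and the example paths are hypothetical, and this is not an AEM API.

```python
from typing import Dict, List, Set


def find_referrers(references: Dict[str, Set[str]], target: str) -> List[str]:
    """Return the sorted paths of all fragments that reference `target`."""
    return sorted(path for path, refs in references.items() if target in refs)


# Hypothetical data: two car models reference the brand we want to delete.
references = {
    "/cars/model-a": {"/brands/x"},
    "/cars/model-b": {"/brands/y"},
    "/cars/model-c": {"/brands/x", "/brands/y"},
}

blockers = find_referrers(references, "/brands/x")
if blockers:
    print("Cannot delete /brands/x, still referenced by:", blockers)
```

A delete workflow could refuse to proceed, or first update the listed fragments, whenever the returned list is not empty.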

Yegor Kozlov

Was GraphQL worth the effort in the end? Would it have been easier to write custom code to support the requirements?

Arko

Suppose we want to translate those nested multifield content fragments. How will we deal with the authoring of the nested ones being translated? How will the paths get replaced in the parent nodes?

RolandGruber

You can use variants for the translation. The variant can have different references

RikVB

If you use the data both externally and within AEM Sites, how do you establish a single source of truth for your content?

RolandGruber

In our case AEM was the truth

Ive

What is the deepest reference tree you have observed to be workable? Meaning, from which level does the performance start decreasing?

RolandGruber

Depends on hardware but max 5 levels is a good start
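
To check a content model against such a limit, the nesting depth can be measured up front. This is a minimal sketch, assuming the references are available as a mapping from fragment path to the list of paths it references. The function `reference_depth`, the constant `MAX_DEPTH`, and the example paths are hypothetical; the value 5 is taken from the answer above.

```python
from typing import Dict, FrozenSet, List

MAX_DEPTH = 5  # starting point from the answer above; tune for your hardware


def reference_depth(
    references: Dict[str, List[str]],
    start: str,
    _path: FrozenSet[str] = frozenset(),
) -> int:
    """Return the number of reference levels below `start`.

    A fragment without references has depth 0. A fragment that is already
    on the current path is not followed again, so cycles terminate.
    """
    if start in _path:
        return 0
    children = references.get(start, [])
    if not children:
        return 0
    path = _path | {start}
    return 1 + max(reference_depth(references, child, path) for child in children)


# Hypothetical chain: model -> engine -> supplier, and model -> brand.
references = {
    "/cars/model-a": ["/brands/x", "/engines/e1"],
    "/engines/e1": ["/suppliers/s1"],
}

depth = reference_depth(references, "/cars/model-a")
print(depth, "levels, within limit:", depth <= MAX_DEPTH)
```

The recursion revisits shared sub-trees, which is fine for a quick check on a model but would need memoisation for very large reference graphs.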

Jan Stettler

Do you have all CF models in one place (/config/global)? Did you also notice problems when splitting CF models between different /config paths?

RolandGruber

We had them in two locations. One problem is that you cannot reference from one config to another

Iryna

Did the authors use AEM to add/edit the CFs, or did they request the Excel importer? =)

RolandGruber

There were multiple import and export possibilities :)

Pattabhi

Is it possible to create indexes for persisted queries?

Arko

Responses from persisted queries get cached already.

Henry Kuijpers

Why was this the first decision that was made about this project? Why was an RDBMS (with spatial support), Elasticsearch, or similar not an option?

RolandGruber

I joined when the project was already running. As said, AEM might not be the perfect choice for this use case

tneves

Have you recovered from that project already? Did you get any medical issues because of it?

Radu Cotescu

Roland looks fine. He's either very resilient or the German healthcare system works really well! Haha. 😂

RolandGruber

True, the project was very stressful and sometimes you are just happy to get a different project :)