Bringing intelligence to content in AEM with Content AI
This session will dive into Adobe Experience Manager's (AEM) groundbreaking Content AI initiative, which leverages the rich customer content in AEM to power next-generation content intelligence and agentic workflows.
We will explore how Content AI powers innovations such as semantic search (used in several agentic use cases), generative search capabilities, and automatic content variation generation.
The talk will walk through the evolution of Content AI, starting with its conceptualization and continuing through its architecture. Key topics will include search-oriented schema design, async enrichment pipelines, and A/B testing capabilities. Additionally, we will discuss how these features are built on top of Content AI APIs, enabling an optimized and flexible search ecosystem that can scale across diverse use cases.
Join us as we explore how Content AI is reshaping search in AEM, offering developers, content managers, and enterprises new ways to unlock the full potential of their digital experiences.
Robert Wunsch
Is the RAG data used by "Content AI" permission sensitive (only using the data a user is able to see within AEM)?
Robert Wunsch
Will AEMaaCS customers be able to expose "Content AI" with semantic search on Published content to website users as "page search", and as public API searches (on Published content)?
Tad
Is ingestion of indexes in Edge Delivery to enrich search a type of use case you're planning on handling?
fabrizio
Yes, this is in the works but we don't have any date yet.
ashrvt
How deterministic will your answers be, and how are you re-ranking your chunks?
fabrizio
AI Search is deterministic by design. Re-ranking of chunks depends on the query you run: pure vector queries use cosine similarity, but you can also decide to run hybrid queries (fulltext + vector). When it comes to AI Answers, things are different: LLMs might produce different results across invocations.
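To make the ranking ideas above concrete, here is a minimal Python sketch of cosine-similarity ranking plus a blended hybrid score. The chunks, vectors, and the `alpha` weighting are invented for illustration and do not reflect Content AI's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def hybrid_score(vector_score, fulltext_score, alpha=0.5):
    """Blend vector and fulltext relevance; alpha weights the vector side."""
    return alpha * vector_score + (1 - alpha) * fulltext_score

# Toy 2-dimensional embeddings; real ones have hundreds of dimensions.
query = [1.0, 0.0]
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}

# Rank chunk ids by similarity to the query embedding.
ranked = sorted(chunks, key=lambda k: cosine_similarity(query, chunks[k]),
                reverse=True)
```

Because the same query and vectors always produce the same scores, this kind of retrieval is deterministic; the non-determinism in an AI Answers flow only enters at the LLM generation step.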
Javier Reyes
Will this semantic search also be available in the Asset Selector, for example for searching images to be chosen in the page editor?
Tad
Or not just the page editor - it would be amazing to have this on the Assets MFE that's used in DA.
fabrizio
This is something we are working on. In general, semantic search will be available through JCR queries and you will be able to use it directly (documentation on this is in the works).
Krzysztof
Which Vector DB implementation are you using and why?
fabrizio
We use Elastic as the vector database. It was initially introduced for scalability reasons. A big advantage is the full compatibility (except for vector queries) between Lucene and Elastic: all the capabilities implemented over the past years can be reused, drastically reducing complexity.
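As an illustration of what a hybrid (fulltext + vector) query against Elastic can look like, Elasticsearch 8.x allows a kNN clause alongside a standard match query in one search request. The index name, field names, sample text, and boosts below are hypothetical, and a real `query_vector` would have the embedding model's full dimensionality rather than three values:

```json
POST /content-chunks/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.07, 0.33],
    "k": 10,
    "num_candidates": 100,
    "boost": 0.7
  },
  "query": {
    "match": {
      "text": { "query": "cloud migration checklist", "boost": 0.3 }
    }
  }
}
```

Elasticsearch sums the (boosted) kNN and fulltext scores per document, which is one simple way to realize the hybrid ranking mentioned above.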
sabdouni
For the embeddings, is it using VLLM or metadata linked to the assets?
fabrizio
It currently relies on asset metadata. We are also working with the Assets team on initiatives to improve metadata quality.
Iryna
and all this magic will be included in the AEMaaC subscription, right? =)
Carlos NN
:)
fabrizio
Content AI comes with AEMCS for the AI Search part - AI Answers pricing and packaging are currently in the works
Ive
Why did you select Elastic? OOTB Lucene also supports vector search.
Robert Wunsch
From what I know, we tried to offload indexing from the cluster pods that make up the authoring layer. This is where the decision for Elasticsearch was taken. It is being used more and more within AEMaaCS for all indexes. The OOTB Lucene indexes would need to be brought into the pods and run there, and if the pods are rotated frequently (which they often are), that would also take long. Reducing the "time to system ready" was a big part of this decision, afaik.
fabrizio
Correct, Elastic was initially introduced for scalability reasons. Author and publish instances don't have to download indexes anymore when indexes are remote. An important thing to note here is that not all indexes are remote: for some indexes it makes a lot of sense to stay in Lucene (especially internal ones).
sabdouni
Do we have a date for GA?
Nitin
For semantic search in the Assets UI: it is already being rolled out to various Assets customers and will be GA very soon. For AI Answers, we are currently in limited availability and actively working towards GA (the exact date is not yet decided).
wolf
What languages does it support? The demo shows English and German being automatically supported, but what about others?
Nitin
The current model showcased in the demo, and used for enrichment at the moment, is paraphrase-multilingual-MiniLM-L12-v2. A subset of the languages it supports is listed at https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (the actual number is around 100).
Karolis
Can you bypass the guard rails by pretending to be a researcher or providing educational reasons when searching for things like “ddos attack”?
Nitin
No, the guardrails would filter that out. We have an internal framework that ensures the guardrails in place are effective, and it would filter out the above-mentioned use case as well.
Helge
That's nice! How is the embedding done via indexes, and can we also return the actual pages that fit best / that have been used to create the answer?
Nitin
Yes, AI Answers has a feature to enable and show the pages/links that were used to form the context for the response.
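A minimal sketch of that source-attribution pattern, assuming retrieved chunks carry a page path and the LLM call is injected as a plain callable; all names, paths, and data here are hypothetical, not Content AI's actual API:

```python
def answer_with_sources(question, retrieved_chunks, generate):
    """Return a grounded answer plus the pages it drew on.

    retrieved_chunks: list of dicts with 'text' and 'page' keys.
    generate: any LLM callable taking a prompt string (injected so the
    sketch stays model-agnostic).
    """
    context = "\n\n".join(c["text"] for c in retrieved_chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    answer = generate(prompt)
    # Deduplicate source pages while preserving retrieval order.
    sources = list(dict.fromkeys(c["page"] for c in retrieved_chunks))
    return {"answer": answer, "sources": sources}

# Illustrative data: two pages, one of them retrieved twice.
chunks = [
    {"text": "AEM stores pages under /content.", "page": "/content/site/en/docs"},
    {"text": "Semantic search uses embeddings.", "page": "/content/site/en/search"},
    {"text": "AEM stores pages under /content.", "page": "/content/site/en/docs"},
]
result = answer_with_sources("Where are pages stored?", chunks,
                             lambda p: "Under /content.")
```

Surfacing `sources` alongside the answer is what lets a UI show the best-fitting pages that were used to build the response.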