Integrate, grow, repeat. How WebSight real-time DXP provides a way to break out of the content-centric model
The presentation will offer an analysis of how WebSight DXP's event-driven architecture can be used to create a more agile and responsive digital experience platform.
We will showcase an event-driven WebSight DXP architecture and demonstrate how a CMS can be used as a data source for a composable platform that processes events in real time.
The live demo will focus on the critical capabilities that event-driven architectures bring to DXP business scenarios:
- Increased consistency of the digital presence, by keeping all data required for building experiences locally in the DXP (the push model)
- Fewer problems caused by the availability of external data sources
- Increased scalability and performance, due to the nature of event-based communication
By the end of our presentation, participants will have a general understanding of how the event-driven architecture can be utilized in the DXP context.
This really depends, and there are at least a couple of valid solutions, since the experience is composed by small functions. The functions presented during the demo are configurable: we can define which template is compatible with a particular (versioned) model and type of data (products/reviews/prices). For example, a template identified by the path `/templates/product-details-template.html` will use data of types and versions pimProduct.v1, reviews.v1 and prices.v1.
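The compatibility mapping described above can be sketched as a simple lookup. This is a minimal illustration only; the dictionary layout and the `templates_for` helper are assumptions, not WebSight's actual configuration format:

```python
# Illustrative sketch: each template path maps to the set of
# (data type, version) pairs it accepts.
TEMPLATE_COMPATIBILITY = {
    "/templates/product-details-template.html": {
        ("pimProduct", "v1"),
        ("reviews", "v1"),
        ("prices", "v1"),
    },
}

def templates_for(data_type: str, version: str) -> list[str]:
    """Return all template paths compatible with the given data type/version."""
    return [
        path
        for path, accepted in TEMPLATE_COMPATIBILITY.items()
        if (data_type, version) in accepted
    ]
```

With this shape, a `reviews.v1` update event resolves to the product-details template, while an unknown version resolves to nothing and triggers no recomposition.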
Pipelines are built out of functions. A function subscribes to topics to read events from and (optionally) produces new events. The first function we created is the Router, which is responsible for re-writing each event type (page update, data update, template update, etc.) to a type-based topic. This way, each downstream function is informed only about the events relevant to it. We'll have diagrams of the demo pipelines in the playground session.
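The Router's core decision can be expressed as a pure function that maps an event to its type-based topic. The topic naming scheme and event fields below are illustrative assumptions; in a real pipeline this logic would sit inside a streaming-platform function (e.g. a Pulsar Function) that consumes from the input topic and produces to the returned topic:

```python
# Illustrative routing logic: re-write each event to a topic named after
# its type, so downstream functions subscribe only to what they need.
KNOWN_EVENT_TYPES = {"page-update", "data-update", "template-update"}

def route(event: dict) -> str:
    """Pick the type-based topic for an event, e.g. 'events.page-update'."""
    event_type = event.get("type")
    if event_type not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    return f"events.{event_type}"
```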
Could you tell me more about what you mean by snapshots? I'm not sure I understand it correctly. Pushing all the data once again is okay; there are techniques for filtering out data that is already in the system. Another strategy is keeping only the latest version of each product's data in the system. Both have trade-offs.
There are a lot of techniques for retrieving data from non-event-based systems. It widely depends on the use case. Some options:
- Extend the source system with a module that produces events on change
- Create a module that periodically retrieves the data from the external system
- Use the low-level connectors that come with the event-streaming platforms
- In the worst-case scenario, process batches
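The "periodically retrieve" option above usually boils down to diffing the current snapshot against the last one seen and emitting change events for the differences. A minimal sketch, assuming snapshots shaped as `{id: payload}` and a hypothetical event structure:

```python
# Illustrative polling-module core: compare two snapshots of an external
# system and turn the differences into change events for the platform.
def diff_to_events(previous: dict, current: dict) -> list[dict]:
    """Emit data-update events for new/changed items and data-delete for removed ones."""
    events = []
    for item_id, payload in current.items():
        if previous.get(item_id) != payload:
            events.append({"type": "data-update", "id": item_id, "payload": payload})
    for item_id in previous.keys() - current.keys():
        events.append({"type": "data-delete", "id": item_id})
    return events
```

A scheduler would call this on each polling interval, publishing the returned events to the streaming platform and keeping `current` as the next `previous`.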
If you ship a new code release, does that mean that all content is statically regenerated? Or do you detect which template changed?
It depends. If there is a moderate number of products/templates (let's say less than 1 million), it usually does not make sense to add sophisticated change-detection algorithms, because processing 1 million pages takes only a couple of minutes… But, of course, event-driven is made for workflows, and there is absolutely nothing wrong with placing some change detection before the experience-templating function to limit the number of compositions.
It's also recommended to use multiple templates and notify the platform only about the templates that changed. We can detect whether a template changed between publications; this would be a nice optimization (i.e. checking the hash of the template before triggering the updates). We haven't faced the problem, since template generation is fast (we haven't measured the templating-engine module alone), but as you saw in the presentation, publishing 3500 products takes a few seconds.
How do you sync your data between all HTTP servers when autoscaling kicks in (a new instance should be added)? If you clone an instance, there could be some content updates happening in the meantime. Do you create a queue where the updates are gathered before the instance is ready and then push/pull them from the queue to the HTTP server, or a completely different solution?
Generally, it all depends on scale. On a relatively small scale (less than 100k unique pages) we use the simplest method: synchronisation happens "in place", i.e. directly through the event-streaming platform. The new HTTP server syncs the latest version of each unique experience, which takes about 1-2 minutes. On a bigger scale (over 1M unique pages) you may use snapshot techniques if the sync time (~10-15 min) is too long.
That would require an additional queue for the delta between the snapshot and "now", which could be achieved using the event-streaming technique described earlier. The trigger for the autoscaling mechanism can be configured. By default, it would be some quantity metric (e.g. number of page views per HTTP server instance), but a resource-based metric (like CPU/network usage) would also be fine here.
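The sync step a new HTTP server performs, "latest version of each unique experience", is effectively a compaction of the event stream. A minimal sketch, assuming each experience event carries a `path` and rendered `content` and that the stream is ordered by publication time:

```python
# Illustrative compaction: replay an ordered event stream and keep only
# the newest content for each unique experience path.
def latest_experiences(events: list[dict]) -> dict:
    """Reduce an ordered event stream to {path: content} with newest versions."""
    state: dict = {}
    for event in events:  # later events overwrite earlier ones for the same path
        state[event["path"]] = event["content"]
    return state
```

A snapshot-based bootstrap would start `state` from the snapshot and replay only the delta queue on top, instead of the full stream.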
Is this an open-source solution? Building and maintaining your solution probably takes a lot of effort, right? For example, how are you handling security issues, bug fixes, and releases/updates? Is there already a community behind it?
The real-time composability we presented is not an open-source solution. However, it is built on top of Apache open-source projects. WebSight CMS is a separate project, available under the BSL license; in short, it is free for experimentation and small commercial use cases. We already have a community behind the CMS, and we have just started building a community around the real-time composability.
For releasing, we follow the Semantic Versioning principles and tend to ship small, frequent updates. At the moment the product is at an early stage; as mentioned above, we have just started building a community around it. Feel free to contact us to get more details on how to participate in the early-preview phase.
Is the templating language the Django templating language?
It looks like HTL - they run on Sling. Confirmed - see https://docs.websight.io/cms/developers/components/#rendering-script.
Hi, the templating engine we adopted for the demo is Pebble (https://pebbletemplates.io/). However, this is a small experience function, and any templating engine (or a combination of several, if you really need that) may be applied as a workflow or even as a single (slightly bigger) function.
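The point that "any templating engine may be applied" is easy to see when the experience function is reduced to its shape: template in, data in, rendered experience out. Pebble is a Java engine, so for illustration only the sketch below swaps in Python's stdlib `string.Template`; the function boundary, not the engine, is what the architecture relies on:

```python
from string import Template

def render_experience(template_source: str, data: dict) -> str:
    """One 'experience function': render a template with the event's data payload.
    string.Template is a stand-in here; Pebble (or any engine) fits the same slot."""
    return Template(template_source).safe_substitute(data)
```

Swapping engines then only changes this function's internals; the topics it consumes and produces stay the same.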
Radu - the HTL is used in the CMS, which is a separate product. Our CMS is based on the Apache Sling stack. We are still working on public documentation for the real-time composition product. Thank you.
How are the experiences stored? Database, cloud-based, filesystem-based? Locally, near the web server?
That depends on the particular component of the system. There is great flexibility when your services are built in a microservice fashion; each of them can have its own database. However, the general rule we follow here is event sourcing. At the moment we are using Apache Pulsar under the hood, which has event-sourcing capabilities (think of it as an event-dedicated database). HTTP servers can keep the experiences on their local hard drive (preferably some kind of SSD, to optimise I/O).
Do authors need to write a template language in their components? Are you only supporting a technical authoring audience?
We made the template language visible during the presentation on purpose, to show the audience how it works. What components present depends fully on the component developers. They can hide all of the templating language from the CMS authors. On the roadmap of our CMS (https://github.com/orgs/websight-io/projects/2/views/2), we have an item for author-friendly components that would enable drag-and-drop of template components with preview.
Can the client choose which cloud the solution is deployed in, or is it already provided?
Yes. The composability layer is based on a cloud-native stack and built to run on Kubernetes. Most cloud providers offer a managed Kubernetes service, which is perfectly fine for running the solution.
Is it using Apache Kafka under the hood?
There is an abstraction over the messaging provider. At the moment we are using Apache Pulsar (it was also used during the demo). Apache Kafka could be a fine replacement; however, Apache Pulsar is cloud-native and scales more smoothly than Apache Kafka.
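An "abstraction over the messaging provider" typically means the pipeline code targets a small interface, with Pulsar or Kafka hidden behind a concrete adapter. The interface below and the in-memory stand-in are illustrative assumptions, not WebSight's actual API:

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class MessageBus(ABC):
    """Provider-agnostic messaging interface the pipeline functions code against."""

    @abstractmethod
    def publish(self, topic: str, message: dict) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, handler) -> None: ...

class InMemoryBus(MessageBus):
    """Stand-in implementation, handy for local tests; a Pulsar or Kafka
    adapter would implement the same two methods against the real client."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._handlers[topic]:
            handler(message)

    def subscribe(self, topic: str, handler) -> None:
        self._handlers[topic].append(handler)
```

Because functions depend only on `MessageBus`, swapping Pulsar for Kafka means writing one new adapter rather than touching the pipeline.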
Pulsar is cloud-native; it's designed to run on K8s by default. It's a good match.