AEM Infrastructure
Adobe Experience Manager (AEM) is a content management system (CMS) used to create and deploy commercial websites. Its structure lets users control exactly which content is available on the public site, easily edit content on a site, and cache content for faster delivery.
AEM is a powerful tool, but that power comes with a large degree of complexity. This post covers the basic structure of an AEM setup: the author and publish instances, the dispatcher, and the load balancer.
Typical AEM Setup
The number of AEM instances varies from environment to environment, but a full AEM setup consists of one author instance, one publish instance, one dispatcher, and one load balancer. Each plays its own role in controlling and accessing the content that users create. Many environments, especially production, run multiple publish instances and dispatchers for redundancy, better performance, and the capacity to handle more traffic.
Author
AEM author instances are generally hosted on a remote server and run on port 4502. That server sits behind an internal firewall, so reaching it usually requires VPN access. The author instance is where users create content and pass it to the publish instances via activation. It is also where users generate reports on author activity, upload DAM (Digital Asset Management) assets, configure components, and perform any other actions that visitors to the website should not see.
The author instance controls what content is published: users activate content when they want it to be public facing and deactivate it when they want to take it down. This is done through replication agents. You can see the full list of replication agents on the author instance at /etc/replication/agents.author.html. To configure an agent, select it, open it, and edit its settings from the settings bar on the page. New replication agents are created from the Tools admin page at /miscadmin#/etc/replication.
Each replication agent is configured with a URL and credentials for a publish instance. When content is activated, the author instance sends it to that URL (the publish instance’s /bin/receive endpoint), and the publish instance ingests the content and creates or updates the corresponding pages and assets. Below is a screenshot of the Agents on author page and the three most commonly used replication agents, all of which are available by default.
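Activation can also be triggered over HTTP, which is handy for scripts. The sketch below is modeled on the curl example in Adobe’s documentation, which POSTs `cmd` and `path` form fields to the author’s `/bin/replicate.json` servlet; the host, page path, and admin credentials are placeholders, and the function only builds the request rather than sending it.

```python
import base64
import urllib.parse
import urllib.request


def build_replication_request(author_url: str, content_path: str,
                              user: str, password: str,
                              cmd: str = "activate") -> urllib.request.Request:
    """Build (but do not send) a POST asking the author to replicate a path.

    Assumes the /bin/replicate.json servlet with 'cmd' and 'path' form fields.
    """
    if cmd not in ("activate", "deactivate"):
        raise ValueError(f"unsupported replication command: {cmd}")
    # Form-encoded body, e.g. cmd=activate&path=%2Fcontent%2F...
    body = urllib.parse.urlencode({"cmd": cmd, "path": content_path}).encode("utf-8")
    # HTTP Basic credentials for a user who is allowed to replicate content.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=author_url.rstrip("/") + "/bin/replicate.json",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    # Placeholder host, page path, and credentials.
    req = build_replication_request("http://localhost:4502",
                                    "/content/mysite/en", "admin", "admin")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it to the author.
```

Passing `cmd="deactivate"` builds the matching request for taking content down.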
Local testing is generally done on a local author instance. That is fine, but when testing styles on an author instance, append “?wcmmode=disabled” to the URL you are viewing: it turns off the authoring interface and gives you the closest experience to what the end user will see. You still need to test your changes on a publish instance as well, which is generally done on a dev server rather than locally.
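If the page URL already carries query parameters, the flag has to be joined with “&” instead of “?”. A small helper (hypothetical, not part of AEM) that sets `wcmmode=disabled` while keeping any other parameters might look like this:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_wcmmode_disabled(url: str) -> str:
    """Return url with wcmmode=disabled, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop any existing wcmmode value so the result contains exactly one.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "wcmmode"]
    params.append(("wcmmode", "disabled"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(with_wcmmode_disabled("http://localhost:4502/content/mysite/en.html"))
    # http://localhost:4502/content/mysite/en.html?wcmmode=disabled
```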
Publish
The publish instance of AEM is where published content resides, and it is where the dispatchers pull content from to serve to users. The publish instance receives content from the author instance whenever an author activates it. Publish instances are similar to author instances apart from the actual authoring capability. All the tools available on the author instance are also available on the publish instance, but they are generally not used there. If you need to look at content in the JCR (Java Content Repository) of the publish instance, navigate directly to CRXDE (/crx/de/index.jsp) and log in there to gain access to all the other tools. Below is a screenshot of the login button found in the top right of CRXDE.
As with author instances, code must be built and deployed to the publish instances directly so that the user-facing site has access to it. This is generally done via Jenkins, with specific Jenkins jobs for the publishers. The process is exactly the same as deploying code to an author instance; only the destination differs.
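To illustrate “same process, different destination”, here is a sketch of the plan a deployment job might follow: one upload of the same content package per instance. It assumes uploads go through the CRX Package Manager service (`/crx/packmgr/service.jsp` with `file`, `name`, `force`, and `install` form fields, as in Adobe’s curl examples); the host names and package names are hypothetical, and 4503 is simply the conventional default publish port.

```python
from dataclasses import dataclass

# Package Manager endpoint used for uploading and installing packages.
PACKMGR_SERVICE = "/crx/packmgr/service.jsp"


@dataclass(frozen=True)
class AemInstance:
    role: str      # "author" or "publish"
    base_url: str  # scheme://host:port


def deployment_plan(package_file: str, package_name: str, instances):
    """Describe one upload per instance: same package, different destination."""
    plan = []
    for inst in instances:
        plan.append({
            "role": inst.role,
            "url": inst.base_url.rstrip("/") + PACKMGR_SERVICE,
            # Local path of the zip to upload, the package name, and flags to
            # overwrite an existing package and install it right away.
            "fields": {"file": package_file, "name": package_name,
                       "force": "true", "install": "true"},
        })
    return plan


if __name__ == "__main__":
    instances = [  # hypothetical dev environment
        AemInstance("author", "http://dev-author.example.com:4502"),
        AemInstance("publish", "http://dev-publish1.example.com:4503"),
        AemInstance("publish", "http://dev-publish2.example.com:4503"),
    ]
    for step in deployment_plan("mysite.ui.apps-1.0.zip", "mysite.ui.apps", instances):
        print(step["role"], step["url"])
```

A Jenkins job would then send each request with the package attached; only the list of instances changes between the author and publisher jobs.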
Dispatcher
The dispatcher is an Apache module, installed on a web server, that caches pages and assets. What gets cached is fully configurable in the dispatcher’s configuration files on the server. Caching speeds up delivery of content to the end user, and the dispatcher configuration can exclude specific pages or paths from caching for security reasons or because they contain personalized content.
The dispatcher also acts as a load balancer across the publish instances in the AEM infrastructure. These publish instances, called ‘renders’, are defined in the dispatcher configuration (dispatcher.any), and a sample render configuration can be seen below. A handy trick for finding out which publish instance served your current content is to check the ‘renderid’ cookie set by the dispatcher. In this example, based on the names of the renders, the cookie value would be ‘rend01’, ‘rend02’, or ‘rend03’.
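The sketch below reads the ‘renderid’ cookie from a Cookie header and shows the idea of sticking to that render. It is a simplified model, not the dispatcher’s code: when no valid cookie is present it falls back to plain round-robin, whereas the real dispatcher’s render selection is more involved. The render names come from the text; everything else is illustrative.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Iterator, Optional

# Render names from the sample configuration described in the text.
RENDERS = ["rend01", "rend02", "rend03"]


def render_from_cookie(cookie_header: str) -> Optional[str]:
    """Return the value of the 'renderid' cookie, or None if it is absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("renderid")
    return morsel.value if morsel is not None else None


def choose_render(cookie_header: str, fallback: Iterator[str]) -> str:
    """Reuse the render named in the cookie; otherwise take the next fallback."""
    rid = render_from_cookie(cookie_header)
    if rid in RENDERS:
        return rid
    # Round-robin stand-in for the dispatcher's own selection logic.
    return next(fallback)


if __name__ == "__main__":
    rotation = itertools.cycle(RENDERS)
    print(choose_render("renderid=rend02; JSESSIONID=abc123", rotation))  # rend02
    print(choose_render("", rotation))  # rend01
```

In a browser, the same check amounts to looking up the ‘renderid’ cookie in the developer tools.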
The dispatcher also lets you set up request filter rules, for example to block content grabbing (such as appending ‘.infinity.json’ to a URL to get the JSON representation of the page as stored in the JCR) or to allow requests to specific paths (such as servlet paths). The dispatcher configuration also contains a section that manages caching. It is generally set up as ‘cache everything except these pages’ to maximize caching and therefore page load speed. Pages that are usually not cached include pages with personalized content, gated assets, and pages under the ‘user’ directory (login, registration, etc.). Note that, by default, the dispatcher does not cache requests that include URL parameters.
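The filter and cache decisions described above can be modeled in a few lines. This is a simplified sketch, not dispatcher.any syntax: real rules can also match on request method, selectors, extensions, and more. The rules below are hypothetical but follow two dispatcher conventions: the filter starts from “deny everything” and opens specific paths, and when several rules match, the last matching rule wins. Requests with a query string are treated as uncacheable, matching the default behavior.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical filter rules: deny all, then allow selected paths.
FILTER_RULES = [
    ("deny", "*"),
    ("allow", "/content/*"),
    ("allow", "/etc.clientlibs/*"),
    ("allow", "/bin/mysite/*"),       # hypothetical servlet path
    ("deny", "*.infinity.json"),      # block content grabbing
]

# Hypothetical cache rules: cache everything except these pages.
CACHE_RULES = [
    ("allow", "*"),
    ("deny", "/content/mysite/*/user/*"),  # login, registration, etc.
]


def last_match(rules, path: str, default: str) -> str:
    """Apply rules in order; a later matching rule overrides an earlier one."""
    decision = default
    for action, pattern in rules:
        if fnmatchcase(path, pattern):
            decision = action
    return decision


def is_allowed(url: str) -> bool:
    return last_match(FILTER_RULES, urlsplit(url).path, "deny") == "allow"


def is_cacheable(url: str) -> bool:
    parts = urlsplit(url)
    if parts.query:  # requests with URL parameters are not cached by default
        return False
    return last_match(CACHE_RULES, parts.path, "deny") == "allow"


if __name__ == "__main__":
    for url in ("https://www.example.com/content/mysite/en.html",
                "https://www.example.com/content/mysite/en.infinity.json",
                "https://www.example.com/content/mysite/en/user/login.html"):
        print(url, is_allowed(url), is_cacheable(url))
```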
Because the dispatcher is just an Apache module, setting it up also involves configuring the Apache server itself: virtual hosts, rewrite rules, and any other Apache modules you want to load on the server. Below is a sample dispatcher module configuration inside the Apache settings file (httpd.conf). The dispatcher module supports various other directives; these are just the most commonly used ones.
Load Balancer
The load balancer is not actually an AEM-specific component. It is what lets users reach the public URL (e.g. www.bounteous.com rather than disp01.bounteous.com). Essentially, it balances the load across the dispatchers, which in turn balance the load across the publish instances. This additional level of abstraction gives the end user the fastest experience possible.
As stated earlier, AEM is a powerful but complex tool. We hope this post has helped you better understand AEM’s infrastructure. If you want to learn more about deploying and maintaining AEM, check out Adobe’s documentation. If you want to learn more about AEM in general, take a peek at our page about Adobe Experience Manager.
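How a load balancer spreads traffic depends on the product and its configuration. Round-robin with health checks is one common strategy, sketched below under that assumption: requests go to each dispatcher in turn, and a dispatcher marked as down is skipped until it recovers, which is where the redundancy of multiple dispatchers pays off. The class and host names are hypothetical.

```python
from typing import List, Sequence, Set


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked as down."""

    def __init__(self, backends: Sequence[str]) -> None:
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._down: Set[str] = set()
        self._next = 0

    def mark_down(self, backend: str) -> None:
        self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        self._down.discard(backend)

    def pick(self) -> str:
        # Try each backend at most once, starting where the last pick left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["disp01.example.com", "disp02.example.com"])
    print([lb.pick() for _ in range(3)])
    lb.mark_down("disp01.example.com")  # e.g. a failed health check
    print([lb.pick() for _ in range(2)])
```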