It’s been a while since I first wrote about Azure Search, and we have a few more tips and tricks on how to optimise Azure Search implementations.
Before proceeding, if you missed our previous posts, check out some tools we created for Azure Search Helix setup and Geo-Spatial Searching.
Also, check out the slides from our presentation at last year’s Melbourne Sitecore User Group.
OK, let’s jump into the top 10 tips:
Tip 1) Create custom indexes for targeted searching
The default out-of-the-box indexes attempt to cover just about everything in your Sitecore databases, so that they can support Sitecore CMS UI searches. It’s not a problem if you want to search with the default indexes (web, master); however, a custom index that holds only the data you need will give you faster searches and shorter re-indexing times.
By stepping back and looking at the different search requirements across the site, you can map out your custom indexes and the data that each will require.
Consider also that if the custom indexes need to be used across multiple Feature Helix modules, the configuration files and search repositories may need to live in an appropriate Foundation module. More about feature vs foundation can be found here.
Tip 2) Keep your indexes lean
This tip follows on from the first.
Out of the box, the default Azure Search configuration has the “IndexAllFields” setting switched on.
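As a rough illustration, the setting lives in the documentOptions section of the index configuration and looks something like the fragment below. The surrounding elements are trimmed and the exact layout may differ between Sitecore versions, so treat it as a sketch rather than a copy of the shipped file.

```xml
<!-- Sketch only: the type attribute and other child elements are omitted -->
<documentOptions>
  <!-- Every field on every indexed item is written to the index -->
  <indexAllFields>true</indexAllFields>
</documentOptions>
```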
This can include a lot of fields, and you’re probably not going to need every single Sitecore field in order to present the user with meaningful data on the front-end interfaces.
The other option is to specify only the fields that you need in your indexes:
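A minimal sketch of that approach, assuming the standard include list in documentOptions, could look like this. The element name “title” and the all-zero field ID are placeholders, not values from a real solution.

```xml
<!-- Sketch only: the type attribute and other child elements are omitted -->
<documentOptions>
  <!-- Stop indexing every field by default -->
  <indexAllFields>false</indexAllFields>
  <!-- List only the fields the front end actually needs -->
  <include hint="list:AddIncludedField">
    <!-- Placeholder element name and field ID: substitute your own -->
    <title>{00000000-0000-0000-0000-000000000000}</title>
  </include>
</documentOptions>
```

Each extra field you need gets its own entry in the include list, which keeps the index definition explicit and easy to review.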
The end result is a smaller JSON payload sent across the network and less data for the Sitecore Azure Search Provider to process.
If you are returning thousands of search results, Fiddler shows clearly what happens when “IndexAllFields” is on.
This screenshot was taken on a local development machine querying an Azure Search instance at the Microsoft hosting centre.
So for a single query, “IndexAllFields” can result in:
- A JSON payload of 2 MB or more.
- Document results with all Sitecore metadata included. That could be around 100 fields.
If your query returns documents in the thousands, the payload will obviously grow rapidly. By reducing the fields in your indexes (removing unnecessary data) you can speed up query, transfer and processing times and get the data displayed quicker.
Tip 3) Make use of direct Azure connections
Sitecore has done a lot of the heavy lifting for you in the Sitecore Azure Search Provider. It’s a bit like a wrapper that does all the hard work for you. In some cases, however, you may find that writing your own queries that connect via the Azure Search DLL gives you better performance.
Tip 4) Monitor performance via Azure Search Portal
It’s really important to monitor your Azure Search Instance via Azure Portal. This will give you critical clues as to whether your scaling settings are appropriate.
In particular, look out for high latency times, as this will indicate that your search queries are getting throttled. As a result, you may need to scale up your Azure Search Instance.
To monitor your latency times:
- Log in to Azure Portal.
- Navigate to your Azure Search Instance.
- Click on Metrics in the left-hand navigation.
- Select the “Search Latency” checkbox and scan over the last week.
You will see some peaks; these usually indicate heavy periods of re-indexing, when the Azure Search instance is under heavy load. As long as your peaks stay under the 0.5-second mark you’re OK. If you see Search Latency up in the 2-second range, you probably need to either adjust how your indexes are used (caching and re-indexing) or scale up to avoid the flow-on effects of slow search.
Tip 5) Cache Wrappers
In the code that uses Azure Search, it would be advisable to use cache wrappers around the searches when possible. For your most common searches, this should prevent Azure Search getting hit repeatedly with the same query.
For a full example of a cache wrapper, check out the section titled Sitecore.Caching.CustomCache in my previous blog post.
Tip 6) Disable Indexing on CD
This is a hot tip that we got from Sitecore Support when we started to encounter high search latency during re-indexing.
Most likely in your production setup, you will have a single Azure Search instance shared between CM and CD environments.
You need to factor in that CM should be the server that controls the re-indexing (writing) and CD will most likely be the server doing the queries (reading).
Re-indexing is triggered via the event queue, and every server subscribes and reacts to these events. Each server with the out-of-the-box search configuration will cause the Azure Search indexes to be updated. In a shared Azure Search (or SOLR) instance, the index only needs to be updated by a single server. Each additional re-index is overkill and just doubles up on the re-indexing workload.
You can, therefore, adjust the configuration on the CD servers so that they do not trigger re-indexing.
The trick is to use Configuration Roles in your index configuration files to specify the indexing strategy on each server.
NOTE: the order of these strategies controls the execution order.
<strategy role:require="Standalone OR ContentManagement" ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
<strategy role:require="ContentDelivery" ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual"/>
Setting the index update strategy to manual on your CD servers will take a big load off your remote indexes, particularly if you have multiple CD servers using the same indexes. Without the above setting, each additional CD server would cause additional updates to the index.
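For context, here is a sketch of where those two strategy lines would typically sit in a patch file. The index id “my_custom_index” is a placeholder, and the type attributes and other index settings are left out, so adapt it to your own index definition.

```xml
<!-- Sketch only: "my_custom_index" is a placeholder index id -->
<configuration xmlns:role="http://www.sitecore.net/xmlconfig/role/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="my_custom_index">
            <strategies hint="list:AddStrategy">
              <!-- CM (or a standalone instance) writes to the index on publish -->
              <strategy role:require="Standalone OR ContentManagement" ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
              <!-- CD only reads, so it never triggers an index update -->
              <strategy role:require="ContentDelivery" ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual"/>
            </strategies>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

The role prefix has to be declared on the root element, otherwise the role:require attributes will not be recognised.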
Tip 7) Rigid Indexes – Have a deployment plan
If your deployment includes additions and changes to the indexes and you need 100% availability of search data, a deployment plan for re-indexing will be required.
Grant chatted about the problem in his post here. To get around this you could consider using the blue/green paradigm during deployments.
- This would mean having a set of blue indexes and a set of green indexes.
- Using slot swaps for your deployments.
- One slot points to green in configuration.
- One slot (production) points to blue in configuration.
- To save on costs you could decommission the staging slot between deployments.
Tip 8) HttpClient should be a singleton or static
The basic idea here is that you should keep the number of HttpClient instances in your code to an absolute minimum if you want optimal performance.
The Sitecore Azure Search provider actually spins up 2 x HttpClient connections for every single index. This in itself is not ideal and unfortunately, there is not a lot you can do about this code in the core product itself.
In your own connections to other APIs, however, you can safely share a single instance, because HttpClient SendAsync is perfectly thread safe.
By using HttpClient singletons you stand to gain big in the performance stakes. One great blog article worth reading runs you through the performance benefits.
It’s also worth noting that in the Azure Search documentation Microsoft themselves say you should treat HttpClient as a singleton.
Tip 9) Monitor your resources
In Azure web apps you have finite resources with your App Service plans. Opening multiple connections with HttpClient and not disposing of them properly can have severe consequences.
For instance, we found a bug in the core Sitecore product that was caused by the connection retryer. It held open ports forever whenever we hit our Azure Search plan usage limits. The result was that we hit the outbound open connection limits for sockets, and this caused our Sitecore instance to grind to a slow halt.
Sitecore has since resolved the issue mentioned above after a lengthy investigation working alongside the Aceik team. This was tracked under reference number 203909.
To monitor the number of sockets in Azure we found a nice page on the MSDN site.
Tip 10) Make use of OData Expressions
This tip relates strongly to tip 3. Azure Search has some really powerful OData Expressions that you can make use of via a direct connection. Once you have had a play with direct connections, it is surprisingly easy to spin up really fast queries.
- OrderBy, Filter (by field), Search.
- Logical operators (and, or, not).
- Comparison expressions (eq, ne, gt, lt, ge, le).
- any with no parameters. This tests whether a field of type Collection(Edm.String) contains any elements.
- all with limited lambda expression support.
- Geospatial functions. The geo.distance function returns the distance in kilometres between two points.
See the complete list here.
Q) Anything on multiple region setups? Or latency considerations?
A) Multi-region setups: Although I can’t comment from experience, the configuration documentation does state that you can specify multiple Azure Search instances using a pipe separator in the connection string.
<add name="cloud.search" connectionString="serviceUrl=https://searchservice1.search.windows.net;apiVersion=2015-02-28;apiKey=AdminKey1|serviceUrl=https://searchservice2.search.windows.net;apiVersion=2015-02-28;apiKey=AdminKey2" />
Unfortunately, the documentation does not go into much detail. It simply states that “Sitecore supports a Search service with geo-replicated scenarios” which one would hope means under the hood it has all the smarts to take care of this.
I’m curious about this as well and have opened a Stack Overflow question. Let’s see if anyone else in the community can answer this for us.
Latency considerations: Search latency can be directly improved by adding more replicas via the scaling setting in Azure Portal.
Two replicas should be your starting point for an Azure Search instance to support Sitecore. Once you launch your site, you will need to follow the instructions in tip 4 above to monitor search latency. If the latency graph is showing consistent spikes and latency times above 0.5 seconds, it’s probably time to add some more replicas.