0 comments on “Sitecore Azure Search: Top 10 Tips”

Sitecore Azure Search: Top 10 Tips

Its been a while since I first wrote about Azure Search and we have a few more tips and tricks on how to optimise Azure Search implementations.

Before proceeding if you missed our previous posts check out some tools we created for Azure Search Helix setup and Geo-Spatial Searching.

Also, check out the slides from our presentation at last years Melbourne Sitecore User Group.

Ok let us jump into the top 10 tips:

Tip 1) Create custom indexes for targeted searching

The default out of the box indexes will attempt to cover just about everything in your Sitecore databases. They do so to support Sitecore CMS UI searches out of the box.  It’s not a problem if you want to use the default indexes (web, master) to search with, however for optimal searches and faster re-indexing time a custom index will help performance.

By stepping back and looking at the different search requirements across the site you can map out your custom indexes and the data that each will require.

Consider also that if the custom indexes need to be used across multiple Feature Helix modules the configuration files and search repositories may need to live in an appropriate Foundation module. More about feature vs foundation can be found here.

Tip 2) Keep your indexes lean

This tip follows on from the first Tip.

Essentially the default Azure Search configuration out of the box will have:

<indexAllFields>true</indexAllFields>

This can include a lot of fields and your probably not going to need every single Sitecore field in order to present the user with meaningful data on the front end interfaces.

The other option is to specify only the fields that you need in your indexes:

<include hint="list:IncludeField"> 
<Text>{A60ACD61-A6DB-4182-8329-C957982CEC74}</Text> 
</include>

The end result will limit the amount of JSON payload that needs to be sent across the network and also the amount of payload that the Sitecore Azure Search Provider needs to process.

Particularly if you are returning thousands of search results you can see what happens when “IndexAllFields” is on via Fiddler.

This screenshot is via a local development machine and Azure Search instance at the Microsoft hosting centre.

Fiddler Index

JSONFIelds

  • So for a single query “IndexAllFields” can result in:
    • 2 MB plus JSON payload size.
    • Document results with all Sitecore metadata included. That could be around 100 fields.

If your query results in Document counts in the thousands obviously the payload will grow rapidly. By reducing the fields in your indexes (removing un-necessary data)  you can speed up query, transfer and processing times and get the data displayed quicker.

Tip 3) Make use of direct azure connections

Sitecore has done a lot of the heavy lifting for you in the Sitecore Azure Search Provider. It’s a bit like a wrapper that does all the hard work for you. In some cases however you may find that writing your own queries that connect via the Azure Search DLL gives you better performance.

Tip 4) Monitor performance via Azure Search Portal

It’s really important to monitor your Azure Search Instance via Azure Portal. This will give you critical clues as to whether your scaling settings are appropriate.

In particular look out for high latency times as this will indicate that your search queries are getting throttled. As a result, you may need to scale up your Azure Search Instance.

In order to monitor your latency times go to:

  1. Login to Azure Portal
  2. Navigate to your Azure Search Instance.
  3. Click on metrics in the left-hand navigation
    • metrics
  4. Select the “Search Latency” checkbox and scan over the last week.
    • graph
  5. You will see some peaks these usually indicate heavy periods of re-indexing. During re-indexing, the Azure Search instance is under heavy load. As long as your peaks under 0.5-second mark your ok.  If you see Search Latency up into the 2-second timeframe you probably need to either adjust how your indexes are used (caching and re-indexing) or scale up to avoid the flow on effects of slow search.

Tip 5) Cache Wrappers

In the code that uses Azure Search, it would be advisable to use cache wrappers around the searches when possible. For your most common searches, this should prevent Azure Search getting hit repeatedly with the same query.

For a full example of cache wrapper checkout the section titled Sitecore.Caching.CustomCache in my previous blog post.

Tip 6) Disable Indexing on CD

This is a hot tip that we got from Sitecore Support when we started to encounter high search latency during re-indexing.

Most likely in your production setup, you will have a single Azure Search instance shared between CM and CD environments.

You need to factor in that CM should be the server that controls the re-indexing (writing) and CD will most likely be the server doing the queries (reading).

Re-indexing is triggered via the event queue and every server subscribes and reacts to these events. Each server with the out of the box search configuration will cause the Azure Search indexes to be updated.  In a shared Azure Search (or SOLR instance) this only needs to be updated by a single server. Each additional re-index is overkill and just doubling up on re-indexing workload.

You can, therefore, adjust the configuration on the CD servers so that it does not cause re-indexing to happen.

The trick is in your index configuration files to use Configuration Roles to specify the indexing strategy on each server.

 <strategies hint="list:AddStrategy">
 <!--
 NOTE: order of these is controls the execution order 
 -->
 <strategy role:require="Standalone OR ContentManagement" ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync"/>
 <strategy role:require="ContentDelivery" ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual"/>
 </strategies>

Setting the index update strategy to manual on your CD servers will take a big load off your remote indexes.

Particularly if you have multiple CD servers using the same indexes. Each additional CD server would cause additional updates to the index without the above setting.

Tip 7) Rigid Indexes – Have a deployment plan

If your deployment includes additions and changes to the indexes and you need 100% availability of search data, a deployment plan for re-indexing will be required.

Grant chatted about the problem in his post here. To get around this you could consider using the blue / green paradigm during deployments.

  • This would mean having a set blue indexes and a set of green indexes.
  • Using slot swaps for your deployments.
    • One slot points to green in configuration.
    • One slot (production) points to blue in configuration.
  • To save on costs you could decommission the staging slot between deployments.

Tip 8) HttpClient should be a singleton or static

The basic idea here is that you should keep the number of HttpClient instances in your code to an absolute minimum if you want optimal performance.

The Sitecore Azure Search provider actually spins up 2 x HttpClient connections for every single index. This in itself is not ideal and unfortunately, there is not a lot you can do about this code in the core product itself.

In your own connections to other APIs, however, HttpClient SendAsync is perfectly thread safe.

By using HttpClient singletons you stand to gain big in the performance stakes. One great blog article worth reading runs you through the performance benefits. 

It’s also worth noting that in the Azure Search documentation Microsoft themselves say you should treat HttpClient as a singleton.

Tip 9) Monitor your resources

In Azure web apps you have finite resources with your app server plans. Opening multiple connections with HttpClient and not disposing of them properly can have severe consequences.

For instance, we found a bug in the core Sitecore product that was caused by the connection retryer. It held open ports forever whenever we hit out Azure Search plan usage limits.  The result was that we hit outbound open connection limits for sockets and this caused our Sitecore instance to ground to a slow halt.

Sitecore has since resolved the issue mentioned above after a lengthy investigation working alongside the Aceik team. This was tracked under reference number 203909.

To monitor the number of sockets in Azure we found a nice page on the MSDN site.

Tip 10) Make use of OData Expressions

This tip relates strongly to tip 3.  Azure search has some really powerful OData Expressions that you can make use of by a direct connection.  Once you have had a play with direct connections it is surprisingly easy to spin up really fast queries.

Operators include:

  • OrderBy, Filter (by field), Search
  • Logical operators (and, or, not).
  • Comparison expressions (eq, ne, gt, lt, ge, le).
  • any with no parameters. This tests whether a field of type Collection(Edm.String) contains any elements.
  • any and all with limited lambda expression support.
  • Geospatial functions geo.distance and geo.intersects. The geo.distance function returns the distance in kilometres between two points.

See the complete list here.


 

Q&A

Q) Anything on multiple region setups? Or latency considerations?

A) Multi-region setups:   Although I can’t comment from experience the configuration documentation does state that you can specify multiple Azure Search instances using a pipe separator in the connection string.

<add name="cloud.search" connectionString="serviceUrl=https://searchservice1.search.windows.net;apiVersion=2015-02-28;apiKey=AdminKey1|serviceUrl=https://searchservice2.search.windows.net;apiVersion=2015-02-28;apiKey=AdminKey2" /> 

Unfortunately, the documentation does not go into much detail. It simply states that “Sitecore supports a Search service with geo-replicated scenarios” which one would hope means under the hood it has all the smarts to take care of this.

I’m curious about this as well and opened a stack overflow ticket. Let’s see if anyone else in the community can answer this for us.

Search Latency: 

Search latency can be directly improved by adding more replicas via the scaling setting in Azure Portal

replicas

Two replicas should be your starting point for an Azure Search instance to support Sitecore. Once you launch your site you will need to follow the instruction in tip 4 above monitor search latency.  If the latency graph is showing consistent spikes and high latency times above 0.5 seconds it’s probably time to add some more replicas.

0 comments on “Using a bucket as a datasource”

Using a bucket as a datasource

download (1)A common scenario requirement these days is to enable content editors to select an item that resides in a Sitecore bucket. The simplest solution is to use a Sitecore multilist with search field.

Another common requirement is to only allow the editor to select one item. In this case we dont have a drop list with search field so again the simplest solution is to use the Sitecore multilist with search field.

First you need to set the source of your field to display items within your bucket of a specific template(s).

Example: StartSearchLocation={11111111-1111-1111-1111-111111111111}&Filter=+_templatename:sample item

Here is an article describing how to set the source of your field: Sitecore 7 field types

If you need to limit the selection to one item for example then you need to also apply some regex. To achieve this you need to enable standards values in the view tab so you can alter the data section.

In the data section add the following regex: ^({[^}]+}\|?){0,1}$ and add some validation text.

Example:

Lqmqh
This article provides additional information:Limit selected items on Sitecore multilist field

0 comments on “Automate friendly field descriptions”

Automate friendly field descriptions

58845601

As you may well know when creating and naming fields in Sitecore we also need to create a friendly title for the content editors. By default Sitecore doesn’t make this as easy as it could be and therefore there are a number of naming conventions and solutions for this simple but fundamental issue.

I have always taken the approach of camel casing my field names. It makes sense from a coding perspective. It is a good practice as most developers understand this convention and its benefits. Also when integrating with ORMs such as glass a camel case field name will map directly to the property name of the class without additional mapping attributes.

The obvious issue with using camel case field names is that the default value the content editors see is the field name. Camel case is great for developers but not as good for content editors, so we need to manually write a friendly display description for the editors. This leads to double entry of field name and field description and then keeping this consistent.

Solution:

Auto generate a friendly field description based on the field name by splitting the camel case and adding spaces. For additional information sometimes required we use the help text option.

Example:

Field Name:

PageTitle

Generated Field Description:

PageTitle1

Addition information:

PageTitle2

Implementation:

Tap into the item on saving event and add some basic code to auto generate a display title based on the field name.

namespace Aceik.Framework.SC.Extensions.Events
{
    using System;
    using System.Text.RegularExpressions;

    using Sitecore.Data;
    using Sitecore.Data.Items;
    using Sitecore.Events;

    /// <summary>The item on saving events.</summary>
    public class ItemOnSavingEvents
    {
        /// <summary>The field title name.</summary>
        private const string FieldNameTitle = "Title";

        /// <summary>The field template id.</summary>
        private readonly ID fieldTemplateId = new ID("{455A3E98-A627-4B40-8035-E683A0331AC7}");

        /// <summary>The on item save.</summary>
        /// <param name="sender">The sender.</param>
        /// <param name="args">The args.</param>
        public void OnItemSaving(object sender, EventArgs args)
        {
            var contextItem = Event.ExtractParameter(args, 0) as Item;

            if (contextItem == null)
            {
                return;
            }

            if (contextItem.TemplateID == this.fieldTemplateId && contextItem.Fields[FieldNameTitle] != null)
            {
                contextItem.Fields[FieldNameTitle].SetValue(SplitCamelCase(contextItem.Name), false);
            }
        }

        /// <summary>The split camel case.</summary>
        /// <param name="input">The input.</param>
        /// <returns>The <see cref="string"/>.</returns>
        private static string SplitCamelCase(string input)
        {
            return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
        }
    }
}

Patch in the following config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
  <events timingLevel="custom">
      <event name="item:saving">
        <handler patch:instead="*[@type='Sitecore.Tasks.ItemEventHandler, Sitecore.Kernel']" type="Aceik.Framework.SC.Extensions.Events.ItemOnSavingEvents, Aceik.Framework.SC.Extensions" method="OnItemSaving"/>
      </event>
    </events>
  </sitecore>
</configuration>

As the description is being auto generated it restricts your ability to add a custom description, however you can add additional information to the display title by adding help text. This blog explains it well: http://firebreaksice.com/sitecore-item-and-field-names/