Indexes

Introduction

Casewhere employs MongoDB for its data storage. By default, MongoDB will go through all the documents to select the documents that match the query filter criteria, which is not ideal for dealing with large data collections. With proper indexes, MongoDB can narrow down the documents it must scan, thereby speeding up queries.

This guide provides a complete overview of setting up effective indexes in MongoDB for the Casewhere platform. You will learn:

The importance of indexes in optimizing database queries.
Step-by-step instructions for configuring and managing indexes in Casewhere.
Best practices for maintaining indexes to ensure long-term efficiency.

Create indexes

In the data class editor, go to the Indexes tab to manage the indexes for the data class. Click Add to create a new index. The index editor appears as follows:

The following fields are needed to create an index:

Index Name: Index names must be unique. See here for popular index name conventions.
Index Definition: Specify the index fields and their sorting order. The order of these fields is crucial as it impacts how the query planner utilizes the index.
Custom Collation:
- If no custom collation is specified, the index will be created with the default collation. The default collation is configurable per Casewhere deployment.
- You can create an index without specifying collation by setting Locale to 'None'. Non-collation indexes generally offer better performance, but they do not support case-insensitive text filtering, such as filtering for users with an address containing 'virum'.
  - Non-collation indexes are highly useful for aggregation queries on large data collections, particularly when grouping and counting are required.
Options:
- Build Index In The Background: Building an index can be time-consuming, so Casewhere shouldn't wait for the process to complete. It's recommended to use the default setting here.
- Create Unique Index: A unique index ensures that the indexed fields do not store duplicate values.
- Create TTL: TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Data expiration is useful for certain types of information like machine generated event data, logs, and session information that only need to persist in a database for a finite amount of time.

To demonstrate how MongoDB indexes work in Casewhere, we'll use the following data class Application as an example.

Name	Type
ApplicationID	String
SubmittedDate	DateTime
Status	Enum
ApplicantId	ForeignKey
ApplicantCpr	String, Encrypted
IsDeleted	Casewhere built-in field to determine if an object is deleted or not
CaseState	Casewhere built-in field to indicate the data object's creation state

Example 1: Find applications by ApplicationID

ApplicationID is a high-cardinality field since each application has a unique value. Although Casewhere automatically appends conditions for IsDeleted and CaseState for all queries by default, there’s no need to include these or any sorting criteria in the index, keeping it small and efficient.

Example 2: Filter by Status and sort by SubmittedDate in descending order

This is a common query used in most data widgets. Please note that Casewhere automatically appends conditions for IsDeleted and CaseState for all queries by default. You should stick to the ESR rule starting with high cardinality field first, in this case, the Status field.

Casewhere automatically appends the sorting condition {_id: 1} to ensure correct paging, so you should also include it in the index.

Example 3: Optimize aggregation queries with projection covered

For scenarios where aggregation operations, such as summing, needed to be done against large data collections, consider optimizing performance by ensuring PROJECTION_COVERED. This means retrieving all necessary data directly from the indexes, without the need for additional fetching.

For example, you are using the following query to filter applications submitted within a specific period and calculating the total amount for each application status.

db.getCollection("CWApplications").aggregate(
[
    {
        $match: {
            IsDeleted: false,
            CaseState: 'Created',
            SubmittedDate: {
                $gt: ISODate("2022-01-01 00:00:00.000Z"),
                $lt: ISODate("2024-01-01 00:00:00.000Z")
            }
        }
    },
    {
        $group: {
            _id: "$Status",
            TotalAmount: {
                $sum: "$ApplicationAmount"
            }
        }
    }
])

The query may run very slowly if the filter condition matches several million applications. To improve the performance, you can apply the following simple-collation index. This ensures that Status and ApplicationAmount are retrieved directly from the index, even when these fields are not part of the filter criteria.

{
    "SubmittedDate" : -1,
    "CaseState" : 1,
    "IsDeleted" : 1,
    "Status" : 1,
    "ApplicationAmount" : 1
}

Since IsDeleted and CaseState are low-cardinality fields, placing them at the beginning of the index won't have a significant impact in this case. Additionally, having too many indexes prefixed with these fields may cause MongoDB to evaluate numerous index plans unnecessarily.

Execute the query and you can observe that PROJECTION_COVERED has been achieved. Please note that your query must be non-collation.

To enable simple-collation queries in Casewhere, ensure your custom data source is set to simple collation by configuring Locale to None.

Best practices

Naming conventions

While MongoDB doesn’t enforce a specific naming convention for index names, it’s a good practice to adopt a clear and consistent format to improve readability and maintainability. Here are two common conventions:

General: Name the index based on the fields it includes, separated by underscores. When creating compound indexes, you could specify the order (asc or desc) in the name.
- Example: general_Status_SubmittedDate_desc. The index name should begin with the prefix general, and built-in fields such as CaseState and IsDeleted should be excluded.
Specific: In some cases, it’s useful to include the index’s purpose if it’s for a specific query.
- Example: specific_report_total_amount for an index created specifically to calculate total amount for reports. The index should be prefixed with specific.

The ESR rule

An index that references multiple fields is a compound index. Compound indexes can dramatically improve query response times. In most cases, applying the ESR (Equality, Sort, Range) rule to arrange the index keys helps to create a more efficient compound index. This page introduces the ESR Rule. Here are the key principles:

Equality: Prioritize equality conditions in your query, as MongoDB is most efficient when filtering documents based on equality. When designing your indexes, place fields with equality conditions at the beginning of the index, and ensure that high-cardinality fields come first.
Sort: Following the fields used for equality, include fields that are used in sort operations. This arrangement allows MongoDB to use the index both for filtering and for sorting the results, which is more efficient than sorting the data after filtering.
Range: Lastly, include fields with range queries (e.g., greater than, less than operations). Range queries are less selective compared to equality filters, so they are placed last in the index.

Example: Assuming you have the following query:

var filter = FilterBuilder.Create()
	.Gt("ApplicationAmount", 1000) // Range query
	.Eq("Status", "Submitted")
	.Eq("CaseState", "Created").Build();
var query = DataObjectApiQuery.For("Application")
    .FilterBy(filter)
    .OrderBy("SubmittedDate", true);
var result = ctx.Use<IDataApi>().Search(query);

Your compound index in this case should be ordered as below:

Status - Equality - High-cardiniality field comes first
IsDeleted - Equality
CaseState - Equality
SubmittedDate - Sort
_id - Sort - Casewhere always appends _id when sorting to ensure the order is consistent.
ApplicationAmount - Range

Query planner: Minimize the number of candidate plans

To evaluate the efficiency of query plans, the query planner runs all candidate plans during a trial period. In general, the winning plan is the query plan that produces the most results during the trial period while performing the least amount of work. If a query has too many candidate plans, MongoDB may take considerable time and resources to determine the optimal plan, which can lead to unpredictable slow response times.

For instance, most Casewhere data queries include the following filter criteria:

{
    IsDeleted: false,
    CaseState: 'Created'
}

If there are many indexes starting with these two fields, MongoDB must evaluate each one when updating the query plan, which can lead to significantly slower response times. To prevent this, it’s important to avoid creating indexes that begin with IsDeleted and CaseState.

Working with `$or`

When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans.

[
  {
    $match: {
      IsDeleted: false,
      CaseState: "Created",
      $or: [
        {
          SubmittedDate: {
            $gt: ISODate("2020-01-01 00:00:00.000Z"),
            $lt: ISODate("2024-01-01 00:00:00.000Z")
          }
        },
        {
          Status: 'Draft'
        }
      ]
      
    }
  }
]

To support this query, rather than a compound index, you would create one index on Status and another index on SubmittedDate:

The query plan will look as follows:

Remove duplicated indexes

Building and maintaining indexes are costly, so unused indexes should be removed to optimize performance. The index (A: 1, B: 1, C: 1) is redundant and can be omitted because it is fully encompassed by (A: 1, B: 1, C: 1, D: 1). However, it won't be completely covered by (A: 1, B: -1, C: 1, D: 1) due to the opposite sorting direction of B.

You can use the Index Manager to monitor index usage.