Group by's

images/download/attachments/169607365/image-2024-2-7_18-8-17-version-1-modificationdate-1707325697728-api-v2.png

Search types with output columns (CSV search, Tuple search) optionally support the definition of a list of Projections for grouping criteria.

Configured grouping criteria are usually mapped directly to the GROUP BY section in generated SELECT statements.

Group by's aim to subsequently summarize the primary match list of a search in such a way that only a single line is output in the final result for each unique combination of the Projections listed as grouping criteria. Aggregation functions (see Aggregate) can be used to summarize the data of the grouped data records.

  • The primary match list takes into account any Restrictions defined as a 'condition' and any Joins that may have a restrictive or multiplying effect on the primary match list.

  • Projections defined as grouping criteria do not have to be configured as output columns.

  • Conversely, the following strict rule applies to the definition of Projections for output columns or for Sorting:

    • A projection for an output column or a sorting criterion must be used as a 'projection' within an aggregation function (see Aggregate) if its configuration includes a property that is not also defined as a grouping criterion by Property projection.
      NOTE◄ The last example (see below) should illustrate what this restriction means for the practical handling of Group by's.


IMPORTANT◄ The 'Calculate total number' option must not be checked if more than one grouping criterion is configured under Group by's.


Configuration

The data model of the relevant Search types (CSV search, Tuple search) offers a list field Group by's (groupBy), which expects Projections as list values:

  • By default (see screenshot above), the Group by's list is empty.

  • A grouping criterion (projection configuration) can be added to the list by clicking on the images/s/-95e2zf/9012/8yg2g7/_/images/icons/emoticons/add.svg icon

  • An existing grouping criterion can be removed by clicking on the garbage can symbol.

  • If the context provides a tree structure for displaying the search, the order position of grouping criteria can be adjusted using drag & drop.


images/s/-95e2zf/9012/8yg2g7/_/images/icons/emoticons/warning.svg CAUTIONimages/s/-95e2zf/9012/8yg2g7/_/images/icons/emoticons/warning.svg The order of the grouping criteria generally has no influence on the results of the grouping and any existing aggregation functions. However, the order of the results can depend on the order of the grouping criteria and this can lead to differences in the content of the result in conjunction with a limitation of the output rows via the 'Max. results' parameter. This also applies if the value 0 ('unlimited') applies for the 'Max. results' parameter, as the default upper limit of 100 rows then applies, which means that the result does not output more than 100 'group' rows.


Examples

Simple use case: Single-level grouping

A CSV search for Addresses should provide a unique list of all values for the field 'Zip code' (zipcode) that are found in address book entries for which read access exists.

Configuration:

The CSV search is configured as shown on the right:

  • The list of Projections contains a Property projection for the 'Zip code' (zipcode) property as the only output column.

  • The Group by's list uses a Property projection for the same 'Zip code' (zipcode) property as the only grouping criterion.

The default values applicable to the Search builder have been retained for all other configuration features.


Runtime example:

zipcode
==============
""
EH42
1174
22045
94474
1972
94315
X64GB
""
12529
40210
...

NOTE◄ The value "" here stands for the empty string (""), which also represents the $null value in a CSV search. This is the only reason why the list can contain this value twice. A Tuple search would display the difference between "" and $null transparently.

images/download/attachments/169607365/image-2024-2-7_18-9-55-version-1-modificationdate-1707325795889-api-v2.png

IMPORTANT The CSV search will list a maximum of 100 zip codes in the given configuration, as this upper limit applies if the default value 0 is specified for the Max. results parameter. The effective upper limit is specified by a return value of the type "Search result" (CsvSearchResult or TupleSearchResult) in the maxResults property.

Typical use case: Group by's with aggregation

The previous example should now be extended to include an output column that shows how many address book entries each zipcode value was found in.

Configuration:

As shown in the screenshot on the right, the additional column in front of the existing Property projection was set up for the 'Zip code' (zipcode) property:

  • An Aggregate projection with the Type 'Count' (COUNT) is used.

  • The 'ID' (id) property is selected as the Projection. To determine a 'number', the only decisive factor for the property selection is whether it is filled. This is guaranteed for the id property for all entities.

images/download/attachments/169607365/image-2024-2-7_18-11-1-version-1-modificationdate-1707325861877-api-v2.png

Runtime example:

COUNT of id,zipcode
===================
11,
1,EH42
1,1174
2,22045
1,94474
2,3000
1,1972
2,94315
1,X64GB
5,
1,12529
2,40210
...

The example data (left) shows the results of the aggregation.

The first column with the automatically provided default label 'COUNT of id' indicates how many address book entries refer to the respective 'Zip code' (zipcode).

As long as no Sorting is defined, the order of the results is determined by the internal logic of the database when grouping.

If more different zip codes are found in address book entries than the search should return based on the setting for the 'Max. results' parameter, the query will return a more or less random subset under these conditions.

The following variant is therefore intended to demonstrate the use of Sorting in conjunction with Group by's.

Variant:

The existing CSV search is now adapted so that the 'TOP 10' of the most frequent zip codes in address book entries are displayed as the search result.

Sorting should (naturally) be done in descending order by frequency and – only in the event of parity – in ascending order by zip code.

Configuration:

The screenshot on the right shows the two-step Sorting that must be added to the CSV search:

  • The first sorting criterion specifies that sorting should be in descending order (the ASC (ascending) option is unchecked) according to the Aggregate that corresponds exactly to the first output column (see above).

  • The second sorting criterion specifies that sorting should be in ASC (ascending) order (option checked) according to the Property projection for the 'Zip code' (zipcode) property if several group rows provide the same value for the first sorting criterion.

images/download/attachments/169607365/image-2024-2-7_18-15-53-version-1-modificationdate-1707326153342-api-v2.png

The fact that only the 'TOP 10' is listed is regulated by the value 10 for the Max. results parameter in conjunction with the default offset value 0 for First result.

images/download/attachments/169607365/image-2024-2-7_18-12-31-version-1-modificationdate-1707325951699-api-v2.png

Runtime example:

COUNT of id,zipcode
===================
11,
5,
2,22045
2,3000
2,40210
2,80337
2,94315
1,1174
1,12529
1,13465

The sample data (left) shows the desired 'TOP 10'.

NOTE◄ The two 'most frequent' categories are occupied here by the values Empty string and $null, which does not necessarily correspond to the meaning of the 'TOP 10' rating.

With a Where condition such as the following, both 'incidents' could be ruled out:

images/download/attachments/169607365/image-2024-2-7_18-13-14-version-1-modificationdate-1707325994145-api-v2.png

Runtime example:

COUNT of id,zipcode
===================
2,22045
2,3000
2,40210
2,80337
2,94315
1,1174
1,12529
1,13465
1,1972
1,52036

The sample data (left) shows the desired 'TOP 10' including the condition (above) for 'Zip code filled'.

Empty strings and $null no longer occupy the top positions in the ranking and at the end two individual matches have moved up that were not previously part of the 'TOP 10'.

More complex use case: Multi-level group by's

Based on the above task, it is now necessary to ensure that address book entries with the same 'Zip code' (zipcode ) but different 'Country' (countryCode) are differentiated.

Until now, these have been unjustifiably grouped together by the Group by's.

Configuration:

An additional grouping criterion is added to the previous configuration under Group by's as shown on the right:

  • A Property projection defines the 'Country' (countryCode) property as an additional grouping criterion.

images/download/attachments/169607365/image-2024-2-7_18-13-57-version-1-modificationdate-1707326037485-api-v2.png

Runtime example:

COUNT of id,zipcode
===================
2,22045
2,3000
2,80337
2,94315
1,1174
1,12529
1,13465
1,1972
1,40210 █
1,40210 █

In the example data (left), the marking indicates the effect of the additional grouping criterion:

  • In contrast to the previous example, the 'Zip code' 40210 now appears in the last two lines with a frequency of 1 and no longer in list position 3 with a frequency of 2 (see list above).

  • The 'Zip code' 52036 was pushed from the last place on the list to 'off'.

NOTE◄ The additional grouping according to the 'Country' (countryCode) therefore also works without consideration in an output column.

Runtime example:

COUNT of id,zipcode,countryCode
===============================
2,22045,DE
2,3000,BE
2,80337,DE
2,94315,DE
1,1174,CH
1,12529,DE
1,13465,DE
1,1972,KH
1,40210,DE █
1,40210,FR █

The additional output column (countryCode DE/FR) makes it clear why the 'Zip code' (40210) now occupies two list positions.

However, the additional grouping criterion should also have a transparent and comprehensible effect. So we also add the Property projection for the 'Country' (countryCode) under the Projections as an output column.

images/download/attachments/169607365/image-2024-2-7_18-14-48-version-1-modificationdate-1707326088426-api-v2.png

More complex use case: Multi-level grouping with 'data preparation'

The CSV search from the previous example should now be redesigned so that only the first character of the 'Zip code' (zipcode) is taken into account in the Projections for Group by's, Sorting and output column.

The aim is to determine a 'TOP 10' of the zip code areas in the address book entries.

Configuration:

The screenshot on the right shows the Group by's section, where the first grouping criterion has been adjusted:

  • The Property projection for the 'Zip code' (zipcode) property has been cut and pasted into a newly created Sub string projection in the Projection parameter.

  • The first character of the 'Zip code' is 'cut out' by Begin index (1=Standard) and Max. length 1 'cut out'.

images/download/attachments/169607365/image-2024-2-7_18-17-40-version-1-modificationdate-1707326260096-api-v2.png


The screenshot on the right shows how the Sub string projection for the 'section' from the 'Zip code' can be implemented as a projection for an output column:

  • The Sub string projection must be used as a Projection in the context of an aggregation function, as the underlying Property projection for the zipcode property is no longer found in the list of grouping criteria after the above adjustment

    NOTE◄ The fact that the first grouping criterion is completely identical to the Sub string projection is not (recognized) here.

  • Instead of Type 'Min', 'Max' could be used for the Aggregate. Due to the grouping in each group line, only one uniform value from the Sub string projection is expected.

images/download/attachments/169607365/image-2024-2-7_18-18-33-version-1-modificationdate-1707326313107-api-v2.png

The screenshot (right) shows the required customization for Sorting:

  • The first sorting criterion can be retained unchanged, as the 'counting' of the 'ID' values does not change as a result of the change from 'Zip code' to 'Zip code area'.

  • As a second sorting criterion, however, the Sub string projection from the Group by's must be used as a Projection within an Aggregate with Type 'Min' or 'Max' instead of the original Property projection for the zipcode property.

    NOTE◄ Here too, as in the context of the output column, there is actually nothing to aggregate, so the aggregation function is only needed to fulfill formal requirements (see introduction).

images/download/attachments/169607365/image-2024-2-7_18-19-31-version-1-modificationdate-1707326371343-api-v2.png

Runtime example:

COUNT of id,MIN of substring of zipcode,countryCode
===================================================
4,8,DE
2,1,DE
2,3,BE
2,9,DE
1,1,CH
1,1,KH
1,2,DE
1,4,DE
1,4,FR
1,5,US

NOTE◄ Instead of the automatically generated column titles, more meaningful assignments for the Name property of the output columns are recommended