Group by's
Search types with output columns (CSV search, Tuple search) optionally support the definition of a list of Projections for grouping criteria.
Configured grouping criteria are usually mapped directly to the GROUP BY section in generated SELECT statements.
Group by's aim to subsequently summarize the primary match list of a search in such a way that only a single line is output in the final result for each unique combination of the Projections listed as grouping criteria. Aggregation functions (see Aggregate) can be used to summarize the data of the grouped data records.
The primary match list takes into account any Restrictions defined as a 'condition' and any Joins that may have a restrictive or multiplying effect on the primary match list.
Projections defined as grouping criteria do not have to be configured as output columns.
Conversely, the following strict rule applies to the definition of Projections for output columns or for Sorting:
A projection for an output column or a sorting criterion must be used as a 'projection' within an aggregation function (see Aggregate) if its configuration includes a property that is not also defined as a grouping criterion by Property projection.
►NOTE◄ The last example (see below) should illustrate what this restriction means for the practical handling of Group by's.
►IMPORTANT◄ The 'Calculate total number' option must not be checked if more than one grouping criterion is configured under Group by's.
Configuration
The data model of the relevant Search types (CSV search, Tuple search) offers a list field Group by's (groupBy), which expects Projections as list values:
By default (see screenshot above), the Group by's list is empty.
A grouping criterion (projection configuration) can be added to the list by clicking on the
icon
An existing grouping criterion can be removed by clicking on the garbage can symbol.
If the context provides a tree structure for displaying the search, the order position of grouping criteria can be adjusted using drag & drop.
CAUTION
The order of the grouping criteria generally has no influence on the results of the grouping and any existing aggregation functions. However, the order of the results can depend on the order of the grouping criteria and this can lead to differences in the content of the result in conjunction with a limitation of the output rows via the 'Max. results' parameter. This also applies if the value 0 ('unlimited') applies for the 'Max. results' parameter, as the default upper limit of 100 rows then applies, which means that the result does not output more than 100 'group' rows.
Examples
Simple use case: Single-level grouping
A CSV search for Addresses should provide a unique list of all values for the field 'Zip code' (zipcode) that are found in address book entries for which read access exists.
Configuration:
The CSV search is configured as shown on the right:
The default values applicable to the Search builder have been retained for all other configuration features. Runtime example: zipcode ►NOTE◄ The value "" here stands for the empty string (""), which also represents the $null value in a CSV search. This is the only reason why the list can contain this value twice. A Tuple search would display the difference between "" and $null transparently. |
|
►IMPORTANT◄ The CSV search will list a maximum of 100 zip codes in the given configuration, as this upper limit applies if the default value 0 is specified for the Max. results parameter. The effective upper limit is specified by a return value of the type "Search result" (CsvSearchResult or TupleSearchResult) in the maxResults property.
Typical use case: Group by's with aggregation
The previous example should now be extended to include an output column that shows how many address book entries each zipcode value was found in.
Configuration:
As shown in the screenshot on the right, the additional column in front of the existing Property projection was set up for the 'Zip code' (zipcode) property:
|
|
Runtime example: COUNT of id,zipcode |
The example data (left) shows the results of the aggregation. The first column with the automatically provided default label 'COUNT of id' indicates how many address book entries refer to the respective 'Zip code' (zipcode). As long as no Sorting is defined, the order of the results is determined by the internal logic of the database when grouping. If more different zip codes are found in address book entries than the search should return based on the setting for the 'Max. results' parameter, the query will return a more or less random subset under these conditions. The following variant is therefore intended to demonstrate the use of Sorting in conjunction with Group by's. |
Variant:
The existing CSV search is now adapted so that the 'TOP 10' of the most frequent zip codes in address book entries are displayed as the search result.
Sorting should (naturally) be done in descending order by frequency and – only in the event of parity – in ascending order by zip code.
Configuration:
The screenshot on the right shows the two-step Sorting that must be added to the CSV search:
|
|
The fact that only the 'TOP 10' is listed is regulated by the value 10 for the Max. results parameter in conjunction with the default offset value 0 for First result. |
|
Runtime example: COUNT of id,zipcode |
The sample data (left) shows the desired 'TOP 10'. ►NOTE◄ The two 'most frequent' categories are occupied here by the values Empty string and $null, which does not necessarily correspond to the meaning of the 'TOP 10' rating. With a Where condition such as the following, both 'incidents' could be ruled out: |
Runtime example: COUNT of id,zipcode |
The sample data (left) shows the desired 'TOP 10' including the condition (above) for 'Zip code filled'. Empty strings and $null no longer occupy the top positions in the ranking and at the end two individual matches have moved up that were not previously part of the 'TOP 10'. |
More complex use case: Multi-level group by's
Based on the above task, it is now necessary to ensure that address book entries with the same 'Zip code' (zipcode ) but different 'Country' (countryCode) are differentiated.
Until now, these have been unjustifiably grouped together by the Group by's.
Configuration:
An additional grouping criterion is added to the previous configuration under Group by's as shown on the right:
|
|
Runtime example: COUNT of id,zipcode |
In the example data (left), the marking indicates the effect of the additional grouping criterion:
►NOTE◄ The additional grouping according to the 'Country' (countryCode) therefore also works without consideration in an output column. |
Runtime example: COUNT of id,zipcode,countryCode The additional output column (countryCode DE/FR) makes it clear why the 'Zip code' (40210) now occupies two list positions. |
However, the additional grouping criterion should also have a transparent and comprehensible effect. So we also add the Property projection for the 'Country' (countryCode) under the Projections as an output column.
|
More complex use case: Multi-level grouping with 'data preparation'
The CSV search from the previous example should now be redesigned so that only the first character of the 'Zip code' (zipcode) is taken into account in the Projections for Group by's, Sorting and output column.
The aim is to determine a 'TOP 10' of the zip code areas in the address book entries.
Configuration:
The screenshot on the right shows the Group by's section, where the first grouping criterion has been adjusted:
|
|
The screenshot on the right shows how the Sub string projection for the 'section' from the 'Zip code' can be implemented as a projection for an output column:
|
|
The screenshot (right) shows the required customization for Sorting:
|
|
Runtime example:
COUNT of id,MIN of substring of zipcode,countryCode
===================================================
4,8,DE
2,1,DE
2,3,BE
2,9,DE
1,1,CH
1,1,KH
1,2,DE
1,4,DE
1,4,FR
1,5,US
►NOTE◄ Instead of the automatically generated column titles, more meaningful assignments for the Name property of the output columns are recommended