SqlBulkUnit
Class name | com.ebd.hub.datawizard.iu.SqlBulkUnit
This Integration Unit is intended for the bulk insert of data into a database table. Compared to the insert in phase 4, a significantly higher speed is possible with the SqlBulkUnit. Important note: To use this Integration Unit, the SQL execution during phase 4 must be switched off; the Integration Unit does this automatically. In addition, the node attribute "Only for SQL" must not be used.
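A large part of the speed gain of such bulk approaches comes from reusing one prepared statement and sending the records to the database in batches instead of executing one statement per record. The following minimal JDBC sketch illustrates this principle only; it is not the code of the Integration Unit, and the connection URL (an in-memory H2 database), table, and column names are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

// Minimal sketch of the bulk-insert principle: one reused prepared statement,
// records collected into a batch and sent in a single round trip.
// Illustration only - not the SqlBulkUnit implementation.
public class BulkInsertPrinciple {
    public static void main(String[] args) throws Exception {
        List<String[]> records = List.of(new String[]{"4711", "box"},
                                         new String[]{"4712", "pallet"});
        // Hypothetical in-memory database and table, just for demonstration.
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            try (Statement s = con.createStatement()) {
                s.execute("CREATE TABLE demo_table (item_no VARCHAR(20), packaging VARCHAR(20))");
            }
            con.setAutoCommit(false);
            String sql = "INSERT INTO demo_table (item_no, packaging) VALUES (?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (String[] r : records) {
                    ps.setString(1, r[0]);
                    ps.setString(2, r[1]);
                    ps.addBatch();       // collect instead of executing per record
                }
                ps.executeBatch();       // one round trip for the whole batch
                con.commit();
            }
        }
    }
}
```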
The Integration Unit uses the following SQL-relevant attributes in target structure nodes.
DB Alias. Note: Only fixed values and system constants are evaluated, not variables.
SQL table/procedure. Important note: Please see description of Integration Unit parameter Allow dynamic table names.
Truncate table.
SQL mode. In addition to insert, only update and only delete are possible, provided at least one field has the attribute SQL key set and the Integration Unit parameter Use prepared statements is set to true (see the sketch after the field attribute notes below).
Note: Node attributes Opt. Pre-SQL execution and Transaction control are not used.
The Integration Unit uses the following SQL-relevant attributes in target structure fields.
SQL column name.
Note: Field attribute SQL AutoGenKey is not used and SQL key is ignored if node attribute SQL mode is set to insert.
Note: Skip empty SQL value is not used.
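As an illustration of the statement shapes that correspond to these modes, the following sketch assumes a hypothetical table demo_table whose field id carries the attribute SQL key; it is not SQL generated by the Integration Unit:

```java
// Hypothetical statement shapes behind the SQL modes, assuming the field 'id'
// carries the attribute 'SQL key'. Illustration only - not generated by the unit.
public class SqlModeShapes {
    static final String INSERT_SQL = "INSERT INTO demo_table (id, name) VALUES (?, ?)"; // insert mode
    static final String UPDATE_SQL = "UPDATE demo_table SET name = ? WHERE id = ?";     // update mode (needs SQL key)
    static final String DELETE_SQL = "DELETE FROM demo_table WHERE id = ?";             // delete mode (needs SQL key)
}
```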
Parameters
Parameter name | Allowed values | Default value | Description
Allow dynamic table names | true, false | false | If true, variables and constants are allowed for setting table names (in the attribute SQL table/procedure in target structure nodes). If false, only fixed values and system constants are allowed. Important note: Variables are resolved only once! If an SQL node is traversed several times, the table name is specified by a variable, and the value of that variable changes, then the value resolved in the first iteration is used as the table name for all iterations!
CSV delimiter for direct call | | ; | The CSV field delimiter for direct calls (using the backup file). The transcription of non-printable delimiters is supported.
Commit at end and not after each batch | true, false | false | If true, data is committed only after the last record. This can lead to a database error if the tablespace for transactions is too small.
Encoding to use for BLOB fields | | 8859_1 | The encoding that is used for BLOB fields.
Escape character to mask ' if no prepared statements are used | | ' | Only considered if no prepared statements are used. The SQL special character apostrophe (') must not occur within a field value. If this cannot be ruled out, a database-dependent escape character has to be specified here. Since using an escape character costs speed, leave this parameter blank if you can guarantee that no apostrophe occurs.
Max. sql statements in batch | | 500 | When the batch has reached the specified number of SQL statements, a commit is executed. This option is only effective if Use autocommit on and Commit at end and not after each batch are both set to false (see the sketch below).
Node name for simple bulk operation | | | Node name for the bulk operation (for the alternative 'direct call', see below).
Replace empty string with NULL | true, false | false | If true, empty CSV fields are replaced with NULL when inserted.
Target system is Salesforce | true, false | false | true if the target system is Salesforce.
Use autocommit on | true, false | false | Decides whether each record is committed immediately. Depending on the database, this may or may not be an advantage: on the one hand, no transaction is created; on the other hand, the data is written immediately.
Use prepared statements | true, false | true | Depending on the database, it may be an advantage to work with prepared statements.
Use truncate instead of delete cmd | true, false | false | true if truncate is to be used instead of delete.
Note: See also section Representation of Nonprintable Characters.
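The interplay of Use autocommit on, Commit at end and not after each batch, and Max. sql statements in batch can be read as in the following generic JDBC sketch. It only illustrates the documented behaviour under assumptions and is not the actual code of the SqlBulkUnit:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Generic JDBC sketch of how the commit-related parameters interact.
// Illustration of the documented behaviour only - not the SqlBulkUnit code.
public class CommitControlSketch {

    static void bulkExecute(Connection con, PreparedStatement ps, List<String[]> records,
                            boolean useAutocommit,     // 'Use autocommit on'
                            boolean commitOnlyAtEnd,   // 'Commit at end and not after each batch'
                            int maxStatementsInBatch)  // 'Max. sql statements in batch'
            throws SQLException {
        con.setAutoCommit(useAutocommit);              // with autocommit, every record is committed immediately
        int count = 0;
        for (String[] record : records) {
            for (int i = 0; i < record.length; i++) {
                ps.setString(i + 1, record[i]);        // bind the field values
            }
            ps.addBatch();
            if (++count % maxStatementsInBatch == 0) {
                ps.executeBatch();                     // batch limit reached
                if (!useAutocommit && !commitOnlyAtEnd) {
                    con.commit();                      // commit after each full batch
                }
            }
        }
        ps.executeBatch();                             // flush the remaining statements
        if (!useAutocommit) {
            con.commit();                              // the only commit if 'Commit at end ...' is true
        }
    }
}
```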
Practical application
The best way to provide a large amount of data to be inserted is by using the DefaultFileSQLCron (Time-driven Input Agent Custom class).
There are 2 alternatives:
Alternative 1 - with mapping: Make sure to use the preparser TokenStreamSplitter. Depending on the number of columns of your table and the available memory, you should be able to find a setup that suits your system.
Alternative 2 - without mapping (direct call): Make sure to use the preparser DummyPreParser.
Examples
Alternative 1 - with mapping
If data has to be manipulated, you have to use mappings and functions. If you use this Integration Unit, be aware that you cannot use all the node and field attributes.
Field attribute Template will not be used (e.g. date templates will not convert into SQL dates).
Field attribute SQL AutoGenKey will not be used.
Node attribute Commitment will not be used. Instead, the settings of the Integration Unit will be used.
Important note: Make sure that Phase 2 → Extended settings → 'Swap record data to file' is set to 0%!
Alternative 2 - without mapping (direct call)
On the condition that the source table and the target table have an identical structure (same field order, same types, and same field sizes), a mapping is dispensable. This mode of the SqlBulkUnit is called 'direct call'. Mappings (i.e. also function executions) will not be executed in the direct call mode. The backup file is passed directly to the Integration Unit, as it was generated by the agent DefaultSQLCron or DefaultFileSQLCron (the fields will be separated by semicolons, which is the default setting of the parameter CSV delimiter for direct call).
To use this alternative, you have to use the preparser DummyPreParser. The source structure only needs to have one node and one field to receive the string dummy data from the preparser.
Suppose the DefaultFileSQLCron executed the SQL statement SELECT 'A', col1, col2 FROM xyz. The Integration Unit then looks for a target node with the name A. This node has to be active, and the attributes DB Alias and SQL table/procedure have to be set.
This alternative does not create or swap any records. The processing time should grow linearly with the number of datasets, while the memory usage remains constantly low, independent of the number of datasets. The CPU load will increase mostly on the database server side.
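The following sketch illustrates the direct call idea under assumptions (file name, JDBC URL, table and column names are made up; the real Integration Unit reads the backup file internally): each semicolon-delimited line starts with the node name, and the remaining fields are passed 1:1 to the table configured for that node.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

// Simplified sketch of the 'direct call' idea. The real unit reads the backup
// file internally; file name, JDBC URL, table and columns here are assumptions.
public class DirectCallSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical target for node 'A' (see the SELECT 'A', col1, col2 example above).
        String insertForNodeA = "INSERT INTO xyz_copy (col1, col2) VALUES (?, ?)";

        List<String> lines = Files.readAllLines(Path.of("backup.csv"));
        try (Connection con = DriverManager.getConnection("jdbc:your-database-url")) { // placeholder URL
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(insertForNodeA)) {
                for (String line : lines) {
                    String[] fields = line.split(";", -1); // 'CSV delimiter for direct call'
                    // fields[0] is the node name ('A'); the rest are the column values
                    ps.setString(1, fields[1]);
                    ps.setString(2, fields[2]);
                    ps.addBatch();
                }
                ps.executeBatch();
                con.commit();
            }
        }
    }
}
```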
Possible errors
Even though this Integration Unit is designed for mass data, errors might occur when dealing with large amounts of data. As an example, see the following error message when using a Microsoft SQL Server.
...
... IntegrationUnit reported exception, aborting request com.ebd.hub.iu.UnitTransactionException: null
Nested exception-message is: There is insufficient system memory in resource pool 'internal' to run this query.
...
Nested exception is: com.microsoft.sqlserver.jdbc.SQLServerException: There is insufficient system memory in resource pool 'internal' to run this query.
...
The error does not come directly from the Integration Unit, but from the database itself. In this case, you could give the Microsoft SQL Server more memory (see the manufacturer's documentation) or reduce the value of the Integration Unit parameter Max. sql statements in batch (see above).