Data Transformation provides operations that modify, convert, extract, and structure data while it is in-flight.
Overview
This documentation describes available data transformation operations, their parameters, examples, supported use cases, FAQs, and notes.
Each operation is intended to work on incoming pipeline data without changing the original technical meaning of the documented behavior.
When to Use
Use these operations when you need to change case, convert data types, append fields, extract data, convert between formats, generate derived values, or prepare data for downstream systems.
How It Works
Each transformation operation accepts a defined number of parameters. Based on the operation and the values supplied, the pipeline modifies the specified keys or generates new output keys while processing data in-flight.
How to Configure / How to Use
Select the required operation, provide the documented parameters exactly as expected, and use the examples as reference for the parameter structure.
Upper Case
The Upper Case operation converts a given key value into Upper Case from any JSON Dataset while data is in-flight.
Description
This operation converts the value of a given key into Upper Case in any JSON dataset. The operation takes place while data is in-flight.
Number of Parameters : 1
The Upper Case operation requires 1 parameter.
Parameter : Uppercase
Provide a comma-separated list of keys in double quotes to convert the defined key values into Upper Case.
Below is an example where we are converting the values of first_name and last_name into Upper Case.
"first_name", "last_name"
Lower Case
The Lower Case operation converts a given key value into Lower Case from any JSON Dataset while data is in-flight.
Description
This operation converts the value of a given key into Lower Case in any JSON dataset. The operation takes place while data is in-flight.
Number of Parameters : 1
The Lower Case operation requires 1 parameter.
Parameter : Lowercase
Provide a comma-separated list of keys in double quotes to convert the defined key values into Lower Case.
Below is an example where we are converting the values of first_name and last_name into Lower Case.
"first_name", "last_name"
Data Type
The Data Type operation converts any key’s value into its target data type such as Boolean, Float, Integer, or Date Time.
Description
The Data Type operation can convert a key’s value from a string into its target data type:
- String type Boolean into Boolean data type
- String type Float into Float data type
- String type Integer into Integer data type
- String type Datetime into Datetime data type
Number of Parameters : 4
The Data Type operation requires 4 parameters.
Parameter : Boolean
A string type Boolean can be converted into Boolean data type.
String type Boolean examples: "True", "False", "0", "1", etc.
Values of the Boolean data type are not enclosed in double quotes.
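As an illustration only, the string-to-Boolean conversion might be sketched in Python as follows. The helper name `to_boolean`, the case-insensitive matching, and the exact set of accepted strings are assumptions; the documentation lists "True", "False", "0", and "1" as examples but does not give the complete list.

```python
# Assumed accepted spellings; the real operation may accept more.
TRUE_STRINGS = {"true", "1"}
FALSE_STRINGS = {"false", "0"}

def to_boolean(value):
    """Convert a string type Boolean into a real Boolean value."""
    text = str(value).strip().lower()
    if text in TRUE_STRINGS:
        return True
    if text in FALSE_STRINGS:
        return False
    raise ValueError(f"not a string type Boolean: {value!r}")

record = {"test_passed": "True"}
record["test_passed"] = to_boolean(record["test_passed"])
```

After the conversion, `record["test_passed"]` holds the Boolean `True` rather than the string "True".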
Below is an example where we are converting the values of the test_passed key into Boolean data type.
"test_passed"
Parameter : Float
A string type Float can be converted into Float data type.
Below is an example where we are converting the value of Amount into the JSON data type Float.
"Amount"
Parameter : Integer
A string type Integer can be converted into Integer data type.
Below is an example where we are converting the value of Quantity into the JSON data type Integer.
"Quantity"
Parameter : Date Time
The Date Time parameter is used to convert date or datetime values from one format to another.
It supports both formatted date strings and Epoch timestamps.
From date format : The date/datetime format in which the incoming value is currently written.
To date format : The date/datetime format into which the value should be converted.
Example: ["key_name","From date format","To date format"]
Below is an example where we are converting two keys from their From date format to their To date format.
"startweekdate1","%Y-%m-%d %H:%M:%S.%f%z","%Y-%m-%d %H:%M:%S","startweekdate2","%Y-%m-%d %H:%M:%S","%Y-%m-%d"
For Goldfinch Datalake, always send the date data type in the following format:
%Y-%m-%dT%H:%M:%S.%f%z
Supported Date Format Codes
| Code | Example | Description |
|---|---|---|
| %a | Sun | Weekday as locale’s abbreviated name. |
| %A | Sunday | Weekday as locale’s full name. |
| %w | 0 | Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. |
| %d | 08 | Day of the month as a zero-padded decimal number. |
| %-d | 8 | Day of the month as a decimal number. (Platform specific) |
| %b | Sep | Month as locale’s abbreviated name. |
| %B | September | Month as locale’s full name. |
| %m | 09 | Month as a zero-padded decimal number. |
| %-m | 9 | Month as a decimal number. (Platform specific) |
| %y | 13 | Year without century as a zero-padded decimal number. |
| %Y | 2013 | Year with century as a decimal number. |
| %H | 07 | Hour (24-hour clock) as a zero-padded decimal number. |
| %-H | 7 | Hour (24-hour clock) as a decimal number. (Platform specific) |
| %I | 07 | Hour (12-hour clock) as a zero-padded decimal number. |
| %-I | 7 | Hour (12-hour clock) as a decimal number. (Platform specific) |
| %p | AM | Locale’s equivalent of either AM or PM. |
| %M | 06 | Minute as a zero-padded decimal number. |
| %-M | 6 | Minute as a decimal number. (Platform specific) |
| %S | 05 | Second as a zero-padded decimal number. |
| %-S | 5 | Second as a decimal number. (Platform specific) |
| %f | 000000 | Microsecond as a decimal number, zero-padded to 6 digits. |
| %z | +0000 | UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive). |
| %Z | UTC | Time zone name (empty string if the object is naive). |
| %j | 251 | Day of the year as a zero-padded decimal number. |
| %-j | 251 | Day of the year as a decimal number. (Platform specific) |
| %U | 36 | Week number of the year (Sunday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. |
| %-U | 36 | Week number of the year (Sunday as the first day of the week) as a decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. (Platform specific) |
| %W | 35 | Week number of the year (Monday as the first day of the week) as a zero-padded decimal number. All days in a new year preceding the first Monday are considered to be in week 0. |
| %-W | 35 | Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0. (Platform specific) |
| %c | Sun Sep 8 07:06:05 2013 | Locale’s appropriate date and time representation. |
| %x | 09/08/13 | Locale’s appropriate date representation. |
| %X | 07:06:05 | Locale’s appropriate time representation. |
| %% | % | A literal ‘%’ character. |
Supported Date Input Types
The Date Time parameter supports:
- Formatted date strings (using strftime / strptime formats)
- Epoch timestamps in Seconds
- Epoch timestamps in Milliseconds
- Epoch timestamps in Microseconds
- Epoch timestamps in Nanoseconds
What is Epoch?
Epoch is a numeric representation of date and time.
Instead of storing a date as a formatted string, epoch stores it as a number that represents the amount of time elapsed since a fixed starting point.
Epoch Start Time
The standard Unix epoch starts at:
January 1, 1970, 00:00:00 (UTC)
This moment is considered epoch value = 0.
How Epoch Represents Time
A date and time is converted into a single number based on how much time has passed since the epoch start.
Example
Formatted date:
2024-01-01 00:00:00 UTC
Epoch (seconds):
1704067200
Epoch (milliseconds):
1704067200000
Both values represent the same moment in time, just in different units.
Epoch Format Support
Epoch values can be used directly in the date format fields using reserved keywords.
Supported Epoch Keywords
| Keyword | Meaning |
|---|---|
| epoch | Unix epoch in seconds |
| epoch_ms | Unix epoch in milliseconds |
| epoch_us | Unix epoch in microseconds |
| epoch_ns | Unix epoch in nanoseconds |
These keywords can be used as source formats or target formats, without changing the parameter structure.
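As a rough sketch of the conversion rules described above, the following Python function accepts either a strftime/strptime format or one of the epoch keywords on both the source and the target side. The function name `convert_datetime`, the choice to treat timezone-naive strings as UTC, and the truncation applied when converting to a coarser unit are assumptions made for illustration, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Number of units per second for each reserved epoch keyword.
UNITS_PER_SECOND = {"epoch": 1, "epoch_ms": 10**3, "epoch_us": 10**6, "epoch_ns": 10**9}

def convert_datetime(value, from_format, to_format):
    """Convert value from from_format to to_format (format string or epoch keyword)."""
    if from_format in UNITS_PER_SECOND:
        # Integer arithmetic avoids floating-point drift for large epoch values.
        micros = int(value) * 10**6 // UNITS_PER_SECOND[from_format]
        moment = EPOCH + timedelta(microseconds=micros)
    else:
        moment = datetime.strptime(value, from_format)
        if moment.tzinfo is None:
            # Assumption: strings without a UTC offset are treated as UTC.
            moment = moment.replace(tzinfo=timezone.utc)
    if to_format in UNITS_PER_SECOND:
        delta = moment - EPOCH
        micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
        return micros * UNITS_PER_SECOND[to_format] // 10**6
    return moment.strftime(to_format)

as_text = convert_datetime(1704067200, "epoch", "%Y-%m-%d %H:%M:%S")
as_millis = convert_datetime("2024-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", "epoch_ms")
as_seconds = convert_datetime(1704067200000, "epoch_ms", "epoch")
```

The three calls correspond to the Epoch to Formatted Date, Formatted Date to Epoch, and Epoch to Epoch cases, using the 2024-01-01 00:00:00 UTC value from the epoch example.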
Date Time Conversion Examples
Formatted Date → Formatted Date
"created_date","%Y-%m-%d %H:%M:%S","%d-%m-%Y %H:%M:%S"
Epoch → Formatted Date
Epoch (seconds) to date string:
"created_date","epoch","%Y-%m-%d %H:%M:%S"
Epoch (milliseconds) to date string:
"created_date","epoch_ms","%Y-%m-%d %H:%M:%S"
Formatted Date → Epoch
Date string to epoch seconds:
"created_date","%Y-%m-%d %H:%M:%S","epoch"
Date string to epoch milliseconds:
"created_date","%Y-%m-%d %H:%M:%S","epoch_ms"
Epoch → Epoch (Unit Conversion)
Convert epoch milliseconds to seconds:
"created_date","epoch_ms","epoch"
Multiple Date Fields Example
"startweekdate1","%Y-%m-%d %H:%M:%S.%f%z","%Y-%m-%d %H:%M:%S", "startweekdate2","epoch_ms","%Y-%m-%d"
Append
Append operation adds a new key on the fly with its value as dynamic value or static value for each record.
Description
The Append operation adds a new key on the fly whose value is either dynamic or static. The key and its value are added to each record.
Number of Parameters : 1
The Append operation requires 1 parameter.
Parameter : Append
By adding new elements to an existing data structure, the Append operation extends or modifies that structure in a flexible way.
For a dynamic value, use {%column_name%}, where column_name is the incoming column in the data pipeline.
For a dynamic Integer value, use {%^column_name^%}, where column_name is the incoming column in the data pipeline.
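The two placeholder forms can be pictured with a small Python sketch. It is a hypothetical model, not the product's implementation: the function name `render`, the rule that a value consisting of a single {%^column_name^%} placeholder keeps the column's original data type, and the use of an empty string for missing columns are all assumptions.

```python
import re

TYPED = re.compile(r"\{%\^(\w+)\^%\}")           # {%^column_name^%}
PLACEHOLDER = re.compile(r"\{%\^?(\w+)\^?%\}")   # either placeholder form

def render(template, record):
    """Fill placeholders in a template (str, dict, or list) from one record."""
    if isinstance(template, dict):
        return {key: render(value, record) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, record) for item in template]
    if not isinstance(template, str):
        return template
    typed = TYPED.fullmatch(template)
    if typed:
        # Assumption: a lone typed placeholder returns the value with its type intact.
        return record.get(typed.group(1))
    # Otherwise substitute every placeholder as text.
    return PLACEHOLDER.sub(lambda m: str(record.get(m.group(1), "")), template)

record = {"ORDERNUMBER": "1001", "ORDER_TYPE": "STD", "stock_price": 42}
append_spec = {
    "export_flag_y": "Y",
    "concatenate_key_name": "{%ORDERNUMBER%}|{%ORDER_TYPE%}",
    "price": "{%^stock_price^%}",
}
appended = render(append_spec, record)
```

In this sketch `appended["price"]` stays an Integer, while the concatenated key becomes the string "1001|STD".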
Below is an example where we are adding a new key in-flight with its fixed/static value.
"export_flag_y":"Y","export_flag_p":"P"
Below is an example where we are adding a new key whose value is the concatenation of two keys’ values.
"concatenate_key_name":"{%ORDERNUMBER%}{%ORDER_TYPE%}"
Below is an example where we are adding a new key whose value is the concatenation of two keys’ values, separated by a pipe character (|).
"concatenate_key_name":"{%ORDERNUMBER%}|{%ORDER_TYPE%}"
Below is an example where we are adding an array whose elements contain keys with dynamic values.
"array_key": [{"transportmode":"{%MODE_OF_TRANSPORT%}","orderType": "{%ORDER_TYPE%}"}]
Below is an example where we are adding arrays whose elements contain keys with static and dynamic values. Multiple keys can be appended at a time.
"orderLines1":"{%orderLines%}","bill_to":[{"city": "ALTADENA","contactName": "None"}],"array_key":[{"transportmode": "{%MODE_OF_TRANSPORT%}","orderType": "{%ORDER_TYPE%}"}]
Below is an example where we are adding an object whose keys take dynamic values.
"keyname": {
"id": "{%id%}",
"email": "{%email%}",
"first_name": "{%first_name%}"
}
Below is an example of one key whose value is a dynamic Integer value and one key whose value is a dynamic Boolean value.
In the example below, the price key was an Integer and the status_flag key was a Boolean before the Append operation. With the sprintf syntax shown, their data types remain the same after the Append operation.
"price":"{%^stock_price^%}",
"status_flag":"{%^status_flag^%}",
Below is an example of creating a sentence using the Append operation.
"neural_field":"The system facilitates comprehensive tracking of product changes, capturing details from the initial problem identification (Product ID:{%^PRODUCTID^%}, Change Request ID:{%^QCR_ID^%},Change Request Date:{%^CHANGE_REQUEST_DATE^%}& Time:{%^CHANGE_REQUEST_TIME^%}, Issue Description:{%^ISSUE_DESCRIPTION^%}) through to the final verification (Validation Status:{%^VALIDATION_STATUS^%}, Date:{%^VALIDATION_DATE^%} & Time:{%^VALIDATION_TIME^%})."
Note: For creating a sentence, use only the {%^ISSUE_DESCRIPTION^%} form of sprintf. The rest of the sprintf syntax and functionality is the same as before.
Simplify your Append Operation with Auto Mapping
Struggling to manually create the Append JSON for field mappings? Use the built-in Auto Mapping feature to automatically generate accurate mappings by providing source and target schemas.
Learn more about Auto Mapping : https://help.bizdata360.com/books/ezintegrations/page/auto-mapping-append-operation
Title Case
Title Case operation converts a given key’s value into title case.
Description
The Title Case operation converts a given key’s value into title case.
Number of Parameters : 1
The Title Case operation requires 1 parameter.
Parameter : Title Case
Provide a comma-separated list of keys in double quotes to convert the defined key values into Title Case.
Below is an example where we are converting the values of amount and first_name into Title Case.
"amount","first_name"
Data Extractor
Data Extractor operation is designed to extract specific data from JSON response.
Description
Data Extractor operation is designed to extract specific data from JSON response.
Number of Parameters : 2
The Data Extractor operation requires 2 parameters.
Parameter : Data Extractor
Data Extractor is used to extract keys and their values from a JSON response.
Parameter : Data Extractor Keys
This parameter provides user-defined key names. If left blank, key names are generated automatically.
Below is an example where we are extracting the values of access_token and feedDocumentId.
"['access_token']","['feedDocumentId']"
Trim
Trim operation removes unnecessary parts from the given key’s value as defined by the user.
Description
Trim operation helps in removing unnecessary parts from the given key’s value as defined by the user.
Number of Parameters : 1
The Trim operation requires 1 parameter.
Parameter : Trim Key
Provide a comma-separated list of keys in double quotes to trim the defined key values.
Below is an example where we trim the value of the key first_name.
"first_name"
JSON to String
JSON to String operation converts JSON into a String.
Description
JSON to String operation is used to convert JSON into a String.
Number of Parameters : 1
The JSON to String operation requires 1 parameter.
Parameter : JSON to String
Provide a comma-separated list of keys in double quotes to convert the defined keys’ values from JSON to String.
Below is an example where we are converting the values of key1 and key2 from JSON to String.
"key1","key2"
String to JSON
String to JSON operation converts a String into a JSON.
Description
String to JSON operation is used to convert a String into a JSON.
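For orientation, the JSON to String and String to JSON operations behave like a serialize and parse round trip. The sketch below uses Python's standard `json` module. The sample record is hypothetical, and whether the product produces exactly this string layout is an assumption.

```python
import json

record = {"key1": {"id": 1, "tags": ["a", "b"]}}

# JSON to String: the structured value becomes a single string.
record["key1"] = json.dumps(record["key1"])
as_string = record["key1"]

# String to JSON: the string is parsed back into a structured value.
record["key1"] = json.loads(record["key1"])
```

After the second step, `record["key1"]` is again a nested structure rather than text.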
Number of Parameters : 1
The String to JSON operation requires 1 parameter.
Parameter : String to JSON
Provide a comma-separated list of keys in double quotes to convert the defined keys’ values from String to JSON.
Below is an example where we are converting the values of key1 and key2 from String to JSON.
"key1","key2"
JSON to XML
JSON to XML operation converts JSON object or value into XML.
Description
JSON to XML operation helps to convert JSON object or value into XML.
Number of Parameters : 2
The JSON to XML operation requires 2 parameters.
Parameter : Key Data
Key Data converts the provided key’s value from JSON to XML.
Below is an example where we are converting the value of product_data_response from JSON to XML.
product_data_response
Parameter : Response Key
Response Key is the key name under which the converted XML value is stored, which makes it easy to access.
Below is an example of the key name data_response which will hold the converted XML value.
data_response
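A minimal sketch of how the two parameters relate, using Python's standard `xml.etree.ElementTree`. The element names `root` and `item`, the helper name `to_element`, and the sample data are assumptions; the XML layout actually produced by the operation is not documented here.

```python
import xml.etree.ElementTree as ET

def to_element(tag, value):
    """Recursively build an XML element from a JSON-like value."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_element(key, child))
    elif isinstance(value, list):
        for child in value:
            element.append(to_element("item", child))  # assumed list item tag
    else:
        element.text = str(value)
    return element

record = {"product_data_response": {"sku": "A1", "qty": 2}}
# Key Data names the source key; Response Key names where the XML string is stored.
xml_root = to_element("root", record["product_data_response"])
record["data_response"] = ET.tostring(xml_root, encoding="unicode")
```

Here the original key is left in place and the XML string is written to `data_response`.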
XML to JSON
XML to JSON converts XML into JSON.
Description
XML to JSON is used to convert XML into JSON.
Number of Parameters : 2
The XML to JSON operation requires 2 parameters.
Parameter : Get key
Get Key converts the provided Key’s value from XML to JSON.
Below is an example where we are converting the value of product_data_response from XML to JSON.
product_data_response
Parameter : Response key
Response Key holds the converted value under the specified key name, which makes it easy to access.
Below is an example of the key name data_response which will hold the converted JSON value.
data_response
Base64 Encoding
Base64 Encoding converts specified key values into Base64 encoded values.
Description
The Base64 Encoding operation converts the values of specific keys into Base64-encoded values. Multiple keys can be provided if required.
Number of Parameters : 1
The Base64 Encoding operation requires 1 parameter.
Parameter : Base64 Encode
Encodes the values of the given keys into Base64.
Below is an example where we are encoding the value of email into Base64.
"email"
Base64 Decoding
Base64 Decoding converts a Base64-encoded string back to its original data format.
Description
The Base64 Decoding operation converts a Base64-encoded string back to its original data format. Multiple keys can be provided if required.
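The encoding and decoding operations are inverses of each other. A minimal Python sketch with the standard `base64` module is shown below. The helper names and the assumption that values are UTF-8 text are illustrative only.

```python
import base64

def b64_encode(text):
    """Encode a text value into a Base64 string (UTF-8 assumed)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def b64_decode(text):
    """Decode a Base64 string back into the original text (UTF-8 assumed)."""
    return base64.b64decode(text).decode("utf-8")

record = {"email": "user@example.com"}
record["email"] = b64_encode(record["email"])   # Base64 Encoding
encoded = record["email"]
record["email"] = b64_decode(record["email"])   # Base64 Decoding
```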
Number of Parameters : 1
The Base64 Decoding operation requires 1 parameter.
Parameter : Base64 Decode
Below is an example where we are decoding the Base64 encoded value of email back to original data.
"email"
Generate Array Sequence Number
Generate Array Sequence Number operation helps in generating sequence number for each row.
Description
Generate Array Sequence Number operation helps in generating sequence number for each row.
Number of Parameters : 2
The Generate Array Sequence Number operation requires 2 parameters.
Parameter : Sequence Key
Sequence Key is the name of the key that holds the single-line data whose rows need sequence numbers.
Below is an example of the key_name which holds the single line data.
key_name
Parameter : Sequence Number Key
Sequence Number Key is the name of the new key that stores the sequence number.
Below is an example of the key name DATA which will store the sequence number.
DATA
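One way to picture the two parameters together is the Python sketch below. The helper name `add_sequence`, the sample rows, and the choice to start numbering at 1 are assumptions.

```python
def add_sequence(record, sequence_key, sequence_number_key, start=1):
    """Add a running sequence number to every row under sequence_key."""
    for number, row in enumerate(record[sequence_key], start=start):
        row[sequence_number_key] = number
    return record

record = {"key_name": [{"sku": "A"}, {"sku": "B"}]}
add_sequence(record, "key_name", "DATA")
```

Each row under `key_name` gains a `DATA` key holding its position in the array.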
Send Keys top of Root
Send Keys top of root operation helps in bringing the given nested key’s value to the top of root.
Description
Send Keys top of root operation helps in bringing the given nested key’s value to the top of root.
Number of Parameters : 1
The Send Keys top of Root operation requires 1 parameter.
Parameter : Column to Root
Provide a comma-separated list of key names in double quotes to specify the nested keys.
Below is an example where key_name is the nested key whose value we want to bring to the top of the root.
"key_name"
Today Timestamp
Today Timestamp operation adds a new key on the fly with the value of today’s date or date time as specified by the user.
Description
The Today Timestamp operation adds a new key on the fly with the value of today’s date or date time, as specified by the user.
Number of Parameters : 2
The Today Timestamp operation requires 2 parameters.
Parameter : Date Format
Date Format is used to pass the required format.
Below is an example of a datetime format which can be modified according to user’s need.
%Y-%m-%dT%H:%M:%S.%f%z
Parameter : Datetime Key
Datetime Key is the name of the key in which the date or datetime is saved.
Below is an example of the key name dl_insert_date, which is added on the fly with the formatted date value.
dl_insert_date
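A minimal sketch of the same idea in Python, assuming the current time is taken in UTC (the time zone the operation actually uses is not stated here) and using a hypothetical helper name:

```python
from datetime import datetime, timezone

def add_today_timestamp(record, date_format, datetime_key):
    """Add datetime_key holding the current time rendered with date_format."""
    # Assumption: the current time is taken in UTC.
    record[datetime_key] = datetime.now(timezone.utc).strftime(date_format)
    return record

record = add_today_timestamp({"id": 1}, "%Y-%m-%dT%H:%M:%S.%f%z", "dl_insert_date")
```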
Round
Round operation reduces a decimal number to a specific number of decimal places.
Description
Round operation is used to reduce a decimal number to a specific number of decimal places, where the numbers need to be rounded off.
Number of Parameters : 2
The Round operation requires 2 parameters.
Parameter : Round Keys
Round Keys specifies the keys, defined by the user, whose values need to be rounded off.
Below is an example of the key names float_value and int_value, whose values we want to round off.
"float_value","int_value"
Parameter : Decimal Key Number
Decimal Key Number specifies how many decimal places the user needs.
Below is an example of the decimal key number, that is, the number of decimal places to which the value is rounded.
2
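The effect of the two parameters can be sketched in Python as follows. The helper name is hypothetical. Python's built-in `round` rounds exact ties to the nearest even digit, which may differ from the rounding rule the operation applies.

```python
def round_keys(record, keys, decimals):
    """Return a copy of record with the listed numeric values rounded."""
    result = dict(record)
    for key in keys:
        value = result.get(key)
        # Booleans are skipped because bool is a subclass of int in Python.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            result[key] = round(value, decimals)
    return result

record = {"float_value": 3.14159, "int_value": 7}
rounded = round_keys(record, ["float_value", "int_value"], 2)
```

In this sketch an Integer value such as `int_value` is unchanged by rounding to two decimal places.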
Calculator
The Calculator operation evaluates a formula provided by the user, based on the values in the columns.
Description
The Calculator operation is used to evaluate a formula provided by the user, using the values held in the columns.
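To make the idea concrete, here is a small Python sketch that evaluates a formula written in terms of column names. It is not the product's evaluator: the function name `evaluate`, the restriction to the four basic arithmetic operators, and the sample record are assumptions.

```python
import ast
import operator

# Assumed operator set; the real operation may support more.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(formula, record):
    """Evaluate an arithmetic formula whose names refer to columns in record."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return record[node.id]          # column lookup
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression in formula: {formula!r}")
    return walk(ast.parse(formula, mode="eval").body)

record = {"Amount1": 10, "Amount2": 4}
formulas = ["Amount1-Amount2", "Amount1 + Amount2"]
new_keys = ["key1", "key2"]
for new_key, formula in zip(new_keys, formulas):
    record[new_key] = evaluate(formula, record)
```

Walking the parsed expression instead of calling `eval` keeps the sketch limited to arithmetic on known columns.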
Number of Parameters : 2
The Calculator operation requires 2 parameters.
Parameter : Calculation Keys
Calculation Keys holds the formulas, written in terms of the column names of the provided data.
Below is an example where we provide the formulas Amount1 - Amount2 and Amount1 + Amount2, based on the column names of the data.
"Amount1-Amount2","Amount1 + Amount2"
Parameter : New Calculation Keys
New Calculation Keys is used to store the calculated values.
Provide a comma-separated list of keys in double quotes to name the new keys.
Below is an example of the new key names key1 and key2, which will store the calculated values.
"key1","key2"
Date Analytics
Date analytics helps extract related information such as financial year, financial month, and quarterly details from dates.
Description
Date Analytics extracts information about a date, such as the financial year, financial month, calendar and financial quarter, and other related details.
Number of Parameters : 4
The Date Analytics operation requires 4 parameters.
Parameter : Data Field Key
Data Field Key is the name of the key that holds the date.
Below is an example of the key name Created_datetime, which holds the required value.
Created_datetime
Parameter : Fiscal Month Start
Fiscal Month Start is used to specify the fiscal start month of an organization.
Below is an example where the organization’s fiscal year starts in April, so we provide the month number 4.
4
Parameter : Date Column
Date Column is used to provide 11 user-defined field names for saving the data. If left blank, the 11 fields are generated dynamically.
Below is an example of user defined 11 fields.
"Calender Month","Calender Month Num","Calender Year","Calender FY Month Num","Calender FY","Calender Week Num","Calender Month Week","Calender Qr Num","Calender FY Qr Num","Calender Qr","Calender FY Qr"
Below is an example when user leaves date_column blank.
Parameter : Weekday
Weekday specifies the first day of the week: %W (week starts on Monday) or %U (week starts on Sunday).
Below is an example of weekday starting with Monday.
%W
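The relationship between the fiscal start month and the derived fields can be sketched in Python. The field names in the returned dictionary and the helper name `fiscal_fields` are hypothetical, only a few of the 11 fields are shown, and the exact definitions used by the operation are assumptions.

```python
from datetime import date

def fiscal_fields(day, fiscal_start_month, weekday_code="%W"):
    """Derive a few calendar and fiscal fields for one date."""
    # Fiscal month 1 is the fiscal start month; months wrap around the year end.
    fiscal_month = (day.month - fiscal_start_month) % 12 + 1
    return {
        "calendar_month_num": day.month,
        "calendar_quarter_num": (day.month - 1) // 3 + 1,
        "fiscal_month_num": fiscal_month,
        "fiscal_quarter_num": (fiscal_month - 1) // 3 + 1,
        # %W counts weeks starting on Monday, %U on Sunday.
        "calendar_week_num": int(day.strftime(weekday_code)),
    }

may_fields = fiscal_fields(date(2024, 5, 15), 4)
january_fields = fiscal_fields(date(2024, 1, 10), 4)
```

With an April start, May is fiscal month 2 in fiscal quarter 1, while January is fiscal month 10 in fiscal quarter 4.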
Repeat First Row Values
Repeat First Row Values operation is used to repeat the first row’s value of specified columns.
Description
The Repeat First Row Values operation is used to repeat the first row’s values of the specified columns.
Number of Parameters : 2
The Repeat First Row Values operation requires 2 parameters.
Parameter : Source Key
Source Key is the key that holds the single-line data to be processed.
Below is an example of the key name that holds the data.
['product_data_response']['data']
Parameter : Fields to Repeat Value
Fields to Repeat Value is a comma-separated list of key names in double quotes whose first-row values we want to repeat.
Below is an example of key names whose first-row values we want to repeat.
"month","customer_site"
Grok Pattern
Grok operation parses log files and extracts structured data from unstructured log lines using predefined patterns.
Description
Grok operation is used for parsing log files and extracting structured data from unstructured log lines. It employs predefined patterns to efficiently identify and capture specific types of information.
How It Works
Users provide an input key and a Grok pattern. The operation uses the supplied pattern to extract structured values from the source text.
Commonly Used Grok Patterns
- WORD: Matches a single word (sequence of letters).
- NUMBER: Matches any integer or floating-point number.
- INT: Matches an integer.
- BASE10NUM: Matches a base-10 number.
- POSINT: Matches a positive integer.
- NONNEGINT: Matches a non-negative integer.
- NEGINT: Matches a negative integer.
- UUID: Matches a Universally Unique Identifier (UUID).
- IP: Matches an IP address (IPv4 or IPv6).
- EMAILADDRESS: Matches an email address.
- HOSTNAME: Matches a hostname.
- URIPROTO: Matches the protocol part of a URI (e.g., http, ftp).
- URIPATH: Matches the path part of a URI.
- URI: Matches a complete URI.
- USERNAME: Matches a username.
- DATA: Matches any character sequence.
- GREEDYDATA: Matches any character sequence but consumes as much as possible.
- TIMESTAMP_ISO8601: Matches a timestamp in ISO 8601 format (e.g., “2023-09-13T12:34:56.789Z”).
- HTTPD_COMMONLOG: Matches the common log format used in web server logs.
- HTTPD_COMBINEDLOG: Matches the combined log format used in web server logs.
- SYSLOGTIMESTAMP: Matches a timestamp in syslog format.
- SYSLOGHOST: Matches the hostname in syslog format.
- SYSLOGPROG: Matches the program name in syslog format.
- SYSLOGMESSAGE: Matches the syslog message.
- QUOTEDSTRING: Matches a string enclosed in double or single quotes.
- PATH: Matches a file system path.
- URL: Matches a URL.
- USERAGENT: Matches a user-agent string from a web log.
- WORDNUM: Matches a word followed by a number.
- UUID4: Matches a UUID version 4.
- MAC: Matches a MAC address.
- POSREAL: Matches a positive real number.
These patterns enable the Grok operation to efficiently process log data and extract relevant information, facilitating better analysis and understanding of system logs. Users can customize their log parsing by leveraging these patterns to suit the specific needs of their applications.
Number of Parameters : 2
The Grok Pattern operation requires 2 parameters.
Parameter : Input Key
In the Input Key parameter, users are required to specify the key from which they intend to extract the data. This key serves as the reference point for the Grok operation to identify and capture the relevant information based on the predefined patterns.
For instance, when utilizing the Input Key parameter, consider a scenario where the specified key is ‘Details.’
Input Key :
Details
Within the ‘Details’ key, the data encapsulates an endpoint URL, a MAC address (00:1A:2B:3C:4D:5E), and both IPv4 (192.168.1.1) and IPv6 (2001:0db8:85a3:0000:0000:8a2e:0370:7334) addresses.
Details : This is endpoint url https://www.example.com/path/to/resource for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334
Parameter : Grok Pattern
In the Grok Pattern parameter, users can specify a predefined pattern to guide the extraction of data. This pattern serves as a template, enabling the Grok operation to accurately identify and capture relevant information from the input data according to the defined structure.
For instance, when utilizing the Grok Pattern parameter, let’s consider a scenario where we input the pattern ‘grok_pattern.’ This specified pattern guides the Grok operation in parsing and extracting data from the input based on the provided template.
Grok Pattern :
This is endpoint url %{URI:endpoint_url} for mac add %{MAC:mac_address} and v4 %{IPV4:ip_address_v4} and V6 %{IPV6:ip_address_v6}
Result
Details: This is endpoint url https://www.example.com/path/to/resource for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334
endpoint_url: https://www.example.com/path/to/resource
mac_address: 00:1A:2B:3C:4D:5E
ip_address_v4: 192.168.1.1
ip_address_v6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
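The way a Grok pattern maps onto named fields can be illustrated with a small Python sketch that expands each %{PATTERN:field} token into a named regular-expression group. The regular expressions below are simplified stand-ins chosen for this example (for instance, the IPV6 one only handles the full eight-group form); they are assumptions and not the actual Grok pattern definitions.

```python
import re

# Simplified stand-ins for a few Grok patterns (assumptions, not the real definitions).
PATTERNS = {
    "URI": r"\w+://\S+",
    "MAC": r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "IPV6": r"(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}",
}

def grok_to_regex(grok_pattern):
    """Expand %{NAME:field} tokens into named regex groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, grok_pattern))

details = (
    "This is endpoint url https://www.example.com/path/to/resource "
    "for mac add 00:1A:2B:3C:4D:5E and v4 192.168.1.1 "
    "and V6 2001:0db8:85a3:0000:0000:8a2e:0370:7334"
)
grok_pattern = (
    "This is endpoint url %{URI:endpoint_url} for mac add %{MAC:mac_address} "
    "and v4 %{IPV4:ip_address_v4} and V6 %{IPV6:ip_address_v6}"
)
match = grok_to_regex(grok_pattern).search(details)
extracted = match.groupdict() if match else {}
```

The `extracted` dictionary carries the same four fields as the result shown above.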
PDF Extractor
This operation helps to extract data from PDF files while data is in-flight.
Description
This operation helps to extract data from PDF files. It takes place while data is in-flight.
Number of Parameters : 2
The PDF Extractor operation requires 2 parameters.
Parameter : File URL Key
Enter the key name which contains the PDF File URL. In this case the Base64 key will be empty.
Example:
Items
Parameter : Base64 Key
Write the key name which will have the Base64 encoded data. In this case the File URL Key will be empty.
Example:
@xyz.grapgh.downloadUrl
ARRAY COUNT
This operation is employed to retrieve the record count within an array.
Description
This operation is employed to retrieve the record count within an array. It involves specifying the key name associated with an array value within the provided data.
Number of Parameters : 1
The ARRAY COUNT operation requires 1 parameter.
Parameter : Array Key Name
Provide the name of the key whose value is an array, so that the count of the array is returned.
Below is an example where we get the length of the array under the key "data".
['bizdata_dataset_response']['data']
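A minimal sketch of how a bracketed key path such as the one above could be resolved and counted in Python. The helper names `resolve` and `array_count` and the sample payload are hypothetical.

```python
import re

def resolve(data, path):
    """Follow a path written as ['a']['b'] through nested dictionaries."""
    for key in re.findall(r"\['([^']+)'\]", path):
        data = data[key]
    return data

def array_count(data, array_key_name):
    """Return the number of records in the array found at array_key_name."""
    return len(resolve(data, array_key_name))

payload = {"bizdata_dataset_response": {"data": [{"id": 1}, {"id": 2}, {"id": 3}]}}
count = array_count(payload, "['bizdata_dataset_response']['data']")
```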
ENCODE DECODE
This operation facilitates the encoding or decoding of data by inputting the desired key name for encoding or decoding purposes.
Description
This operation facilitates the encoding or decoding of data by inputting the desired key name for encoding or decoding purposes.
Number of Parameters : 3
The ENCODE DECODE operation requires 3 parameters.
Parameter : Response Key
Pass the key name which holds the data that we want to encode or decode.
Below is the example how we can pass the response key.
Parameter : Method Type
Provide the encoding type in which you want to encode or decode your data.
Below is an example of the Method Type, which can be utf-8, utf-8-sig, or latin1.
utf-8
Parameter : Process
Pass the process type based on the requirement. The possible values for this parameter are encode and decode.
Below is an example of how to pass the Process.
encode or decode
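The three parameters line up with Python's own `str.encode` and `bytes.decode`, as the sketch below shows. The helper name is hypothetical, and whether the operation stores the encoded result as bytes or in some other representation is not stated here, so treat that part as an assumption.

```python
def encode_decode(value, method_type, process):
    """Encode text to bytes or decode bytes to text with the given codec."""
    if process == "encode":
        return value.encode(method_type)
    if process == "decode":
        return value.decode(method_type)
    raise ValueError("process must be 'encode' or 'decode'")

utf8_bytes = encode_decode("café", "utf-8", "encode")
latin1_bytes = encode_decode("café", "latin1", "encode")
round_trip = encode_decode(utf8_bytes, "utf-8", "decode")
```

The same text produces different bytes under utf-8 and latin1, which is why the Method Type used for decoding must match the one used for encoding.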
RAW SENTENCE GENERATOR
The Raw Sentence Generator operation transforms structured JSON data into Raw sentence.
Description
The Raw Sentence Generator operation transforms your structured JSON data into Raw sentence.
Number of Parameters : 3
The RAW SENTENCE GENERATOR operation requires 3 parameters.
Parameter : Singleline Key
Provide the key name that contains the single-line data, which we aim to utilize for generating the raw sentence.
Below is an example where we provide the data key, containing single-line data.
data
Parameter : Include Keys
Pass the key names separated by commas and enclosed in double quotes for which the raw sentence is to be generated. If all key names are to be included for raw sentence generation, leave this parameter empty.
Below is an example where we specify the key names Name and Commands for generating the sentence.
"Name","Commands"
Parameter : Raw Response Key
Specify the key name where you want to store the generated raw sentence.
Below is an example where we aim to store the generated raw sentence in the Response key.
Response
Various Use Cases for the Parameters
Case 1
When it’s necessary to incorporate all upcoming keys in sentence generation.
Singleline Key Empty
Include Keys Empty
Raw Response Key
Response
Case 2
When not all upcoming keys are needed for sentence generation, only specific keys are required.
Singleline Key Empty
Include Keys
"Name","Commands"
Raw Response Key
Response
Case 3
When aiming to generate a Raw Sentence from all dictionaries within the Singleline key, the Include Keys parameter remains empty each time. This facilitates the creation of a raw sentence containing all key-value pairs from the dictionaries inside the Singleline key.
Singleline Key
data
Include Keys
Raw Response Key
Response
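The sentence layout used in the Case 3 result can be approximated with a short Python sketch. The function name `raw_sentence` and the sample rows are hypothetical, and the exact wording rules of the operation are inferred from the documented result rather than confirmed.

```python
def raw_sentence(rows, include_keys=None):
    """Join key/value pairs from every row into one raw sentence."""
    parts = []
    for row in rows:
        for key, value in row.items():
            # An empty Include Keys parameter means every key is used.
            if not include_keys or key in include_keys:
                parts.append(f"{key} is {value}")
    return "The " + ", ".join(parts)

rows = [
    {"id": 1, "first_name": "George"},
    {"id": 2, "first_name": "Janet"},
]
record = {"data": rows}
record["Response"] = raw_sentence(record["data"])
names_only = raw_sentence(record["data"], ["first_name"])
```

Passing a list of key names corresponds to filling in Include Keys, as in Case 2.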
Example for Case 3 Input JSON
{
"data": [
{
"id": 1,
"email": "george.bluth@reqres.in",
"first_name": "George",
"last_name": "Bluth",
"avatar": "https://reqres.in/img/faces/1-image.jpg"
},
{
"id": 2,
"email": "janet.weaver@reqres.in",
"first_name": "Janet",
"last_name": "Weaver",
"avatar": "https://reqres.in/img/faces/2-image.jpg"
},
{
"id": 3,
"email": "emma.wong@reqres.in",
"first_name": "Emma",
"last_name": "Wong",
"avatar": "https://reqres.in/img/faces/3-image.jpg"
}
]
}
Result
"raw_response":"The id is 1, email is george.bluth@reqres.in, first_name is George, last_name is Bluth, avatar is https://reqres.in/img/faces/1-image.jpg, id is 2, email is janet.weaver@reqres.in, first_name is Janet, last_name is Weaver, avatar is https://reqres.in/img/faces/2-image.jpg, id is 3, email is emma.wong@reqres.in, first_name is Emma, last_name is Wong, avatar is https://reqres.in/img/faces/3-image.jpg"
TIME UNITS
This operation is designed to extract important information like the year, month, and date from timestamps in supported formats.
Description
It analyzes the timestamps you give it and gives back structured data, including the year, month, date, and sometimes more details based on the timestamp you provided.
Number of Parameters : 2
The TIME UNITS operation requires 2 parameters.
Parameter : Date Timestamp Key
Provide the Key Name which holds the timestamp value.
Below is an example where we provide the timestamp key, containing timestamp value.
timestamp
Parameter : Time Units
Provide the key names under which the extracted time units are saved. This parameter can be left empty. For user-defined fields, provide the key names in double quotes, separated by commas.
Below is an example where we provide a comma-separated list of keys enclosed in double quotes for user-defined fields.
"year","month","day","hour","minute","second","microsecond"
Note
If the user doesn’t input any Time Units, the operation dynamically generates seven fields.
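As an illustration of the behavior described above, here is a minimal Python sketch. It rests on several assumptions: the function name extract_time_units is hypothetical, the timestamp is taken to be in ISO 8601 format (the supported formats are not listed here), and the seven default field names are assumed to match the example list of Time Units.

```python
from datetime import datetime

# Assumed default field names, taken from the example list of Time Units.
DEFAULT_UNITS = ["year", "month", "day", "hour", "minute", "second", "microsecond"]


def extract_time_units(record, timestamp_key, unit_keys=None):
    """Return a copy of record with the timestamp split into separate fields."""
    # Assumption: the timestamp is an ISO 8601 string.
    moment = datetime.fromisoformat(record[timestamp_key])
    values = [
        moment.year, moment.month, moment.day,
        moment.hour, moment.minute, moment.second, moment.microsecond,
    ]
    # When no Time Units are supplied, fall back to the seven default names.
    names = unit_keys or DEFAULT_UNITS
    result = dict(record)
    # zip stops at the shorter list, so fewer names save only the leading units.
    result.update(zip(names, values))
    return result


row = extract_time_units({"timestamp": "2024-01-15T10:30:45.000123"}, "timestamp")
print(row)
```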
Data Chunking
The Data Chunking operation is used to split text data into smaller chunks of data based on the specified chunk size.
Description
This operation helps divide large text into smaller, more manageable pieces for downstream use.
Number of Parameters : Varies
The parameters available for the Data Chunking operation vary depending on the selected chunk type.
Parameter : Process Key
Provide the key name that holds the data for which we need to create the chunks.
Below is an example where we provide the text key, containing the data for which we need to create the chunks. Note that the key must be enclosed in square brackets and single quotes, like ['xxx'].
['text']
Parameter : Chunk Type
Token Chunker
This option splits the text into chunks based on tokens and chunk size.
Sentence Chunker
This option splits the text into chunks based on sentences and chunk size.
Sentence Chunker’s Parameter: Chunk Overlap
This controls how much content is shared between two neighboring chunks. A small overlap helps keep context so chunks don’t feel disconnected.
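To make the effect of overlap concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the operation's actual algorithm: the function name sentence_chunks is hypothetical, and both the chunk size and the overlap are assumed to be counted in sentences.

```python
import re


def sentence_chunks(text, chunk_size, overlap):
    """Group sentences into chunks that share `overlap` sentences with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + chunk_size]))
        # Stop once the last sentence has been included in a chunk.
        if start + chunk_size >= len(sentences):
            break
    return chunks


overlapping = sentence_chunks("One. Two. Three. Four. Five.", chunk_size=3, overlap=1)
print(overlapping)
```

With a chunk size of 3 and an overlap of 1, the sentence "Three." closes the first chunk and opens the second, which is how neighboring chunks keep shared context.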
Pattern Chunker
This option splits the text into chunks based on specific characters or symbols (like ., ?, or line breaks \n).
It uses these special characters as markers to decide where one chunk ends and the next begins.
Pattern Delimiters
These are the characters or symbols used to split your text into smaller parts.
For example, you can use things like a full stop (.), question mark (?), or a new line (\n) to tell the system where one chunk should end and the next should begin.
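A delimiter-based split can be sketched in Python as follows. The function name pattern_chunks is hypothetical, and keeping the delimiter at the end of each chunk is an assumption, since the text above does not say whether delimiters are preserved.

```python
import re


def pattern_chunks(text, delimiters):
    """Split text wherever one of the delimiters occurs, keeping the delimiter."""
    # The capturing group makes re.split return the delimiters as separate items.
    pattern = "(" + "|".join(re.escape(d) for d in delimiters) + ")"
    pieces = re.split(pattern, text)
    chunks = []
    # pieces alternates between text and the delimiter that followed it.
    for i in range(0, len(pieces), 2):
        delimiter = pieces[i + 1] if i + 1 < len(pieces) else ""
        chunk = (pieces[i] + delimiter).strip()
        if chunk:
            chunks.append(chunk)
    return chunks


parts = pattern_chunks("Is it ready? Yes. Ship it\nDone", [".", "?", "\n"])
print(parts)
```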
Page Chunker
This option splits the text into chunks based on pages. Each chunk contains content from one or more pages of the document.
Page Name
This is the name of the file you want to process and split into chunks.
Page Per Chunk
This defines how many pages should be included in each chunk.
Sliding Window Chunker
This option splits the text into chunks using a “sliding window” approach. Each chunk is created by combining content from the current page along with nearby pages, so the context is preserved.
Sliding Window Chunker’s Parameter: File Name
This is the name of the file you want to process and split into chunks.
Sliding Window Chunker’s Parameter: Slide Window
This defines how many pages to include before and after the current page when creating a chunk.
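The sliding window idea can be sketched in a few lines of Python. This is an illustration only: the function name sliding_window_chunks is hypothetical, pages are assumed to be already available as a list of strings, and joining them with a space is an arbitrary choice.

```python
def sliding_window_chunks(pages, window):
    """Build one chunk per page, padded with `window` pages on each side."""
    chunks = []
    for index in range(len(pages)):
        # Clamp the window at the first and last page of the document.
        start = max(0, index - window)
        end = min(len(pages), index + window + 1)
        chunks.append(" ".join(pages[start:end]))
    return chunks


windows = sliding_window_chunks(["p1", "p2", "p3", "p4"], window=1)
print(windows)
```

With a Slide Window of 1, the chunk for page 2 contains pages 1 through 3, so each chunk carries the context of its neighbors.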
Parameter : Chunk Key
Provide the key name that will hold the chunked data.
Below is an example where we specify the chunks key for the Chunk Key parameter. This key will be used to store the individual chunks of data that are created from the original text.
chunks
Parameter : Token Size
Provide the size of your chunk.
Below is an example where we provide the value 1000. This setting will create chunks of text, with each chunk being up to 1000 characters long.
1000
Parameter : Group By Chunk
Provide the key name that will hold all the chunks made from the data.
Below is an example where we specify the DataChunks key for the Group By Chunk parameter. This key will be used to aggregate and store all the chunks created from the original data.
DataChunks
Extract to Array Operation
This operation takes a list of items, such as documents, users, or records, and splits selected fields, such as IDs and texts, into separate labelled groups called arrays.
Description
This makes the data easier to use in subsequent operations.
Parameters
Number of Parameters : 3
Parameter : listobjectkey
This key tells the operation where to find the list of items within your input data.
In the example below, listobjectkey tells the operation to look for the list of items inside data.
data
Parameter : extractionkey
This setting tells the operation which specific fields to extract from each item in the list.
id, content
This means: from every object, take the id and the content.
Parameter : arraycollectionkey
This tells the operation what to call the groups where we store the extracted values.
ids, documents
Example
{
"data": [
{ "id": 1, "content": "text1" },
{ "id": 2, "content": "text2" },
{ "id": 3, "content": "text3" }
]
}
Final Result
{
"ids": [1, 2, 3],
"documents": ["text1", "text2", "text3"]
}
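The mapping from the three parameters to the output can be sketched in Python. The function name extract_to_arrays is hypothetical, and pairing each extraction key with the array name at the same position is an assumption based on the example above.

```python
def extract_to_arrays(record, list_object_key, extraction_keys, array_collection_keys):
    """Collect each extracted field from every list item into its own array."""
    items = record[list_object_key]
    # Pair each field to extract with the name of the array that will hold it.
    return {
        target: [item.get(source) for item in items]
        for source, target in zip(extraction_keys, array_collection_keys)
    }


payload = {
    "data": [
        {"id": 1, "content": "text1"},
        {"id": 2, "content": "text2"},
        {"id": 3, "content": "text3"},
    ]
}
arrays = extract_to_arrays(payload, "data", ["id", "content"], ["ids", "documents"])
print(arrays)
```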
HTML Extractor
The HTML Extractor operation extracts textual and structured data from given HTML content.
Description
This operation extracts information from raw HTML content and stores the output in a specified key.
Number of Parameters : 2
The HTML Extractor operation requires 2 parameters.
Parameter : Input HTML Key
Provide the key name that contains the raw HTML data we aim to extract information from.
Below is an example where we provide the key containing HTML content.
bizdata_dataset_response
Parameter : Output Data Key
Specify the key name where you want to store the extracted structured data.
Below is an example where we aim to store the extracted data in the htmltextdata key.
htmltextdata
Various Use Cases for the Parameters
Case 1
When it’s necessary to extract plain text content from an HTML string.
Input HTML Key
bizdata_dataset_response
Output Data Key
htmltextdata
Example for Case 1 Input JSON
{
"bizdata_dataset_response": "Hello World This is a paragraph."
}
Result
{
"htmltextdata": "Hello World This is a paragraph."
}
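Plain-text extraction of this kind can be approximated with Python's standard html.parser module. This is a sketch, not the operation's implementation: the class and function names are hypothetical, and the sample markup is an assumed input chosen to yield the text shown in Case 1.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        text = data.strip()
        if text:
            self.parts.append(text)


def extract_text(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.parts)


# Hypothetical markup for illustration.
html_text = extract_text("<h1>Hello World</h1><p>This is a paragraph.</p>")
print(html_text)
```

This simple collector also returns the contents of script and style elements, so a production extractor would need to filter those out.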
File Extractor
The File Extractor operation extracts textual data from various file formats such as txt, docs, ppt, pdf, and many others.
Description
This operation is used when a file is represented as bytes and the text content needs to be extracted.
Number of Parameters : 1
The File Extractor operation requires 1 parameter.
Parameter : File Data Key
Provide the key name that contains the bytes data of the file to be extracted.
Below is an example where we provide the key containing the file data.
bizdata_dataset_response
Various Use Cases for the Parameters
Case 1
When you have a document represented as bytes and want to pull out its text content.
File Data Key
bizdata_dataset_response
Example for Case 1 Input JSON
{
"bizdata_dataset_response": "bPDF-1.1\n1 0 obj\nHello , This is a File Extractor ops.\nTj\nET\nEOF"
}
Result
{
"extractedtext": "Hello , This is a File Extractor ops.",
"extractedImages": [],
"extractedtables": []
}
JSON to Avro
The JSON to AVRO operation converts your structured JSON data into the AVRO format using a specified valid schema.
Description
This operation validates and parses structured JSON data into AVRO bytes using the provided schema.
Number of Parameters : 3
The JSON to AVRO operation requires 3 parameters.
Parameter : JSON Data Key
Provide the key name that contains the JSON structured data which we aim to convert into the AVRO format.
Below is an example where we provide the data key containing the JSON data.
bizdata_dataset_response
Parameter : AVRO Schema
Provide the AVRO schema in JSON format that will be used to validate and parse the JSON data into AVRO bytes.
Below is an example where we specify the schema.
{
"type": "record",
"name": "User",
"fields": [
{ "name": "name", "type": "string" },
{ "name": "age", "type": "int" }
]
}
Parameter : AVRO Data Key
Specify the key name where you want to store the converted AVRO byte data.
Below is an example where we aim to store the converted data in the avrodatakey key.
avrodatakey
Various Use Cases for the Parameters
Case 1
When you have a simple JSON record and a matching AVRO schema and want to serialize it.
JSON Data Key
bizdata_dataset_response
AVRO Schema
{
"type": "record",
"name": "Customer",
"fields": [
{ "name": "CREATEDDATE", "type": "string" },
{ "name": "CUSTOMERCITY", "type": "string" },
{ "name": "CUSTOMERCOUNTRY", "type": "string" },
{ "name": "CUSTOMEREMAIL", "type": "string" },
{ "name": "CUSTOMERNAME", "type": "string" },
{ "name": "CUSTOMERPHONE", "type": "string" },
{ "name": "CUSTOMERSTATE", "type": "string" },
{ "name": "CUSTOMERZIPCODE", "type": "string" },
{ "name": "ERPCUSTOMER", "type": "string" },
{ "name": "CUSTOMERID", "type": "string" },
{ "name": "ID", "type": "string" }
]
}
AVRO Data Key
avrodatakey
Example for Case 1 Input JSON
{
"bizdata_dataset_response": {
"CREATEDDATE": "15-01-2024",
"CUSTOMERCITY": "Bengaluru",
"CUSTOMERCOUNTRY": "India",
"CUSTOMEREMAIL": "john.doe@email.com",
"CUSTOMERNAME": "John Doe",
"CUSTOMERPHONE": "9876543210",
"CUSTOMERSTATE": "Karnataka",
"CUSTOMERZIPCODE": "560001",
"ERPCUSTOMER": "ERP001",
"CUSTOMERID": "CUST001",
"ID": "1"
}
}
Result
{
"avrodatakey": "bObj..."
}
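To show what validating and serializing against a schema involves, here is a minimal pure-Python sketch of Avro's binary encoding for flat records with string, int, and long fields. It is an illustration under assumptions: the function names are hypothetical, only these three primitive types are handled, and the output is a bare binary datum. The Result above begins with Obj, which suggests the operation emits an Avro object container file (header plus embedded schema) instead, and this sketch does not produce that format.

```python
def encode_long(value):
    """Avro int/long: zigzag encoding, then 7 bits per byte, low bits first."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag & ~0x7F:
        # Set the high bit to signal that more bytes follow.
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(value):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = value.encode("utf-8")
    return encode_long(len(raw)) + raw


# Python type expected for each supported Avro type, plus its encoder.
ENCODERS = {
    "string": (str, encode_string),
    "int": (int, encode_long),
    "long": (int, encode_long),
}


def encode_record(schema, record):
    """Validate a flat record against the schema and encode fields in schema order."""
    out = bytearray()
    for field in schema["fields"]:
        expected, encoder = ENCODERS[field["type"]]
        value = record[field["name"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise TypeError(f"{field['name']} must be of Avro type {field['type']}")
        out += encoder(value)
    return bytes(out)


user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
encoded = encode_record(user_schema, {"name": "Al", "age": 30})
print(encoded)
```

A record whose field has the wrong type, such as "age": "30", raises a TypeError here, which mirrors the validation step the schema provides.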
Avro to JSON
The AVRO to JSON operation converts AVRO formatted byte data back into structured JSON data.
Description
This operation parses AVRO byte data and converts it into readable JSON output.
Number of Parameters : 2
The Avro to JSON operation requires 2 parameters.
Parameter : AVRO Data Key
Provide the key name that contains the AVRO byte data which we aim to parse and convert into JSON.
Below is an example where we provide the key containing AVRO data.
avrodatakey
Parameter : JSON Data Key
Specify the key name where you want to store the parsed and converted JSON data.
Below is an example where we aim to store the parsed data.
jsondataresponse
Various Use Cases for the Parameters
Case 1
When you have valid AVRO bytes and need to convert them into a readable JSON object.
AVRO Data Key
avrodatakey
JSON Data Key
jsondataresponse
Example for Case 1 Input JSON
{
"avrodatakey": "bObj..."
}
Result
{
"jsondataresponse": {
"CREATEDDATE": "15-01-2024",
"CUSTOMERCITY": "Bengaluru",
"CUSTOMERCOUNTRY": "India",
"CUSTOMEREMAIL": "john.doe@email.com",
"CUSTOMERNAME": "John Doe",
"CUSTOMERPHONE": "9876543210",
"CUSTOMERSTATE": "Karnataka",
"CUSTOMERZIPCODE": "560001",
"ERPCUSTOMER": "ERP001",
"CUSTOMERID": "CUST001",
"ID": "1"
}
}
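The reverse direction can be sketched the same way. The following pure-Python example decodes a bare Avro binary datum for a flat record with string, int, and long fields. The function names are hypothetical, the schema must be supplied by the caller, and the Avro object container format suggested by the Obj prefix in the example input (which carries its own schema) is not handled.

```python
def decode_long(buffer, pos):
    """Read a zigzag variable-length integer; return (value, next position)."""
    shift = 0
    accum = 0
    while True:
        byte = buffer[pos]
        pos += 1
        accum |= (byte & 0x7F) << shift
        # A cleared high bit marks the last byte of the number.
        if not byte & 0x80:
            break
        shift += 7
    # Undo the zigzag mapping.
    return (accum >> 1) ^ -(accum & 1), pos


def decode_string(buffer, pos):
    """Read a length-prefixed UTF-8 string; return (value, next position)."""
    length, pos = decode_long(buffer, pos)
    return buffer[pos:pos + length].decode("utf-8"), pos + length


DECODERS = {"string": decode_string, "int": decode_long, "long": decode_long}


def decode_record(schema, buffer):
    """Decode fields in schema order into a JSON-compatible dictionary."""
    pos = 0
    record = {}
    for field in schema["fields"]:
        record[field["name"]], pos = DECODERS[field["type"]](buffer, pos)
    return record


user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
decoded = decode_record(user_schema, b"\x04Al\x3c")
print(decoded)
```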
Zipfile in Base64
Zipfile in Base64 produces the final Base64-encoded string of a zip file.
Number of Parameters: 5
Overview
This operation is used to package multiple files and return an encoded zip file string.
Parameter: Source Key
This key contains all the records.
In the example below, “items” serves as the Source Key.
"items"
Parameter: File Name Key
This key holds the file name.
In the example below, “FILE_NAME” serves as the File Name Key.
"FILE_NAME"
Parameter: File Extension Key
This key contains the file extension.
In the example below, “EXTENSION” will act as the key for the file extension.
"EXTENSION"
Parameter: File Data Key
This key contains the file’s data.
In the example below, “FILE_DATA” is designated as the key for the file’s data.
"FILE_DATA"
Parameter: Base64 Response Key
This key holds the final Base64-encoded string of the zip file.
In the example below, “File_string” serves as the Base64 Response Key.
"File_string"
Example
Input
{
"data": {
"items": [
{ "FILE_NAME": "file_01", "EXTENSION": ".csv", "FILE_DATA": "bnIsdGVzdGluZyxvcHMNCjEsZmlsZTEsemlwb3BzDQoyLGZpbGUxLHppcG9wcw0KMyxmaWxlMSx6aXBvcHM=" },
{ "FILE_NAME": "file_02", "EXTENSION": ".tsv", "FILE_DATA": "bnIJdGVzdGluZwlvcHMNCjEJZmlsZTIJemlwb3BzDQoyCWZpbGUyCXppcG9wcw0KMwlmaWxlMgl6aXBvcHM=" },
{ "FILE_NAME": "file_03", "EXTENSION": ".psv", "FILE_DATA": "bnJ8dGVzdGluZ3xvcHMNCjF8ZmlsZTJ8emlwb3BzDQoyfGZpbGUyfHppcG9wcw0KM3xmaWxlMnx6aXBvcHM=" }
]
}
}
Output
{
"File_string": "Encoded zip file string"
}
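How the five parameters fit together can be sketched with Python's standard zipfile and base64 modules. This is an illustration, not the operation's implementation: the function name zip_to_base64 is hypothetical, and it assumes that each File Data value is itself Base64-encoded text, as the values in the example appear to be.

```python
import base64
import io
import zipfile


def zip_to_base64(record, source_key, name_key, extension_key, data_key, response_key):
    """Pack every item under source_key into one zip and return it Base64-encoded."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for item in record[source_key]:
            # Assumption: the file data is Base64 text that decodes to the file bytes.
            content = base64.b64decode(item[data_key])
            archive.writestr(item[name_key] + item[extension_key], content)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return {response_key: encoded}


sample = {
    "items": [
        {
            "FILE_NAME": "file_01",
            "EXTENSION": ".csv",
            "FILE_DATA": base64.b64encode(b"nr,testing,ops").decode("ascii"),
        },
        {
            "FILE_NAME": "file_02",
            "EXTENSION": ".tsv",
            "FILE_DATA": base64.b64encode(b"nr\ttesting\tops").decode("ascii"),
        },
    ]
}
result = zip_to_base64(sample, "items", "FILE_NAME", "EXTENSION", "FILE_DATA", "File_string")
print(list(result))
```

Decoding the returned string with base64.b64decode and opening it with zipfile.ZipFile yields the original files, which is a convenient way to verify the output downstream.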
Troubleshooting
- Validate key names before execution.
- Ensure correct datatype formats during conversion.
- Test transformations using sample datasets.
Frequently Asked Questions
What does the Upper Case operation do?
The Upper Case operation converts the values of specified keys into uppercase format while the data is in-flight within the pipeline.
What does the Lower Case operation do?
The Lower Case operation converts the values of selected keys into lowercase format during data processing.
How does the Data Type operation work?
The Data Type operation converts string values into their respective data types such as Boolean, Float, Integer, or DateTime based on the provided configuration.
When should I use the Append operation?
Use the Append operation when you need to add new keys with static values or dynamic values derived from existing pipeline data.
What is the purpose of Title Case?
Title Case converts the values of specified keys so that each word starts with an uppercase letter.
How does Data Extractor help?
Data Extractor retrieves specific keys and their corresponding values from a JSON response for further processing.
What is the difference between JSON to String and String to JSON?
JSON to String converts structured JSON data into a string format, while String to JSON parses a string and converts it into structured JSON.
When should JSON to XML or XML to JSON be used?
These operations are used when converting data between JSON and XML formats to meet integration or system requirements.
Notes
- Validate key names before execution.
- Ensure correct datatype formats during conversion.
- Test transformations using sample datasets.