How to Connect Data Lake as Source

Overview

The Data Lake in eZintegrations is a search engine–based NoSQL database designed to store and process large volumes of structured and unstructured data for analytics, machine learning, and deep learning.

A Data Source in eZintegrations acts as a connection pool that retrieves data from the Data Lake and delivers it in JSON format to the integration pipeline.

Responses from the Data Lake source are stored under the bizdata_dataset_response key for further processing.
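
As a rough illustration of how a downstream step might read this key, the Python sketch below assumes the payload is a dictionary whose bizdata_dataset_response value is a list of records. The sample records and the helper name extract_records are hypothetical; the exact payload shape in a given pipeline may differ.

```python
from typing import Any

RESPONSE_KEY = "bizdata_dataset_response"  # key named in this article


def extract_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the records stored under the response key, or an empty list."""
    records = payload.get(RESPONSE_KEY, [])
    # Assumption: a single record may arrive as a dict rather than a list.
    if isinstance(records, dict):
        return [records]
    return list(records)


# Hypothetical payload, for illustration only.
sample_payload = {
    RESPONSE_KEY: [
        {"employee_id": 130, "employee_name": "A. Example"},
        {"employee_id": 131, "employee_name": "B. Example"},
    ]
}

for record in extract_records(sample_payload):
    print(record["employee_id"], record["employee_name"])
```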

When to Use

Use Data Lake as a Source when large-scale analytical or operational data needs to be retrieved and processed within an Integration Bridge. Typical use cases:

Extracting analytical datasets
Processing historical records
Streaming operational data
Supporting reporting workflows
Feeding machine learning pipelines

How It Works

The Data Lake Source retrieves records using JSON-based queries.

Data is streamed in chunks based on the configured size and pagination settings.

Retrieved records are stored in the bizdata_dataset_response key and passed to downstream operations and targets.
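
The chunked delivery described above can be pictured with a small Python simulation. This is only a sketch of the batching idea, not the platform's implementation: the generator name stream_in_chunks and the sample records are hypothetical, and each batch is assumed to arrive as a list under the response key.

```python
from typing import Any, Iterator

RESPONSE_KEY = "bizdata_dataset_response"


def stream_in_chunks(
    records: list[dict[str, Any]], size: int = 1000
) -> Iterator[dict[str, list[dict[str, Any]]]]:
    """Yield successive batches of at most `size` records under the response key."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(records), size):
        yield {RESPONSE_KEY: records[start:start + size]}


# Hypothetical data: 2,500 records streamed with the default size of 1000.
all_records = [{"id": i} for i in range(2500)]
batch_sizes = [len(batch[RESPONSE_KEY]) for batch in stream_in_chunks(all_records)]
print(batch_sizes)  # [1000, 1000, 500]
```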

When using Single Line to Multiline Operations, the Chop key must be set to:

['bizdata_dataset_response']

Data Lake Source Parameters

Data Lake Version

Specifies the Data Lake name and version assigned to the organization.

Index / Table Name

Defines the index or table from which data is retrieved.

Available indices and tables can be found in the Datalake section of the Visualization product.

Pagination Wait Time

Controls how long the system waits for the next page of data.

Default: 2m (2 minutes)
Supported units: s (seconds), m (minutes), h (hours)
Increase for large responses or heavy network congestion
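
To make the unit suffixes concrete, the following Python sketch converts a wait-time value to seconds. It assumes a value is a whole number followed by a single unit letter (s, m, or h); the platform's actual parsing rules are not documented here, and the function name wait_time_to_seconds is hypothetical.

```python
import re

# Seconds per supported unit suffix.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def wait_time_to_seconds(value: str) -> int:
    """Convert a value such as '2m', '30s', or '1h' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", value)
    if match is None:
        raise ValueError(f"unsupported wait time: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(wait_time_to_seconds("2m"))   # 120
print(wait_time_to_seconds("30s"))  # 30
print(wait_time_to_seconds("1h"))   # 3600
```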

Timeout

Defines the maximum wait time for receiving a response.

Default: 2m
Increase when the Data Lake is slow to respond
A higher value may be needed on small clusters

Size

Controls the number of records streamed per batch.

Default: 1000
Maximum: 10000
Recommended: 1000 for optimal performance
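
The effect of Size on the number of batches can be estimated with simple arithmetic. The Python sketch below uses the default and maximum stated above; the lower bound of 1 and the helper names validate_size and batch_count are assumptions made for illustration.

```python
DEFAULT_SIZE = 1000
MAX_SIZE = 10000


def validate_size(size: int = DEFAULT_SIZE) -> int:
    """Reject sizes outside the range 1..MAX_SIZE (lower bound is an assumption)."""
    if not 1 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SIZE}, got {size}")
    return size


def batch_count(total_records: int, size: int = DEFAULT_SIZE) -> int:
    """Number of batches needed to stream `total_records` records."""
    size = validate_size(size)
    return (total_records + size - 1) // size  # ceiling division


print(batch_count(250_000))         # 250
print(batch_count(250_000, 10000))  # 25
```

A larger Size means fewer batches but bigger responses per batch, which is one reason the Timeout and Pagination Wait Time values may need to grow with it.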

Query

Defines the JSON-based query used to retrieve records from the Data Lake.

Query Examples

Get All Records

{
  "query": {
    "match_all": {}
  }
}

Get Specific Columns

{
  "_source": ["store_number", "customer_number"],
  "query": {
    "match_all": {}
  }
}
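
When queries are generated by a script rather than typed by hand, the same body can be assembled programmatically. The Python sketch below builds the two queries shown so far; the helper name build_query is hypothetical and the output is ordinary JSON to paste into the Query parameter.

```python
import json
from typing import Any, Optional


def build_query(
    fields: Optional[list[str]] = None,
    query: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a query body; defaults to match_all with every column returned."""
    body: dict[str, Any] = {"query": query or {"match_all": {}}}
    if fields:
        # Restrict the returned columns, as in "Get Specific Columns".
        body["_source"] = list(fields)
    return body


print(json.dumps(build_query(), indent=2))
print(json.dumps(build_query(["store_number", "customer_number"]), indent=2))
```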

Filter by Field Value

{
  "query": {
    "match": {
      "employee_id": 130
    }
  },
  "_source": {
    "includes": ["employee_id", "employee_name"]
  }
}

Filter with Multiple Conditions

{
  "size": 50,
  "sort": [{}],
  "_source": ["Project", "title", "Assigned To", "Priority", "Created By", "createdDateTime", "dueDateTime"],
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "*" }},
        { "query_string": { "query": "Project:\"Project ABC\" AND Priority:[* TO *] AND NOT percentComplete:100" }},
        { "bool": { "should": [] }}
      ],
      "must_not": []
    }
  }
}

Key Names with Spaces

{
  "size": 1000,
  "sort": [{}],
  "_source": ["ThreadId", "Ticket Created At"],
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "*" }},
        { "query_string": { "query": "NOT Status:\"Closed\" AND Thread\\ Type: \"create\"" }},
        { "bool": { "should": [] }}
      ],
      "must_not": []
    }
  }
}
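
The double backslash in the example above is a JSON escape: the query string itself contains a single backslash before the space. The Python sketch below shows one way to produce that escaping when building such a query in code; the helper name escape_field_name is hypothetical. Field names listed in _source are written without escaping, as in the example.

```python
import json


def escape_field_name(name: str) -> str:
    """Backslash-escape spaces so the query string treats the name as one field."""
    return name.replace(" ", "\\ ")


clause = f'NOT Status:"Closed" AND {escape_field_name("Thread Type")}:"create"'
query = {"query": {"query_string": {"query": clause}}}

# json.dumps doubles the backslash and escapes the inner quotes automatically.
print(json.dumps(query, indent=2))
```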

Check for NULL Values

{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "asn"
        }
      }
    }
  }
}
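
To check expectations against sample data before running the query, the same condition can be emulated locally. The Python sketch below assumes Elasticsearch-style semantics for exists, where a field counts as missing when it is absent, null, or an empty array; the sample records and the helper name is_missing are hypothetical.

```python
from typing import Any


def is_missing(record: dict[str, Any], field: str) -> bool:
    """True when the field is absent, None, or an empty list."""
    value = record.get(field)
    return value is None or value == []


# Hypothetical records, for illustration only.
records = [
    {"ipAddress": "203.0.113.7", "asn": 64500},
    {"ipAddress": "203.0.113.8"},
    {"ipAddress": "203.0.113.9", "asn": None},
]

missing = [r["ipAddress"] for r in records if is_missing(r, "asn")]
print(missing)  # ['203.0.113.8', '203.0.113.9']
```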

Dynamic Filter Using Sprintf

{
  "query": {
    "bool": {
      "must": [
        { "term": { "ipAddress": "{%ipAddress%}" }}
      ],
      "must_not": [
        { "exists": { "field": "asn" }}
      ]
    }
  }
}
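
Conceptually, each {%name%} placeholder is replaced with a value supplied at run time before the query is sent. The Python sketch below imitates that substitution so a template can be checked offline; it is not the platform's implementation, and the function name fill_placeholders and the sample IP address are hypothetical.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{%(\w+)%\}")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each {%name%} in the template with the matching value."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        # json.dumps escapes quotes and backslashes; drop its surrounding
        # quotes because the template already supplies them.
        return json.dumps(values[key])[1:-1]

    return PLACEHOLDER.sub(replace, template)


template = '{"query": {"bool": {"must": [{"term": {"ipAddress": "{%ipAddress%}"}}]}}}'
filled = fill_placeholders(template, {"ipAddress": "203.0.113.7"})
query = json.loads(filled)  # confirms the result is still valid JSON
print(query["query"]["bool"]["must"][0]["term"]["ipAddress"])  # 203.0.113.7
```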

Limit and Sort Results

{
  "_source": ["asn", "as"],
  "query": {
    "bool": {
      "must": [
        { "term": { "ipAddress": "{%ipAddress%}" }},
        { "exists": { "field": "as" }}
      ]
    }
  },
  "size": 1,
  "terminate_after": 1,
  "sort": [
    {
      "_doc": { "order": "asc" }
    }
  ]
}

Frequently Asked Questions

What is Data Lake Source in eZintegrations?

It is a source connector that retrieves structured and unstructured data from the Goldfinch Analytics Data Lake.

Where is the response stored?

All retrieved data is stored under the bizdata_dataset_response key.

What is the recommended batch size?

The recommended size is 1000 records for balanced performance and reliability.

Can I use dynamic values in queries?

Yes. Dynamic values can be passed using Sprintf placeholders.

When should I increase timeout and pagination time?

Increase these values when working with large datasets, slow networks, or small cluster sizes.

Notes

Always validate queries before production deployment.
Use selective filters to avoid unnecessary data loads.
Optimize size and pagination for performance.
Monitor cluster capacity for large workloads.
Maintain consistent query structures across integrations.

Updated on February 20, 2026
