Web Crawler API
The Web Crawler API enables efficient web scraping and data extraction from websites.
It supports Markdown, JSON, and cleaned HTML output formats, making it ideal for automation and data processing workflows.
Method
POST
Endpoint
{{base_url}}/webcrawl
Authentication
The following parameters and headers are required for authentication:
Required Params
- client_id — Your API authentication ID.
Required Headers
- client-secret — Your API authentication secret.
- Content-Type: application/json
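The authentication pieces above can be sketched as a prepared request, assuming `client_id` is passed as a query parameter and `client-secret` as a header. The base URL, credentials, and body below are placeholders, not real values; only the Python standard library is used, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values -- substitute your own from the My Profile section.
BASE_URL = "https://example-base-url"   # stands in for {{base_url}}
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

# client_id travels as a required query parameter.
query = urllib.parse.urlencode({"client_id": CLIENT_ID})

# client-secret and Content-Type travel as required headers.
request = urllib.request.Request(
    f"{BASE_URL}/webcrawl?{query}",
    data=json.dumps({"url": "https://example.com"}).encode("utf-8"),
    headers={
        "client-secret": CLIENT_SECRET,
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send the call; omitted here.
print(request.get_full_url())
```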
Input Formats
The API accepts JSON input containing crawl configuration settings.
Basic Crawl (Markdown Output, Default)
{
"url": "https://example.com",
"deep_crawl": "false",
"max_pages": 10
}
Deep Crawl
{
"url": "https://example.com",
"deep_crawl": "true",
"max_pages": 10
}
Deep Crawl (All Pages)
{
"url": "https://example.com",
"deep_crawl": "true",
"max_pages": "all"
}
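The three configurations above differ only in `deep_crawl` and `max_pages`, so a small helper can produce all of them. `build_crawl_body` is a hypothetical name, not part of the API; note that the examples send `deep_crawl` as the string "true"/"false" rather than a JSON boolean, and this sketch follows that convention.

```python
import json

def build_crawl_body(url, deep_crawl=False, max_pages=10):
    """Build the JSON body for the /webcrawl endpoint.

    Hypothetical helper; field names follow the examples above.
    deep_crawl is serialized as the string "true"/"false",
    matching the documented request bodies.
    """
    return json.dumps({
        "url": url,
        "deep_crawl": "true" if deep_crawl else "false",
        "max_pages": max_pages,
    })

print(build_crawl_body("https://example.com"))                   # basic crawl
print(build_crawl_body("https://example.com", deep_crawl=True))  # deep crawl
print(build_crawl_body("https://example.com", True, "all"))      # all pages
```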
Features of Web Crawler API
- Fast and efficient crawling that outperforms many paid services.
- Flexible output formats: JSON, cleaned HTML, Markdown.
- Media extraction: Detects images, audio, and video tags.
- Link extraction: Captures external and internal page links.
- Metadata extraction: Retrieves structured metadata.
- Multi-URL crawling capability for complex workflows.
Notes
- Basic Crawl: Only fetches the provided URL (single page).
- Deep Crawl: Crawls multiple pages up to the max_pages limit.
- If max_pages is set to “all” (case-insensitive) and deep_crawl is “true”, the crawler retrieves all reachable pages.
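The rules in the notes can be mirrored client-side to predict how a configuration will be treated. `crawl_mode` is a hypothetical helper for illustration, not an API call; it implements the documented behavior, including the case-insensitive “all” value.

```python
def crawl_mode(body):
    """Classify a crawl configuration per the notes above (hypothetical helper)."""
    if body.get("deep_crawl") != "true":
        # Basic crawl: only the provided URL is fetched.
        return "single page"
    max_pages = body.get("max_pages")
    # "all" is accepted case-insensitively when deep_crawl is "true".
    if isinstance(max_pages, str) and max_pages.lower() == "all":
        return "all reachable pages"
    return f"up to {max_pages} pages"

print(crawl_mode({"url": "https://example.com", "deep_crawl": "false", "max_pages": 10}))
print(crawl_mode({"url": "https://example.com", "deep_crawl": "true", "max_pages": "ALL"}))
```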
Supported Output Formats
- JSON
- Markdown
- Cleaned HTML
Authentication Instructions
To acquire your Base URL, Client ID, and Client Secret,
please visit the My Profile section inside your eZintegrations account.
