Endpoint Overview
HTTP Method
POST
Endpoint URL
Authentication
This endpoint requires authentication using an API token. Include your API token as a Bearer token in the Authorization header. Learn how to create and manage API tokens in the API Tokens guide.
Request Format
Headers
Header | Value | Required |
---|---|---|
Authorization | Bearer YOUR_API_TOKEN | ✅ Yes |
Content-Type | application/json | ✅ Yes |
Request Body
Send a JSON body with the following fields:
Field | Type | Description | Required |
---|---|---|---|
url | string | The URL of the web page to scrape | ✅ Yes |
crawlUrls | boolean | Whether to crawl and ingest links from the given URL | No (default: false) |
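As an illustration of the body described in the table above, the following minimal sketch serializes the two documented fields. The helper name `build_scrape_body` and the client-side checks are assumptions made for this example; the server may validate differently.

```python
import json


def build_scrape_body(url: str, crawl_urls: bool = False) -> str:
    """Serialize the JSON body using the documented field names.

    Hypothetical helper: the name and the local validation are illustrative,
    not part of the API.
    """
    if not isinstance(url, str) or not url.strip():
        raise ValueError("url is required and must be a non-empty string")
    if not isinstance(crawl_urls, bool):
        raise TypeError("crawlUrls must be a boolean")
    # "url" is required; "crawlUrls" defaults to false per the table.
    return json.dumps({"url": url.strip(), "crawlUrls": crawl_urls})


body = build_scrape_body("https://example.com/article")
print(body)  # {"url": "https://example.com/article", "crawlUrls": false}
```

Sending `crawlUrls` explicitly, even when it is false, keeps the request self-describing; omitting it should be equivalent given the documented default.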
URL Requirements
Supported URL types
- Public web pages
- Pages that render primary content server-side and are reachable without interaction
Access requirements
- The URL must be publicly reachable over HTTPS
- Authentication-protected pages are not supported by this endpoint
This endpoint scrapes web pages. To ingest files (PDF, DOCX, etc.), use the Upload File endpoint.
Response Format
Success Response (200 OK)
Response Fields
Field | Type | Description |
---|---|---|
status | string | Processing status (New, Processing, Completed, Failed, etc.) |
message | string | Human-readable status message |
file_name | string | Name or URL for the ingested source |
file_size | integer | Size in bytes (0 for URL-based initial record) |
file_type | string | Detected type when applicable |
file_source | string | Source type (url) |
project_id | string | UUID of the project |
project_name | string | Name of the project |
partition_method | string | Document processing method used |
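The fields in the table above map naturally onto a small record type. The following sketch assumes the success response is a flat JSON object whose keys match the table; the class name `SourceRecord` and the sample payload are invented for illustration and are not real API output.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SourceRecord:
    """Mirror of the documented response fields (all optional for leniency)."""
    status: Optional[str] = None
    message: Optional[str] = None
    file_name: Optional[str] = None
    file_size: Optional[int] = None
    file_type: Optional[str] = None
    file_source: Optional[str] = None
    project_id: Optional[str] = None
    project_name: Optional[str] = None
    partition_method: Optional[str] = None


def parse_source_record(raw: str) -> SourceRecord:
    """Decode a response body, ignoring any keys not listed in the table."""
    data = json.loads(raw)
    known = {f.name for f in fields(SourceRecord)}
    return SourceRecord(**{k: v for k, v in data.items() if k in known})


# Made-up sample values, shaped like the table; not captured from the API.
sample = '{"status": "New", "file_name": "https://example.com/article", "file_size": 0, "file_source": "url"}'
record = parse_source_record(sample)
print(record.status, record.file_size, record.file_source)
```

Dropping unknown keys keeps the parser tolerant if the response carries fields beyond those documented here.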
Code Examples
JavaScript/Node.js
Python
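A minimal sketch of the call using only the Python standard library. The endpoint URL is not stated in this section, so `endpoint` is a parameter and the value used below is a placeholder, not the real address; the function names are likewise assumptions made for this example.

```python
import json
import urllib.request


def build_request(endpoint: str, token: str, page_url: str,
                  crawl_urls: bool = False) -> urllib.request.Request:
    """Assemble the POST request with the documented headers and body."""
    body = json.dumps({"url": page_url, "crawlUrls": crawl_urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def scrape_url(endpoint: str, token: str, page_url: str,
               crawl_urls: bool = False, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_request(endpoint, token, page_url, crawl_urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint: replace with the real Endpoint URL for your project.
request = build_request(
    "https://example.invalid/REPLACE_WITH_ENDPOINT_URL",
    "YOUR_API_TOKEN",
    "https://example.com/article",
)
print(request.get_method(), request.data.decode("utf-8"))
```

The example stops at building the request so that it runs without network access; calling `scrape_url` with the real endpoint and a valid token would perform the actual request.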
cURL
Error Responses
Common Error Codes
Status Code | Error Type | Description |
---|---|---|
400 | Bad Request | Invalid or missing URL, malformed JSON |
401 | Unauthorized | Invalid or missing API token |
403 | Forbidden | Access denied to the specified project |
404 | Not Found | Project or source not found |
500 | Internal Server Error | Error during URL processing |
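One way to act on these codes in client code is a small lookup built from the table above. In this sketch the names `ERROR_TABLE`, `explain_status`, and `is_retryable` are hypothetical, and treating only 5xx responses as retryable is an assumption rather than documented behavior.

```python
# Status codes and descriptions copied from the table above.
ERROR_TABLE = {
    400: ("Bad Request", "Invalid or missing URL, malformed JSON"),
    401: ("Unauthorized", "Invalid or missing API token"),
    403: ("Forbidden", "Access denied to the specified project"),
    404: ("Not Found", "Project or source not found"),
    500: ("Internal Server Error", "Error during URL processing"),
}


def explain_status(status_code: int) -> str:
    """Return a one-line explanation for a documented status code."""
    error_type, description = ERROR_TABLE.get(
        status_code, ("Unknown", "Status code not listed in the table")
    )
    return f"{status_code} {error_type}: {description}"


def is_retryable(status_code: int) -> bool:
    """Assumption: client errors (4xx) need a fixed request, 5xx may be transient."""
    return status_code >= 500


print(explain_status(401))  # 401 Unauthorized: Invalid or missing API token
print(is_retryable(400), is_retryable(500))  # False True
```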
Error Response Format
Error Examples
Invalid URL (400)
Invalid API Token (401)
Document Processing
After a successful request, GraphorLM begins fetching and scraping the web page in the background.
Processing Stages
- URL Accepted - The request is validated and scheduled
- Content Retrieval - The page is fetched over HTTPS
- Text Extraction - Visible text is extracted and normalized
- Structure Recognition - Document elements are identified and classified
- Ready for Use - Document is available for chunking and retrieval
Processing Methods
The system selects a processing method based on the detected content. After ingestion, you can reprocess a source with a different method using the Process Source endpoint.
Best Practices
- Provide reachable URLs: Ensure the page is publicly accessible over HTTPS
- Disable crawling when unneeded: Set crawlUrls to false to ingest only the provided URL
- Respect site policies: Only scrape pages you are permitted to and consider website rate limits
- Retry logic: Implement retries for transient network issues
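The retry advice above can be sketched as exponential backoff around any callable. The helper `with_retries`, its defaults, and the choice of which exceptions count as transient are assumptions for illustration; adjust them to the HTTP client you use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on transient errors with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
    raise RuntimeError("unreachable")


# Demonstration with a fake operation that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and testable; in real use the default `time.sleep` applies the delays.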
Next Steps
Process Source
Reprocess with different parsing methods for optimal results
List Sources
Retrieve information about all uploaded documents in your project
Upload File
Ingest local files (PDF, DOCX, etc.)
Chunking
Learn how to optimize document segmentation for your RAG pipeline
Delete Source
Remove documents that are no longer needed from your project