Use this endpoint to ingest content by scraping a public web page. It fetches the page, extracts text, and creates a source in your project for downstream processing.

Endpoint Overview

Authentication

This endpoint requires authentication using an API token. Include your API token as a Bearer token in the Authorization header.
Learn how to create and manage API tokens in the API Tokens guide.
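
For example, in Python the two required headers look like this (the grlm_ prefix follows the placeholder tokens used in the examples below):

headers = {
    "Authorization": "Bearer grlm_your_api_token_here",  # replace with your own token
    "Content-Type": "application/json",
}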

Request Format

Headers

Header         Value                  Required
Authorization  Bearer YOUR_API_TOKEN  ✅ Yes
Content-Type   application/json       ✅ Yes

Request Body

Send a JSON body with the following fields:
Field      Type     Description                                           Required
url        string   The URL of the web page to scrape                     ✅ Yes
crawlUrls  boolean  Whether to crawl and ingest links from the given URL  No (default: false)
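
For example, the body for ingesting a single page without crawling its links:

{
  "url": "https://example.com/",
  "crawlUrls": false
}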

URL Requirements

The URL must point to a publicly accessible web page. This endpoint scrapes web pages only; to ingest files (PDF, DOCX, etc.), use the Upload File endpoint.

Response Format

Success Response (200 OK)

{
  "status": "Processing",
  "message": "Source processed successfully",
  "file_name": "https://example.com/",
  "file_size": 0,
  "file_type": "",
  "file_source": "url",
  "project_id": "550e8400-e29b-41d4-a716-446655440000",
  "project_name": "My Project",
  "partition_method": "basic"
}

Response Fields

Field             Type     Description
status            string   Processing status (New, Processing, Completed, Failed, etc.)
message           string   Human-readable status message
file_name         string   Name or URL of the ingested source
file_size         integer  Size in bytes (0 for the initial URL-based record)
file_type         string   Detected file type, when applicable
file_source       string   Source type (url)
project_id        string   UUID of the project
project_name      string   Name of the project
partition_method  string   Document processing method used

Code Examples

JavaScript/Node.js

import fetch from 'node-fetch';

const uploadUrlSource = async (apiToken, url, crawlUrls = false) => {
  const response = await fetch('https://sources.graphorlm.com/upload-url-source', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, crawlUrls })
  });

  if (!response.ok) {
    throw new Error(`Upload from URL failed: ${response.status} ${response.statusText}`);
  }

  const result = await response.json();
  console.log('URL upload accepted:', result);
  return result;
};

// Usage (scrapes the page content)
uploadUrlSource('grlm_your_api_token_here', 'https://example.com/');

Python

import requests

def upload_url_source(api_token, url, crawl_urls=False):
    endpoint = "https://sources.graphorlm.com/upload-url-source"
    headers = {"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"}
    payload = {"url": url, "crawlUrls": crawl_urls}

    response = requests.post(endpoint, headers=headers, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()

# Usage (scrapes the page content)
result = upload_url_source("grlm_your_api_token_here", "https://example.com")
print("URL scraping accepted:", result["file_name"])  # typically echoes the URL
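
# To also ingest pages linked from the URL, pass crawl_urls=True (sent as crawlUrls):
result = upload_url_source("grlm_your_api_token_here", "https://example.com", crawl_urls=True)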

cURL

curl -X POST https://sources.graphorlm.com/upload-url-source \
  -H "Authorization: Bearer grlm_your_api_token_here" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/","crawlUrls":false}'

Error Responses

Common Error Codes

Status Code  Error Type             Description
400          Bad Request            Invalid or missing URL, malformed JSON
401          Unauthorized           Invalid or missing API token
403          Forbidden              Access denied to the specified project
404          Not Found              Project or source not found
500          Internal Server Error  Error during URL processing

Error Response Format

{
  "detail": "Invalid input: URL is required"
}

Error Examples
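
A sketch of surfacing these errors in Python, reusing the upload_url_source helper from the Python example above; raise_for_status raises requests.HTTPError, which carries the response and its detail field:

import requests

try:
    result = upload_url_source("grlm_your_api_token_here", "https://example.com")
except requests.HTTPError as err:
    # surface the API's detail message alongside the HTTP status
    # (assumes a JSON error body as shown above)
    detail = err.response.json().get("detail", err.response.text)
    print(f"Upload failed ({err.response.status_code}): {detail}")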

Document Processing

After a successful request, GraphorLM begins fetching and scraping the web page in the background.

Processing Stages

  1. URL Accepted - The request is validated and scheduled
  2. Content Retrieval - The page is fetched over HTTPS
  3. Text Extraction - Visible text is extracted and normalized
  4. Structure Recognition - Document elements are identified and classified
  5. Ready for Use - Document is available for chunking and retrieval
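
Because these stages run asynchronously, the initial response usually reports a status of New or Processing; a minimal sketch of branching on that field:

result = upload_url_source("grlm_your_api_token_here", "https://example.com")

if result["status"] == "Failed":
    raise RuntimeError(f"Ingestion failed: {result['message']}")
elif result["status"] in ("New", "Processing"):
    # scraping continues in the background; the source becomes usable
    # for chunking and retrieval once processing completes
    print("Source accepted:", result["file_name"])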

Processing Methods

The system selects the optimal processing method based on the detected content. If a different method suits your content better, you can reprocess sources with the Process Source endpoint after ingestion.

Best Practices

  • Provide reachable URLs: Ensure the page is publicly accessible over HTTPS
  • Disable crawling when unneeded: Set crawlUrls to false to ingest only the provided URL
  • Respect site policies: Only scrape pages you are permitted to and consider website rate limits
  • Retry logic: Implement retries for transient network issues (see the sketch after this list)
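
A minimal retry sketch with exponential backoff, reusing the upload_url_source helper from the Python example; the attempt count and delays are illustrative, not prescribed by the API:

import time
import requests

def upload_with_retries(api_token, url, attempts=3):
    for attempt in range(attempts):
        try:
            return upload_url_source(api_token, url)
        except (requests.ConnectionError, requests.Timeout):
            # transient network failure: back off and try again
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)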

Next Steps