DATA

SiteCrawler

Automatic site sweep that kicks off every new account

Enter a URL, watch the crawl run in real time, and get a structured business profile with logo, description, and up to 30 products and services, all piped into the rest of the platform automatically.

  • 1 URL input to fully seed an account
  • 30 products / services captured per scan
  • Every output language supported, RTL-aware

SiteCrawler answers the very first question a new customer has: "How do I tell this platform about my business?" The answer is: you don't. You give us a URL.

The crawl starts on the homepage. Metadata is parsed, the logo located, the business name extracted, contact info captured. Then the crawler navigates to "services", "products", "about", or their equivalents and harvests up to thirty items with names, prices, and images.
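As a rough illustration of the section discovery and the thirty-item cap described above, here is a minimal sketch. The keyword lists, the `Item` fields, and the function names are assumptions made for this example; the text does not document how the crawler actually matches section links.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ITEMS = 30  # per-scan cap stated in the text

# Hypothetical keyword lists; the real matching rules are not documented here.
SECTION_KEYWORDS = {
    "services": ("service", "what-we-do"),
    "products": ("product", "shop", "catalog"),
    "about": ("about", "company", "who-we-are"),
}


@dataclass
class Item:
    name: str
    price: Optional[str] = None
    image_url: Optional[str] = None


def classify_link(href: str) -> Optional[str]:
    """Map a navigation link to a section name, or None if nothing matches."""
    path = href.lower()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(keyword in path for keyword in keywords):
            return section
    return None


def harvest(items: list[Item]) -> list[Item]:
    """Keep the first MAX_ITEMS items in discovery order."""
    return items[:MAX_ITEMS]
```

Under these assumptions, `classify_link("/our-services")` returns `"services"`, and a page listing 45 products yields only the first 30.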

Progress streams live. The user sees stages run in real time, which kills the anxiety of "is anything happening?" and builds trust before the first interaction. If the site sits behind bot protection, the crawler has specialised mechanics to politely negotiate around it.
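One way to picture the live progress stream is a generator that reports each stage as it completes. This is only a sketch: the stage names and the event fields below are hypothetical, chosen to mirror the crawl order described earlier, and the transport to the browser is left out.

```python
from typing import Iterator

# Hypothetical stage names, loosely following the crawl order in the text.
STAGES = [
    "fetch_homepage",
    "parse_metadata",
    "locate_logo",
    "extract_contacts",
    "harvest_items",
]


def crawl_progress(url: str) -> Iterator[dict]:
    """Yield one event per stage so a UI can render progress as it happens."""
    total = len(STAGES)
    for index, stage in enumerate(STAGES, start=1):
        # A real crawler would perform the stage's work here before reporting.
        yield {
            "url": url,
            "stage": stage,
            "step": index,
            "of": total,
            "done": index == total,
        }


for event in crawl_progress("https://example.com"):
    print(f"[{event['step']}/{event['of']}] {event['stage']}")
```

Because events are produced one at a time, the interface can update after every stage instead of waiting for the whole crawl to finish.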

Parsing is not a regex job. A modern AI model handles content separation — distinguishing real copy from navigation, ads, cookie banners, and boilerplate. The output is therefore clean and usable, not a dump of HTML noise.

The artefact is a polished business card: name, description, logo, full service and product lists. The owner can open it, share the link, hand it to a partner. In parallel, the crawler pushes the same data into SetupWizard, so the user never has to copy anything manually.

Every language AIM supports is covered. The output card is emitted in the language of the source site, and if that site is written in an RTL script the card is rendered right-to-left automatically. The crawler hub lists every scan the user has run, with timestamps and actions, including re-crawl, which refreshes the business profile of every chat the user owns.
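The RTL behaviour can be sketched with a first-strong-character heuristic, the same idea browsers use for `dir="auto"`. This is an assumption about one workable approach, not a description of how SiteCrawler itself decides; the function name `text_direction` is made up for the example.

```python
import unicodedata


def text_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew, Arabic and other right-to-left scripts
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic, CJK and other left-to-right scripts
            return "ltr"
    return "ltr"  # no strong character found: default to left-to-right
```

Digits, spaces, and punctuation are skipped because they carry no strong direction, so a business name that starts with a number is still classified by its first letter.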

Admins can see every scan across the platform to help users debug issues. A JSON export is available for anyone who wants to take the data elsewhere. SiteCrawler is built to be usable on its own, but it shines when connected: every new user's journey on the platform begins with a single URL here.
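For a sense of what a JSON export of the business card might contain, here is a minimal sketch. The `BusinessProfile` field names are assumptions based on the card contents listed above (name, description, logo, items); the actual export schema is not specified in this text.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BusinessProfile:
    # Hypothetical field names; the real export schema may differ.
    name: str
    description: str
    logo_url: str
    language: str
    items: list = field(default_factory=list)


def export_json(profile: BusinessProfile) -> str:
    """Serialise a profile; ensure_ascii=False keeps non-Latin text readable."""
    return json.dumps(asdict(profile), ensure_ascii=False, indent=2)
```

Keeping non-ASCII characters unescaped matters here because the card is emitted in the language of the source site.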

Capabilities

Everything SiteCrawler handles for you

  • Single-field input

    Paste a URL; everything else runs automatically.

  • Structured extraction

    Business card plus up to 30 products and services with names, prices, and images.

  • Live progress stream

    Stages visible in real time — the user knows exactly what is happening.

  • Bot-protection aware

    Specialised mechanics for politely handling sites that sit behind anti-bot layers.

  • AI-driven content parsing

    Nav, ads, and boilerplate filtered out; only real copy gets stored.

  • Multilingual output

    Result card in the language of the source site — every language AIM supports, RTL-aware.

  • Re-crawl refreshes chats

    Running a scan again updates every chat's business profile at once.

Integrations

The starting gun for every new account

SiteCrawler is the source of structured data that most of the downstream platform depends on. Its output flows into SetupWizard, AccountManager, PromtBuilder, KnowledgeBase, and ProductCatalog.

  • SetupWizard receives the crawl payload and builds the canonical per-chat profile.
  • AccountManager renders the crawl progress and business card live inside the owner dashboard.
  • PromtBuilder uses the latest crawl to seed draft system prompts.
  • KnowledgeBase uses the detected business type to decide which industry pack agents draw from.
  • ProductCatalog can import the products found in the scan straight into a catalog.

Wire SiteCrawler into your product today

Book a consultation with our founders and we'll walk you through the whole microservice stack — not just this one — live on your domain.