Feature · Site intelligence
Embeddings cluster your site by what each page is actually about.
Once we've crawled your pages, every URL gets a vector embedding (OpenAI text-embedding-3-small). Cosine similarity then groups pages into semantic clusters — the basis for thin-cluster, cannibalization, and gap-fill detection.
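For reference, the similarity measure is the standard cosine of the angle between two page embeddings e_p and e_q, each 1536 floats long. The second relation is our reading of "group pages above a threshold (default 0.78)". It is an assumption about how pairs are linked, not a documented formula.

```latex
\cos(\mathbf{e}_p, \mathbf{e}_q)
  = \frac{\mathbf{e}_p \cdot \mathbf{e}_q}{\lVert \mathbf{e}_p \rVert \, \lVert \mathbf{e}_q \rVert}
  = \frac{\sum_{i=1}^{1536} e_{p,i}\, e_{q,i}}
         {\sqrt{\sum_{i=1}^{1536} e_{p,i}^{2}} \, \sqrt{\sum_{i=1}^{1536} e_{q,i}^{2}}}
\qquad
p \sim q \iff \cos(\mathbf{e}_p, \mathbf{e}_q) > \tau, \quad \tau = 0.78
```

When both vectors already have unit length, the denominator is 1 and the score reduces to a plain dot product.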
What
What it does
Page clustering is the invisible engine behind many of Slope's opportunity rules. Each crawled page gets a 1536-dimensional embedding, stored as a plain array on its MongoDB document. We deliberately skip Atlas Vector Search and other specialized infrastructure to keep cost and complexity low. A Python clustering worker computes cosine similarity between pages and groups them into topic clusters. Those clusters drive three things: thin-cluster detection (you have one page on a topic where competitors have five), cannibalization detection (two of your pages are too close in topic), and content-gap suggestions.
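A minimal sketch of the grouping step follows. It assumes two pages are linked when their cosine similarity is above the threshold (default 0.78, per the cluster_pages step). It also assumes a cluster is a connected component of those links. The source does not say which grouping rule the worker actually uses, so the union-find approach and the function names `cosine_similarity_matrix` and `cluster_by_threshold` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an (n_pages, dim) array of embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against all-zero vectors so we never divide by zero.
    unit = embeddings / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def cluster_by_threshold(embeddings, threshold: float = 0.78) -> list[list[int]]:
    """Group page indices into clusters.

    Hypothetical rule: pages i and j are linked when similarity > threshold,
    and a cluster is every page reachable through such links.
    """
    sims = cosine_similarity_matrix(np.asarray(embeddings, dtype=float))
    n = sims.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Upper triangle only: each unordered pair once, no self-pairs.
    rows, cols = np.nonzero(np.triu(sims > threshold, k=1))
    for i, j in zip(rows.tolist(), cols.tolist()):
        parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pages = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(cluster_by_threshold(pages))  # [[0, 1], [2], [3]]
```

Two design caveats apply. First, connected components can chain pages that are not directly similar to each other (A close to B, B close to C). Second, the full n × n matrix grows quadratically with page count, so a larger site would need to compute it in blocks.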
Why it matters
The job to be done
- Manual cluster maintenance is one of the highest-toil SEO tasks. We do it continuously, in the background, with no spreadsheets.
- Cannibalization is one of the most under-detected SEO problems: two of your URLs competing for the same query can both lose to a competitor with one focused page. Slope surfaces these pairs by name, citing both URLs.
- Thin clusters become explicit: 'you have 2 pages on tax optimization, your top competitor has 11'. Each cluster shows the gap in plain numbers.
- URL pattern clustering (separate from embedding clustering) groups pages by template — /products/[slug], /blog/[slug] — so we know which clusters are programmatic vs editorial.
- Embeddings are also reused for semantic search and 'related pages' suggestions when you create internal-linking opportunities.
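URL pattern clustering can be sketched without embeddings. The source says the real pass groups paths by template regex (for example /products/[a-z-]+), but the actual regex set is not given. This sketch therefore derives a template by collapsing a slug-like last path segment to [slug]. The helper names and the `min_pages=3` cutoff for calling a template "programmatic" are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Slug-like final segment, close to the /products/[a-z-]+ example (digits allowed here).
SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def url_template(url: str) -> str:
    """Collapse a slug-like last path segment: /blog/my-post -> /blog/[slug]."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Single-segment paths like /about are kept as-is (treated as editorial).
    if len(segments) >= 2 and SLUG.fullmatch(segments[-1]):
        segments[-1] = "[slug]"
    return "/" + "/".join(segments)


def group_by_template(urls, min_pages: int = 3) -> dict[str, dict]:
    """Group URLs by template; templates shared by >= min_pages URLs count as programmatic."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[url_template(url)].append(url)
    return {
        template: {"urls": members, "programmatic": len(members) >= min_pages}
        for template, members in groups.items()
    }


if __name__ == "__main__":
    site = [
        "https://example.com/products/red-shoe",
        "https://example.com/products/blue-shoe",
        "https://example.com/products/green-hat",
        "https://example.com/blog/launch-notes",
        "https://example.com/about",
    ]
    for template, info in group_by_template(site).items():
        print(template, len(info["urls"]), info["programmatic"])
```

Keeping this pass separate from embedding clustering means a semantic cluster can be labeled by how many of its URLs share a template.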
How
What happens under the hood
1. Embed every page
A compute_embeddings worker batches up to 25 unembedded pages per run and calls OpenAI text-embedding-3-small. The 1536-float vector is stored as a plain array on the page document.
2. Cluster with cosine similarity
A cluster_pages worker uses numpy to compute pairwise cosine similarity and groups pages whose similarity is above a threshold (default 0.78). Clusters are persisted to page_clusters with cluster_type=semantic.
3. URL pattern clustering (parallel)
A separate clustering pass examines URL paths and groups them by template regex (for example /products/[a-z-]+), so we know which clusters are templated content.
4. Drive opportunity rules
Rules R4 (thin_cluster) and R5 (cannibalization) run against these clusters. Each opportunity references the cluster_id so the UI can show all affected URLs in one place.
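Step 1 can be sketched as a single worker run. This is a minimal sketch under stated assumptions. The page collection is stood in for by a list of dicts, and the OpenAI call is an injected function so the logic runs offline. The field names `text` and `embedding` and the function name `compute_embeddings_once` are hypothetical. The comments show how the real calls might look with the openai (>=1.0) and pymongo clients.

```python
from typing import Callable

BATCH_SIZE = 25       # "up to 25 unembedded pages per run"
EMBEDDING_DIM = 1536  # output size of text-embedding-3-small

# With the real services, the pieces might look like (assumption, not Slope's code):
#   resp = OpenAI().embeddings.create(model="text-embedding-3-small", input=texts)
#   vectors = [item.embedding for item in resp.data]
#   pages_coll.update_one({"_id": page["_id"]}, {"$set": {"embedding": vector}})


def compute_embeddings_once(
    pages: list[dict],
    embed: Callable[[list[str]], list[list[float]]],
) -> int:
    """Embed up to BATCH_SIZE pages that have no embedding yet.

    Returns the number of pages embedded in this run (0 when nothing is pending).
    """
    batch = [p for p in pages if p.get("embedding") is None][:BATCH_SIZE]
    if not batch:
        return 0
    vectors = embed([p["text"] for p in batch])
    # Validate the whole response before writing anything back.
    if len(vectors) != len(batch) or any(len(v) != EMBEDDING_DIM for v in vectors):
        raise ValueError("embedding call returned an unexpected shape")
    for page, vector in zip(batch, vectors):
        # Stored as a plain list of floats on the page document; no vector index.
        page["embedding"] = [float(x) for x in vector]
    return len(batch)
```

Running it repeatedly drains the backlog 25 pages at a time. With 30 unembedded pages, successive runs embed 25, then 5, then 0.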
Start finding opportunities tonight.
Connect your domain, GSC, and GitHub. First detection runs within minutes. Free tier requires no card.