The hypothesis is that Google rewards websites with greater visibility if their textual content is relevant and informative to the end user. This process aims to discover SEO gaps resulting from texts that can have negative effects on organic positioning.
The content audit tool is a project I started while working at Pro Web Consulting to help the SEO team identify and correct text content on clients' websites that could negatively affect their organic position on the SERP (search engine results page).
The tool was written in Python and it was based on the following logic:
- scrape the clients' assets
- store the data in a database
- clean the data
- apply TF-IDF vectorization to the texts
- apply K-Means clustering
- find anomalies in the clusters
- perform topic modeling with LDA
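The vectorization, clustering and anomaly steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the original tool's code: the function names (`tfidf_vectors`, `kmeans`, `find_anomalies`), the farthest-point initialization and the z-score threshold are all assumptions made for the example. A production pipeline would more likely rely on a library such as scikit-learn. Topic modeling with LDA is left out of the sketch.

```python
import math
import re


def tfidf_vectors(texts):
    """Return one L2-normalized {term: weight} dict per text."""
    docs = []
    for text in texts:
        counts = {}
        for token in re.findall(r"[a-z]+", text.lower()):
            counts[token] = counts.get(token, 0) + 1
        docs.append(counts)
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    vectors = []
    for doc in docs:
        # Smoothed IDF: terms present in every text keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def centroid(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in members:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, n_iter=20):
    """Plain K-Means; returns (labels, centroids)."""
    # Assumption: deterministic farthest-point init keeps the demo reproducible.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def find_anomalies(vectors, labels, centroids, z_threshold=1.5):
    """Indices of texts unusually far from their own cluster centroid."""
    dists = [math.sqrt(sq_dist(v, centroids[lab]))
             for v, lab in zip(vectors, labels)]
    flagged = []
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mean = sum(dists[i] for i in idx) / len(idx)
        std = math.sqrt(sum((dists[i] - mean) ** 2 for i in idx) / len(idx))
        if std > 0:
            flagged += [i for i in idx if (dists[i] - mean) / std > z_threshold]
    return sorted(flagged)
```

Calling `tfidf_vectors(texts)`, then `kmeans(vectors, k)`, then `find_anomalies(vectors, labels, centroids)` yields the indices of pages whose text sits far from the rest of its cluster. Those pages are candidates for manual review by an SEO specialist.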
The client would receive a report that documented the findings and action items on how to resolve the issues.
This project served more than 15 customers through an automated pipeline.
Goal of the project 🎯
The goal is two-fold:
- help clients optimize their content creation efforts
- support the SEO team in identifying, clustering and optimizing texts
Team
- Backend developer (me)
- Data scientist (me)
- SEO specialists (2)
- Project manager
The software offers the following features:
- scraping capabilities with inlink / backlink following
- data cleaning
- creation of data visualizations for .pptx embedding
- export of a comprehensive report for the client
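As an illustration of the scraping and cleaning features, here is a minimal sketch of a crawler that follows internal links and extracts whitespace-normalized text. It is not the original implementation: `PageParser`, `crawl`, the injected `fetch` callable and the `max_pages` limit are hypothetical names chosen for the example, and the sketch only follows links on the same host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collect visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of same-host pages; returns {url: cleaned_text}.

    `fetch` is any callable mapping a URL to an HTML string, or None when
    the page is unavailable (an assumption made to keep the sketch testable).
    """
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        # Cleaning step: collapse all runs of whitespace into single spaces.
        pages[url] = " ".join(" ".join(parser.text_parts).split())
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing `fetch` in as a parameter keeps the crawl logic independent of the HTTP client, so it can be exercised against an in-memory dictionary of pages before being pointed at a live site.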