
The role of domain names and URLs in cyber attacks
In the world of web filtering, one strategic component is little known yet determines the quality of a security policy: the website classification database. At Olfeo, Andrea Bassani and his team are responsible for this essential task. He gives us an insight into a little-known but crucial department, whose rigor and methodological choices make Olfeo's offering unique today.
1 – Andrea, what is your role at Olfeo and that of your team?
I have been with Olfeo for 11 years and now head up the Classification Department, a team entirely dedicated to analyzing the content of websites and SaaS applications. Our mission is to classify each piece of content according to its nature and level of risk, to feed the database used by both our SSE solution and our OEM offering for cybersecurity and network publishers.
Our work is based on a subtle balance between automated analysis (algorithms, scoring, semantic detection) and manual human verification. This dual approach is essential: AI can identify trends and probabilities, but only human analysis can resolve ambiguous cases, cultural contexts, or local legal subtleties.
Every day, our team processes hundreds of websites, many of which are difficult to classify because their content is hard to assess.
It's a demanding job that requires rigor, constant vigilance, and curiosity. We explore entire sections of the web, some of them little known, and learn a lot about digital usage around the world. One amusing example: the websites of certain banks, such as those in Japan, often feature graphic illustrations that reflect the country's culture (manga, kawaii, etc.).
2 – What makes the Olfeo database unique compared to those of other market players?
Most URL database providers claim to have huge volumes, with hundreds of millions of domains. But the size of a database is far from being the only guarantee of quality. What matters is the relevance of the classification, the consistency of the decision rules, and the ability to correctly recognize the sites that are actually visited.
At Olfeo, we have chosen to prioritize the most visited websites in the geographical areas where we operate. As a result, we have a recognition rate of 99.7%. This means that, in almost all cases, when a user accesses a website, it is already classified in our database. And this is without reaching the volumes of websites on which alternative providers communicate.
Similarly, we focus on accurately categorizing websites. If the classification is incorrect, it is like an error in a machine learning system: it leads to biased decisions, compromises access policies, and undermines trust in the system. That's why, at Olfeo, every site is verified by a human, even when it has been analyzed automatically. This level of rigor is essential to guarantee security while respecting business practices.
An example that illustrates this can be found in the fine-grained handling of subdomains. On open platforms such as blog hosts, forums, or personal pages, many publishers classify the main domain once and for all. We, by contrast, analyze each subdomain independently. Why? Because each one is often managed by a different individual, with different intentions—from professional blogs to phishing pages.
This level of accuracy is essential to limit the false positives and false negatives that automatic classification could generate. And it is made possible because our database is designed to be reliable, manageable, and interpretable. Not just massive.
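The subdomain-level approach described above can be sketched in a few lines. In this minimal illustration (the category names, platform list, and lookup logic are hypothetical, not Olfeo's actual data or implementation), each subdomain of an open platform carries its own verdict, while an ordinary domain's verdict covers its subdomains:

```python
# Hosts where anyone can publish: subdomains must be classified independently.
OPEN_PLATFORMS = {"blogspot.com", "wordpress.com"}

# Illustrative classification entries (hypothetical hostnames and categories).
CATEGORIES = {
    "travel-diary.blogspot.com": "personal-blog",
    "free-crypto-login.blogspot.com": "phishing",
    "example.com": "business",
}

def classify(hostname: str) -> str:
    """Return the category for a hostname, or 'unclassified'."""
    if hostname in CATEGORIES:
        return CATEGORIES[hostname]
    # Fall back to the registrable domain (naive last-two-labels heuristic;
    # a real system would consult the Public Suffix List).
    parent = ".".join(hostname.split(".")[-2:])
    if parent in OPEN_PLATFORMS:
        # Sibling subdomains tell us nothing about this one: stay unclassified
        # rather than inherit a verdict from a different author's pages.
        return "unclassified"
    return CATEGORIES.get(parent, "unclassified")
```

With this rule, a never-seen page on an ordinary business domain inherits the domain's category, but a never-seen subdomain on a blog host stays unclassified until it is analyzed in its own right.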
3 – How do you define the classification criteria?
Our methodology is based on a structured risk scale. We analyze a website's content from several angles: legality, morality, and potential impact on user safety. When a website has mixed content—some of it informative and some of it questionable—we assign it to the category with the highest risk. This ensures a cautious filtering policy with no gray areas, while providing the best possible protection for users.
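The highest-risk rule for mixed content reduces to a simple maximum over a risk ordering. A minimal sketch, assuming an illustrative four-level scale (the category names and numeric levels are our own, not Olfeo's scale):

```python
# Illustrative risk ordering: higher number = higher risk.
RISK_LEVEL = {
    "news": 1,
    "forum": 2,
    "gambling": 3,
    "malware": 4,
}

def assign_category(detected: list[str]) -> str:
    """Assign a mixed-content site the riskiest category detected on it."""
    return max(detected, key=lambda cat: RISK_LEVEL[cat])
```

A site carrying both editorial articles and gambling pages would thus land in the gambling category, which is what makes the filtering policy cautious by construction.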
We also have an active reclassification system: websites change, usage patterns evolve, and our database must remain synchronized. We respond quickly to reclassification requests, whether they come from customers, partner publishers, or our own monitoring tools.
We do not seek to classify every existing website: 30 to 40% of the web consists of pages with little or no value and low traffic. We focus our efforts where they have the greatest impact. And the result we have achieved, a recognition rate of 99.7%, proves us right.
4 – What are the real quality criteria for evaluating a filter database?
Very often, we see comparisons that display the volume of classified sites as the sole criterion. But this figure is misleading. Here are the real criteria to consider when evaluating a web classification database:
- Recognition rate: Does the database effectively cover the sites actually visited by users?
- Accurate categorization rate: Are sites correctly classified, even in complex cases?
- Granularity: Are subdomains of blog platforms and hosting domains treated individually?
- Security policy: When in doubt, does security take precedence?
- Responsiveness of support: Can we report an error and get it fixed quickly?
- Cultural and geographic adaptation: Does the database take into account specific characteristics and local legislation?
A good database is not static. It must be dynamic, evolving, and interpretable. It must also be part of a broader approach, capable of flagging anomalies and justifying its decisions.
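The first two criteria above can be measured against real traffic. Here is a small sketch (the traffic log, hostnames, and categories are synthetic examples) of how recognition rate and categorization accuracy could be computed from a database and a list of visited sites with known ground-truth categories:

```python
def evaluate(db: dict[str, str],
             traffic: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (recognition_rate, accuracy) over (hostname, true_category) pairs.

    recognition_rate: share of visited hostnames present in the database.
    accuracy: among recognized hostnames, share labeled correctly.
    """
    recognized = [(host, truth) for host, truth in traffic if host in db]
    recognition_rate = len(recognized) / len(traffic)
    correct = sum(1 for host, truth in recognized if db[host] == truth)
    accuracy = correct / len(recognized) if recognized else 0.0
    return recognition_rate, accuracy

# Synthetic example: 3 of 4 visited hosts are in the database,
# and 2 of those 3 carry the correct label.
db = {"a.example": "news", "b.example": "phishing", "c.example": "news"}
traffic = [("a.example", "news"), ("b.example", "phishing"),
           ("c.example", "sports"), ("d.example", "news")]
```

Separating the two metrics matters: a database can recognize nearly everything while still mislabeling what it recognizes, which is exactly why volume alone is a misleading benchmark.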
5 – Why is this level of requirement so crucial today?
Because today, the digital attack surface is constantly expanding. Threats increasingly come via the web—whether through malicious pages, phishing sites, disinformation campaigns, or simply inappropriate content.
Our database is used in critical contexts: schools, businesses, government agencies, hospitals. A false negative is a dangerous site that slips through. A false positive is a legitimate resource that is blocked unnecessarily, wasting time for the IT department, which has to deal with the unhappy user. In both cases, this undermines the effectiveness of the security policy.
And beyond the technical aspects, there is a question of trust. Our customers rely on us to make automated, sensitive, and sometimes invisible decisions. This trust is built through transparency, rigor, and the quality of our classifications.
6 – Any final thoughts?
If I had to summarize:
👉 It's not the size of the database that determines its value, but its quality.
👉 It is not automation alone, but the interaction between AI and human expertise, that guarantees relevance.
👉 It is not the volume of sites classified, but the accuracy of the decisions, that counts.
At Olfeo, our mission is to build a database that serves as a trusted tool for our customers, a solid technical pillar, and a faithful reflection of the reality of the web.


