GA4 Referral Spam: Clean Traffic Without Bad Filters

On Monday, GA4 shows an unfamiliar referral source with 1,800 sessions. Engagement time is close to zero, nearly all traffic comes from one city outside the target market, and no orders were placed. The first reaction is often to add the domain to “Unwanted referrals.” If you stop there, you may only relabel contaminated data rather than remove it.

Referral spam, self-referrals, bots, and payment-provider referrals can look similar in acquisition reports. Correct treatment begins with identifying which one you have.

How GA4 records referrals

A referral session is one where GA4 determines that a visitor arrived through a link on another domain. The previous domain becomes the source and the medium is typically referral.

That is normal when a real site sends a visitor. It becomes problematic when:

  • the domain is a payment provider or part of your own journey;
  • your domains or subdomains refer to one another;
  • a bot or automated browser triggers tracking;
  • spam events are sent directly to a measurement endpoint;
  • missing UTM parameters cause campaign traffic to be misattributed.

One unfamiliar referral row is not proof of bot traffic.

Four situations that are easy to confuse

1. A legitimate unwanted referral

A customer leaves for a payment processor and returns to the thank-you page. GA4 may treat the processor as a new source and credit the conversion to it.

The traffic is real. Configure unwanted referrals or cross-domain measurement so the original acquisition source survives the journey.

2. A self-referral

Your own domain or subdomain appears as the source. Common causes include:

  • the Google tag is absent from part of the flow;
  • cookies are lost between domains;
  • cross-domain measurement is incomplete;
  • redirects remove the linker parameter;
  • consent behavior differs across pages.

Hiding the domain does not repair the measurement break. Trace the cookie and tag path.

3. A crawler or bot visiting the real site

The bot loads a page and fires tracking like a browser. Sessions, views, and even events may appear. Investigate IP type, user agent, behavior, velocity, and click quality.

4. Ghost spam

Events are sent to GA4 without opening the website. They may carry a foreign hostname, meaningless page path, or campaign promoting the spammer. Because the request never touches your web server, access logs may show nothing.

What “Unwanted referrals” does and does not do

GA4 marks matching events with ignore_referrer=true, preventing that referrer from becoming a new traffic source.

This is useful for payment processors, managed interaction domains, and some cross-domain flows. However:

  • events are still collected;
  • users and sessions do not disappear;
  • historical reports are not rewritten;
  • spam domains are not blocked from the website;
  • attribution may become direct or remain with an earlier source.

Adding a spam domain to the list is not a bot filter.

An investigation workflow

Step 1: Look beyond sessions

Add:

  • hostname;
  • landing page;
  • country and city;
  • device category;
  • browser;
  • engagement rate;
  • average engagement time;
  • conversions or key events;
  • first user source and session source.

A real source can convert poorly while still showing sensible landing pages, varied devices, and reading behavior. Spam usually produces a cluster of anomalies rather than one bad metric.

Step 2: Inspect the hostname

The hostname should be a domain you actually measure. Events attached to a foreign hostname or (not set) increase the likelihood that a measurement ID is being used elsewhere.

For multiple valid domains, maintain allowlist logic in reports or BigQuery rather than relying on memory.
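The allowlist idea can be sketched in a few lines of Python. This is a minimal illustration, not a GA4 feature: the hostnames in `VALID_HOSTNAMES` and the event dictionaries are hypothetical stand-ins for an export of your own data.

```python
from collections import Counter

# Hypothetical allowlist: replace with the hostnames you actually measure.
VALID_HOSTNAMES = {"www.example.com", "shop.example.com"}


def normalize_hostname(hostname):
    """Lowercase, trim, and drop a trailing dot; empty values become '(not set)'."""
    cleaned = (hostname or "").strip().lower().rstrip(".")
    return cleaned or "(not set)"


def split_by_hostname(events, allowlist=VALID_HOSTNAMES):
    """Count events per hostname, separated into allowed and foreign."""
    allowed, foreign = Counter(), Counter()
    for event in events:
        host = normalize_hostname(event.get("hostname"))
        bucket = allowed if host in allowlist else foreign
        bucket[host] += 1
    return allowed, foreign


# Hypothetical exported events.
events = [
    {"hostname": "www.example.com"},
    {"hostname": "WWW.Example.com"},
    {"hostname": "spam-offers.invalid"},
    {"hostname": None},
]
allowed, foreign = split_by_hostname(events)
print("allowed:", dict(allowed))
print("foreign:", dict(foreign))
```

Keeping the allowlist in one shared constant or table means every report applies the same definition of a valid hostname.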

Step 3: Compare server logs

Use the same time range, landing path, and geography. If GA4 reports thousands of sessions while the server has no corresponding traffic, ghost spam or direct event submission becomes more likely.

If logs show dense requests from datacenter IPs with repeated user agents, the bot probably visited the site.
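As a rough sketch of that comparison, assume you have already aggregated GA4 sessions and server page requests per landing path for the same time range. The function name, the `min_sessions` and `max_ratio` thresholds, and the labels are all assumptions chosen for illustration; tune them to your own traffic.

```python
def compare_ga4_to_logs(ga4_sessions, log_page_requests, min_sessions=50, max_ratio=0.1):
    """Label landing paths by whether the server saw traffic matching GA4.

    Both arguments map landing path -> count for the same time range.
    The thresholds are illustrative assumptions, not GA4 defaults.
    """
    findings = {}
    for path, sessions in ga4_sessions.items():
        if sessions < min_sessions:
            continue  # too little volume to judge
        requests = log_page_requests.get(path, 0)
        if requests / sessions <= max_ratio:
            findings[path] = "ghost_spam_likely"
        else:
            findings[path] = "traffic_reached_server"
    return findings


# Hypothetical aggregates.
ga4_sessions = {"/promo": 1800, "/blog/post": 400, "/about": 10}
log_page_requests = {"/blog/post": 390}
findings = compare_ga4_to_logs(ga4_sessions, log_page_requests)
print(findings)
```

A `traffic_reached_server` label only says the requests exist. Whether they came from people or automation still needs the IP and user-agent review described above.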

Step 4: Check for missing campaign tags

An email, affiliate, or ad without UTMs can appear as an unfamiliar referral. Ask marketing whether the domain belongs to a partner, payment platform, or redirect service before excluding it.
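Before asking marketing, it can help to check the landing URLs themselves. The sketch below uses Python's standard `urllib.parse` to list which UTM parameters a URL lacks. Treating `utm_source`, `utm_medium`, and `utm_campaign` as the required set is an assumption of this example, as are the sample URLs.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed minimum set of campaign parameters for this sketch.
REQUIRED_UTMS = ("utm_source", "utm_medium", "utm_campaign")


def missing_utms(url):
    """Return the required UTM parameters that are absent or empty in the URL."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    return [name for name in REQUIRED_UTMS if not query.get(name)]


tagged = "https://www.example.com/sale?utm_source=newsletter&utm_medium=email"
untagged = "https://www.example.com/sale"
print(missing_utms(tagged))
print(missing_utms(untagged))
```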

Step 5: Find the start date

A referral spike that follows a checkout, consent, domain, or tag change often indicates implementation trouble. Spam can begin without any release. Either way, timing is only a clue, not proof.
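One way to line up the start date with recent changes is sketched below. It assumes a per-day session count for the suspicious source and a hand-maintained change log; the dates, labels, the `min_sessions` threshold, and the three-day window are all hypothetical.

```python
from datetime import date, timedelta


def first_active_day(daily_sessions, min_sessions=100):
    """Return the earliest day on which the source reached min_sessions, or None."""
    for day, sessions in sorted(daily_sessions.items()):
        if sessions >= min_sessions:
            return day
    return None


def changes_before(start_day, change_log, window_days=3):
    """Return change labels deployed within window_days before (or on) start_day."""
    earliest = start_day - timedelta(days=window_days)
    return [label for day, label in change_log if earliest <= day <= start_day]


# Hypothetical data for one referral source.
daily_sessions = {
    date(2024, 6, 8): 0,
    date(2024, 6, 9): 4,
    date(2024, 6, 10): 950,
    date(2024, 6, 11): 1800,
}
change_log = [
    (date(2024, 5, 20), "consent banner update"),
    (date(2024, 6, 9), "checkout redirect release"),
]

start = first_active_day(daily_sessions)
nearby = changes_before(start, change_log) if start else []
print(start, nearby)
```

An empty `nearby` list does not prove spam, and a match does not prove an implementation bug. It only tells you where to look first.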

Treat the cause, not the row

Payment providers and journey domains

  1. Configure cross-domain measurement for domains you operate.
  2. Add the payment provider to unwanted referrals when appropriate.
  3. Test the full journey with DebugView.
  4. Confirm the original source still receives conversion credit.

Do not add domains blindly; you can turn valuable referrals into direct traffic.

Self-referrals

Check Google tag coverage, cookie domain, consent state, and linker parameters. Use Tag Assistant through the complete flow, not just the homepage.
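To see where a linker parameter disappears, you can record the URLs of a journey (for example from the browser's network panel) and scan them in order. The sketch below assumes the linker parameter is named `_gl`, which is the name the Google tag commonly uses for cross-domain links; the redirect chain is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def linker_lost_at(redirect_chain, param="_gl"):
    """Return the first URL that lacks the parameter after an earlier URL carried it.

    Returns None if the parameter never appears or is never dropped.
    """
    seen = False
    for url in redirect_chain:
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        has_param = param in query
        if seen and not has_param:
            return url
        seen = seen or has_param
    return None


# Hypothetical journey recorded by hand.
chain = [
    "https://www.example.com/cart?_gl=1*abc",
    "https://pay.example.com/start?_gl=1*abc",
    "https://pay.example.com/form",
]
print(linker_lost_at(chain))
```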

A self-referral is evidence of a broken measurement path. An unwanted-referral rule may hide the symptom without reconnecting it.

Bots that visit the site

Standard GA4 reports do not expose IP addresses. Use outside evidence:

  • access or CDN logs;
  • residential/datacenter/VPN/Tor classification;
  • velocity by IP or fingerprint;
  • repeated user agents;
  • click behavior and event timing.

The IP Checker helps investigate one address. At scale, traffic scoring separates people from automation using multiple signals rather than one exclusion rule.
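Two of the signals above, velocity by IP and repeated user agents, are straightforward to compute from parsed log records. The sketch below assumes each record is a dictionary with `ip`, `minute`, and `user_agent` keys; that shape and the sample values are hypothetical, so adapt the parsing to your own log format.

```python
from collections import Counter


def peak_velocity_by_ip(records):
    """Return the highest number of requests each IP made within a single minute."""
    per_minute = Counter((r["ip"], r["minute"]) for r in records)
    peaks = {}
    for (ip, _minute), count in per_minute.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks


def user_agent_share(records):
    """Return each user agent's share of all requests."""
    counts = Counter(r["user_agent"] for r in records)
    total = sum(counts.values())
    return {ua: n / total for ua, n in counts.items()} if total else {}


# Hypothetical parsed log records (documentation IP ranges).
records = [
    {"ip": "203.0.113.7", "minute": "2024-06-10T09:15", "user_agent": "bot/1.0"}
    for _ in range(5)
] + [
    {"ip": "198.51.100.2", "minute": "2024-06-10T09:15", "user_agent": "Mozilla/5.0"}
]

peaks = peak_velocity_by_ip(records)
shares = user_agent_share(records)
print(peaks)
print(shares)
```

Neither number is a verdict on its own. They become useful when combined with IP classification and on-page behavior.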

Ghost spam

There is no universal button that removes already collected data. Practical controls include:

  • reports accepting only valid hostnames;
  • segments that exclude known contaminated periods;
  • server-side endpoint protection for Measurement Protocol events;
  • tighter handling of public measurement identifiers where possible;
  • monitoring recurring campaign, hostname, and geography patterns.

Web measurement IDs are public by design, so reporting-layer allowlists are often more dependable than trying to keep them secret.

Cleaning analysis without rewriting history

GA4 configuration changes do not work retroactively. Record the change date and analyze two periods:

  • Before remediation: use a segment excluding confirmed sources or hostnames.
  • After remediation: verify whether the new configuration works.

In BigQuery or a warehouse, add a traffic_quality_status field rather than deleting rows:

  • valid;
  • internal;
  • known_bot;
  • suspected_bot;
  • ghost_spam;
  • attribution_issue.

Raw data lets you revise classification later without losing the audit trail.
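A labeling pass might look like the sketch below. The rule order, the reference sets, the row fields, and the engagement and session thresholds are all assumptions made for illustration. In particular, treating every foreign hostname as `ghost_spam` is a simplification you may want to refine.

```python
# Hypothetical reference sets: replace with your own lists.
VALID_HOSTNAMES = {"www.example.com"}
OWNED_DOMAINS = {"example.com", "shop.example.com"}
KNOWN_BOT_SOURCES = {"known-crawler.invalid"}


def traffic_quality_status(row):
    """Assign one status per row; the first matching rule wins."""
    if row.get("is_internal"):
        return "internal"
    if row.get("hostname") not in VALID_HOSTNAMES:
        return "ghost_spam"  # simplification: foreign hostname treated as ghost spam
    if row.get("source") in KNOWN_BOT_SOURCES:
        return "known_bot"
    if row.get("source") in OWNED_DOMAINS:
        return "attribution_issue"  # self-referral from a domain you operate
    if row.get("sessions", 0) >= 100 and row.get("engagement_seconds", 0) < 1:
        return "suspected_bot"
    return "valid"


rows = [
    {"hostname": "www.example.com", "source": "google", "sessions": 200, "engagement_seconds": 35},
    {"hostname": "spam-offers.invalid", "source": "spam-offers.invalid", "sessions": 900},
    {"hostname": "www.example.com", "source": "shop.example.com", "sessions": 40, "engagement_seconds": 20},
    {"hostname": "www.example.com", "source": "unknown-ref.invalid", "sessions": 1800, "engagement_seconds": 0.2},
    {"hostname": "www.example.com", "source": "(direct)", "is_internal": True},
]

# Add the label as a new field; the original rows are kept, not deleted.
labeled = [{**row, "traffic_quality_status": traffic_quality_status(row)} for row in rows]
print([r["traffic_quality_status"] for r in labeled])
```

Because the label is an added field, changing a rule later only requires re-running the pass over the same raw rows.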

A useful referral-spam alert board

Alert | Example threshold
New source spike | More than 3× the 7-day average
Hostname outside allowlist | Any meaningful volume
Near-zero engagement | Combined with a session spike
One city dominates a source | Compared with target markets
Zero conversion | Alongside abnormal session growth

Do not automatically delete traffic from one threshold. An alert opens an investigation; a decision needs several signals.
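The board can be expressed as a small scoring function. This is a sketch under stated assumptions: the metric field names, the one-second engagement cutoff, the 90% city share, and the `min_signals` value are illustrative choices, while the 3× multiplier follows the example threshold above.

```python
def alert_signals(metrics, last_7_days_sessions, valid_hostnames):
    """Return the names of the alert-board signals that fire for one source."""
    if last_7_days_sessions:
        baseline = sum(last_7_days_sessions) / len(last_7_days_sessions)
    else:
        baseline = 0.0  # a brand-new source has no history
    spike = metrics["sessions"] > 3 * baseline

    signals = []
    if spike:
        signals.append("session_spike")
    if metrics["hostname"] not in valid_hostnames:
        signals.append("foreign_hostname")
    if spike and metrics["avg_engagement_seconds"] < 1:
        signals.append("near_zero_engagement")
    if metrics["top_city_share"] > 0.9:
        signals.append("one_city_dominates")
    if spike and metrics["conversions"] == 0:
        signals.append("zero_conversions")
    return signals


def triage(signals, min_signals=2):
    """One signal opens an investigation; several justify a classification review."""
    if not signals:
        return "ok"
    return "review_for_classification" if len(signals) >= min_signals else "investigate"


# Hypothetical daily metrics for one source.
suspicious = {
    "sessions": 1800, "hostname": "www.example.com",
    "avg_engagement_seconds": 0.4, "top_city_share": 0.97, "conversions": 0,
}
signals = alert_signals(suspicious, [10] * 7, {"www.example.com"})
print(signals, triage(signals))
```

Note that `triage` never returns a delete or exclude action. It only decides how much human attention the source deserves.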

Frequent mistakes

Excluding every high-bounce referral

A real news article or directory may send readers who consume one page. Poor conversion does not prove fraud.

Writing broad regex rules

A rule matching “pay” could affect PayPal, payment subdomains, and unrelated valid sources. Normalize exact domains and test conditions before release.
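The difference is easy to demonstrate. The sketch below compares a broad substring rule with exact-domain matching on a handful of made-up sources. The domain list is hypothetical, and matching subdomains of a listed domain is a design choice of this example.

```python
import re

BROAD_RULE = re.compile(r"pay")

# Hypothetical list of domains you have decided to treat specially.
EXACT_DOMAINS = {"paypal.com", "checkout.example-psp.com"}


def matches_exact(source, domains=EXACT_DOMAINS):
    """Match a normalized source against exact domains or their subdomains."""
    host = source.strip().lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


sources = [
    "paypal.com",
    "www.paypal.com",
    "payments-blog.example.org",
    "notpaypal.com",
    "news.example.net",
]
broad_hits = [s for s in sources if BROAD_RULE.search(s)]
exact_hits = [s for s in sources if matches_exact(s)]
print("broad:", broad_hits)
print("exact:", exact_hits)
```

Running a candidate rule against a list of real source names like this, before it reaches a report, shows exactly which sources it would sweep up.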

Fixing GA4 while ignoring the server

A real bot still consumes bandwidth and may click ads after you hide it from a report. Analytics hygiene and traffic protection are separate layers.

Comparing filtered GA4 with raw logs

Logs include assets, crawlers, APIs, and failed requests. GA4 includes consent-dependent client events. Normalize both before comparing totals.
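On the log side, normalization can start with a filter that keeps only requests resembling a page view. The record fields, the asset extensions, the `/api/` prefix, and the crawler hints below are assumptions for illustration; your site's paths and user agents will differ.

```python
import re

# Assumed static-asset extensions and crawler hints; extend for your own site.
ASSET_PATTERN = re.compile(r"\.(?:css|js|png|jpe?g|gif|svg|ico|woff2?|map)$", re.IGNORECASE)
CRAWLER_HINTS = ("bot", "crawler", "spider")


def is_comparable_page_view(record):
    """Keep successful GET requests for pages from non-crawler user agents."""
    path = record["path"].split("?", 1)[0]
    user_agent = record.get("user_agent", "").lower()
    return (
        record["method"] == "GET"
        and 200 <= record["status"] < 300
        and not ASSET_PATTERN.search(path)
        and not path.startswith("/api/")
        and not any(hint in user_agent for hint in CRAWLER_HINTS)
    )


# Hypothetical parsed log records.
log_records = [
    {"method": "GET", "path": "/pricing?ref=x", "status": 200, "user_agent": "Mozilla/5.0"},
    {"method": "GET", "path": "/static/app.js", "status": 200, "user_agent": "Mozilla/5.0"},
    {"method": "GET", "path": "/api/cart", "status": 200, "user_agent": "Mozilla/5.0"},
    {"method": "GET", "path": "/pricing", "status": 404, "user_agent": "Mozilla/5.0"},
    {"method": "GET", "path": "/pricing", "status": 200, "user_agent": "Googlebot/2.1"},
]
page_views = [r for r in log_records if is_comparable_page_view(r)]
print(len(page_views))
```

Even after filtering, the totals will not match exactly, because GA4 also depends on consent and on the tag actually running in the browser.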

Start with “GA4 vs Real Traffic” to inspect tagging and understand likely gaps, then read “Server Logs vs GA4 vs Cloudflare” to choose the right evidence source.

One-session cleanup checklist

  1. Identify source/medium, hostname, and landing page.
  2. Compare engagement, geography, device, and conversion.
  3. Match against server or CDN logs.
  4. Ask marketing about payment, affiliates, and redirects.
  5. Classify the traffic as an attribution issue, self-referral, real bot, or ghost spam.
  6. Apply the appropriate control and record the date.
  7. Monitor for at least one business cycle.

Protect recurring reports from the next incident

After the immediate cleanup, move the logic out of one analyst's notebook. Maintain a small reference table of valid hostnames, owned domains, payment providers, known internal sources, and confirmed spam patterns. Give every entry an owner and review date.
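One possible shape for that reference table is sketched below, with a helper that lists entries whose review date has passed. The field names, entry kinds, owners, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReferenceEntry:
    pattern: str     # hostname or domain, stored exactly
    kind: str        # e.g. "valid_hostname", "payment_provider", "confirmed_spam"
    owner: str       # person or team responsible for the entry
    review_by: date  # date by which the entry must be re-checked


def overdue(entries, today):
    """Return entries whose review date is in the past."""
    return [entry for entry in entries if entry.review_by < today]


# Hypothetical reference table.
entries = [
    ReferenceEntry("www.example.com", "valid_hostname", "analytics-team", date(2025, 1, 15)),
    ReferenceEntry("paypal.com", "payment_provider", "checkout-team", date(2024, 3, 1)),
    ReferenceEntry("spam-offers.invalid", "confirmed_spam", "analytics-team", date(2024, 9, 30)),
]
stale = overdue(entries, today=date(2024, 6, 10))
print([entry.pattern for entry in stale])
```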

Dashboards should expose both raw and cleaned views. Raw data preserves evidence; the cleaned view supports weekly decisions. Label the filters prominently so stakeholders do not compare a cleaned acquisition chart with an unfiltered executive total and assume tracking is broken.

Create an anomaly alert for new source/hostname pairs and sudden changes in session-to-engagement ratios. The alert should include landing pages and geography, giving the analyst enough context to investigate without opening five reports.
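The new-pair part of such an alert can be sketched as a set lookup that carries the investigation context along with it. The row fields and the set of known pairs are hypothetical; in practice they would come from your report export and your reference table.

```python
def new_pair_alerts(rows, known_pairs):
    """Return one alert per source/hostname pair not seen before, with context."""
    alerts = []
    for row in rows:
        pair = (row["source"], row["hostname"])
        if pair not in known_pairs:
            alerts.append({
                "source": row["source"],
                "hostname": row["hostname"],
                "sessions": row["sessions"],
                "landing_page": row["landing_page"],
                "country": row["country"],
            })
    return alerts


# Hypothetical known pairs and today's report rows.
known_pairs = {("google", "www.example.com"), ("newsletter", "www.example.com")}
rows = [
    {"source": "google", "hostname": "www.example.com", "sessions": 420,
     "landing_page": "/", "country": "DE"},
    {"source": "unknown-ref.invalid", "hostname": "www.example.com", "sessions": 1800,
     "landing_page": "/promo", "country": "XX"},
]
alerts = new_pair_alerts(rows, known_pairs)
print(alerts)
```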

Document remediation dates in annotations or a data-quality log. If an unwanted-referral rule, consent release, or hostname filter changes on June 10, future analysts need that context when year-over-year charts shift.

Most importantly, rehearse removing a rule. A false-positive filter can hide a real affiliate or campaign. Every rule should be reversible and tested on a copied exploration before it reaches the report used for budget decisions.

Conclusion

Referral spam is not one defect. Sometimes it is automation, sometimes fabricated events, and often a broken attribution setup. “Unwanted referrals” changes source attribution; it is not a trash can for sessions. Investigate hostnames, landing pages, logs, and behavior first, then choose cross-domain configuration, reporting filters, traffic-quality rules, or protection.

Reference: Google Analytics Help – Identify unwanted referrals.

Frequently asked questions

Does adding a domain to unwanted referrals remove spam sessions?
No. GA4 prevents that referrer from becoming a new source, but events and sessions remain and historical data is not rewritten.

How can I distinguish ghost spam from a real bot visit?
Compare hostname, landing path, and access logs. GA4 events without matching server requests make direct event submission more likely.

Is a self-referral always bot traffic?
No. It commonly indicates broken tag coverage, cookies, consent behavior, or cross-domain measurement between pages or domains.