Safe Browsing at Scale: Checking Every URL Twice
How we integrate Google Safe Browsing v4 to check every URL at creation time and recheck all active URLs daily — with auto-deactivation and email notifications.
Every URL shortened through hrva.cc is checked against Google Safe Browsing before it's saved. And then every active URL is rechecked every night. Here's how the pipeline works.
Why Two Checks
A single check at creation time is not enough. A URL that is safe today can be compromised tomorrow: a legitimate site gets hacked, or a benign page gets replaced with phishing content. The daily recheck catches these cases. Between the two checks, a link should stay live for at most about 24 hours after Safe Browsing starts flagging it, provided the nightly recheck completes.
Creation-Time Check
When a user submits a URL to shorten, the DefaultUrlValidator calls safeBrowsingService.checkUrlsForThreats() before the URL is saved. If Google Safe Browsing returns any threats, the request is rejected with a validation error. The link is never created.
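The flow above can be sketched roughly as follows. This is a minimal illustration, not the actual hrva.cc code: only the names DefaultUrlValidator and checkUrlsForThreats() come from the description above, while the SafeBrowsingService interface shape, the validate() method, and UrlValidationException are assumptions made for the example.

```java
import java.util.List;

// Hypothetical shape of the service; only the method name comes from the article.
interface SafeBrowsingService {
    // Returns the threat types found for the given URLs (empty if none).
    List<String> checkUrlsForThreats(List<String> urls);
}

// Hypothetical exception type used here to signal a validation error.
class UrlValidationException extends RuntimeException {
    UrlValidationException(String message) {
        super(message);
    }
}

class DefaultUrlValidator {
    private final SafeBrowsingService safeBrowsingService;

    DefaultUrlValidator(SafeBrowsingService safeBrowsingService) {
        this.safeBrowsingService = safeBrowsingService;
    }

    // Called before the URL is persisted; throws if any threat is reported,
    // so a flagged link never reaches the database.
    void validate(String url) {
        List<String> threats = safeBrowsingService.checkUrlsForThreats(List.of(url));
        if (!threats.isEmpty()) {
            throw new UrlValidationException(
                "URL rejected by Safe Browsing: " + String.join(", ", threats));
        }
    }
}
```

Because the check runs inside validation, the save path needs no extra branching: a flagged URL simply fails like any other invalid input.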
The SafeBrowsing API client is a singleton that creates batch search requests using Google's API v4. It fails open: if the API is unreachable, it returns an empty threat list, so URLs are not blocked by an API outage. Safety shouldn't become a denial-of-service vector. Any unsafe URL that slips through during an outage is still subject to the nightly recheck.
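One way to express this fail-open behavior is a thin wrapper around the network call. The sketch below is an assumption about structure, not the real client: ThreatLookup is a hypothetical abstraction standing in for the HTTP request to Safe Browsing, and FailOpenSafeBrowsingClient is an illustrative name.

```java
import java.io.IOException;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical abstraction over the HTTP call to the Safe Browsing v4 lookup.
interface ThreatLookup {
    List<String> findThreats(List<String> urls) throws IOException;
}

class FailOpenSafeBrowsingClient {
    private static final Logger LOG =
        Logger.getLogger(FailOpenSafeBrowsingClient.class.getName());

    private final ThreatLookup lookup;

    FailOpenSafeBrowsingClient(ThreatLookup lookup) {
        this.lookup = lookup;
    }

    // Returns the reported threats, or an empty list when the API cannot be
    // reached. An outage therefore looks the same as "no threats found".
    List<String> checkUrlsForThreats(List<String> urls) {
        try {
            return lookup.findThreats(urls);
        } catch (IOException e) {
            LOG.warning("Safe Browsing lookup failed, allowing URLs: " + e.getMessage());
            return List.of();
        }
    }
}
```

The trade-off is deliberate: callers cannot tell an outage from a clean result, which keeps link creation available but makes the log line the only trace of a skipped check.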
Daily Recheck
Every night at 2 AM, a scheduled task runs SafeBrowsingRecheckService.recheckActiveUrls(). It queries all active URLs from the database, batches them into pages of 20, and checks each one against Safe Browsing. If a URL is flagged:
- The URL is immediately deactivated (active = false)
- The cache entry is evicted so no one gets redirected to a dangerous page
- The owner receives an email notification with the threat type detected
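The recheck loop described above could look something like the sketch below. Only SafeBrowsingRecheckService, recheckActiveUrls(), and the batch size of 20 come from the article; ShortUrl, UrlStore, RedirectCache, OwnerNotifier, and ThreatChecker are hypothetical stand-ins for the persistence, cache, mail, and API layers. For brevity it slices an in-memory list, whereas a real implementation might page through the database instead.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical view of a stored short link.
class ShortUrl {
    final String code;
    final String destination;
    final String ownerEmail;

    ShortUrl(String code, String destination, String ownerEmail) {
        this.code = code;
        this.destination = destination;
        this.ownerEmail = ownerEmail;
    }
}

// Hypothetical collaborators; the real ones are not shown in the article.
interface UrlStore {
    List<ShortUrl> findAllActive();
    void deactivate(String code); // sets active = false
}

interface RedirectCache {
    void evict(String code);
}

interface OwnerNotifier {
    void notifyThreat(ShortUrl url, String threatType);
}

interface ThreatChecker {
    // Maps each flagged destination URL to its threat type; clean URLs are absent.
    Map<String, String> check(List<String> destinations);
}

class SafeBrowsingRecheckService {
    static final int BATCH_SIZE = 20;

    private final UrlStore store;
    private final RedirectCache cache;
    private final OwnerNotifier notifier;
    private final ThreatChecker checker;

    SafeBrowsingRecheckService(UrlStore store, RedirectCache cache,
                               OwnerNotifier notifier, ThreatChecker checker) {
        this.store = store;
        this.cache = cache;
        this.notifier = notifier;
        this.checker = checker;
    }

    // Rechecks all active URLs in batches and returns how many were deactivated.
    int recheckActiveUrls() {
        List<ShortUrl> active = store.findAllActive();
        int deactivated = 0;
        for (int from = 0; from < active.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, active.size());
            List<ShortUrl> batch = active.subList(from, to);
            List<String> destinations =
                batch.stream().map(u -> u.destination).collect(Collectors.toList());
            Map<String, String> flagged = checker.check(destinations);
            for (ShortUrl url : batch) {
                String threatType = flagged.get(url.destination);
                if (threatType == null) {
                    continue; // not flagged, leave the link alone
                }
                store.deactivate(url.code);            // stop serving the link
                cache.evict(url.code);                 // drop any cached redirect
                notifier.notifyThreat(url, threatType); // tell the owner why
                deactivated++;
            }
        }
        return deactivated;
    }
}
```

Deactivating before evicting the cache matters in this ordering: once the entry is gone, the next lookup falls through to the database and sees the link as inactive.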
The Email Template
The malware notification email tells the owner which short URL was affected, the destination URL, the threat type detected, and includes a link to the dashboard where they can review or delete the link. The email is styled with the same dark theme as the app, matching the hrva.cc brand.
Graceful Degradation
If the Safe Browsing API is down during the daily recheck, the task catches the exception, logs a warning, and tries again the next night. A single failed recheck doesn't cascade into data loss or incorrect deactivations.
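As a rough sketch of that behavior, the nightly run can be wrapped so that a failure is logged and swallowed rather than propagated. The article does not say which scheduler is used, so NightlyRecheck, runSafely(), and untilNextRun() are all hypothetical names; the example assumes plain JDK scheduling and local server time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.logging.Level;
import java.util.logging.Logger;

class NightlyRecheck {
    private static final Logger LOG = Logger.getLogger(NightlyRecheck.class.getName());

    // Runs the recheck; on failure logs a warning and reports false.
    // Nothing is deactivated as a side effect of the failure itself.
    static boolean runSafely(Runnable recheck) {
        try {
            recheck.run();
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING,
                "Safe Browsing recheck failed; will retry on the next run", e);
            return false;
        }
    }

    // Time from 'now' until the next 02:00 local time.
    static Duration untilNextRun(LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().atTime(LocalTime.of(2, 0));
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }
}
```

If the schedule were driven by a ScheduledExecutorService, the wrapper would matter for a second reason: scheduleAtFixedRate suppresses all subsequent executions once a task throws, so an uncaught exception would silently end the nightly rechecks instead of deferring them by one night.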