Web servers often experience unexpected traffic spikes, some of which are caused by aggressive bots or crawlers rather than legitimate users. While search engines like Google or Bing play a positive role in indexing your site, many bots, including scrapers, SEO crawlers, and malicious scanners, can overload servers, consume bandwidth, and even lead to downtime.

Monitoring bot activity is essential to maintain server performance and security. A few targeted commands can quickly reveal which bots are interacting with your site and how often, allowing you to take corrective action.

Step 1: Locate Your Access Logs

Access logs record every request to your server, including IP addresses, user agents, request times, and requested URLs. The location of these logs depends on your server setup:

  • Apache: /var/log/apache2/access.log
  • Nginx: /var/log/nginx/access.log
  • cPanel / DirectAdmin: /home/username/access-logs/

These logs are the primary source for identifying both legitimate and suspicious traffic.
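
To watch requests arrive in real time before running any analysis, you can simply tail the log (shown here with the default Nginx path from the list above; adjust for your setup):

tail -f /var/log/nginx/access.log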

Step 2: Filter Logs by Today’s Date

  • Focusing on today’s traffic makes your analysis faster and more relevant. Use the following command to extract only today’s entries from Nginx logs:

grep "$(date +"%d/%b/%Y")" /var/log/nginx/access.log

  • This isolates the requests for the current day, providing a clear picture of immediate bot activity.
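
  • The same filter works for Apache logs; only the path changes. Piping the result through wc -l gives a quick count of today's request volume (a minimal sketch, assuming the default log location from Step 1):

grep "$(date +"%d/%b/%Y")" /var/log/apache2/access.log | wc -l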

Step 3: Identify Bots by User Agent

  • The User Agent field in access logs indicates whether the client is a browser, bot, or crawler. To identify the top bots hitting your site, use:

awk -F\" '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -20

  • This command lists the top 20 User Agents, sorted by the number of requests. Example output might include:

12000 MJ12bot/v2.0.4
 8800 AhrefsBot/7.0
 5300 SemrushBot-BA
 2100 Googlebot/2.1
  900 Amazonbot/0.1

  • By analyzing this data, you can distinguish between benign crawlers and aggressive bots that might need restrictions.
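
  • If the full list is noisy, you can narrow it to clients that identify themselves as bots or crawlers (a sketch assuming the standard combined log format, where the User Agent is the sixth quote-delimited field):

awk -F\" '{print $6}' /var/log/nginx/access.log | grep -iE 'bot|crawler|spider' | sort | uniq -c | sort -nr | head -20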

Step 4: Identify Bot IPs

  • Some bots disguise themselves as browsers, making the User Agent field less reliable. Checking IP addresses helps uncover repeat offenders:

awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -20

  • This highlights the most frequent IPs, which could indicate automated scraping activity.
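
  • Before blocking an IP, it is worth confirming who owns it, since legitimate crawlers such as Googlebot publish verifiable reverse DNS. A quick check (203.0.113.5 is a placeholder address; the host and whois utilities must be installed):

host 203.0.113.5
whois 203.0.113.5 | grep -iE 'orgname|netname'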

Step 5: Combine IP and User Agent for Better Clarity

  • To link specific IP addresses with their associated User Agents, execute:

awk -F\" '{print $1 " " $6}' /var/log/nginx/access.log | awk '{print $1,$NF}' | sort | uniq -c | sort -nr | head -20

  • This provides a clear view of which IPs are repeatedly accessing your site under the same bot identity.
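
  • To drill into a single bot, filter the log for its User Agent first and then count the IPs behind it (a sketch using MJ12bot as the example string):

grep "MJ12bot" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10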

Step 6: Spot Suspicious Bots

While analyzing logs, watch for:

  • SEO crawlers such as AhrefsBot or SemrushBot, which can overwhelm small servers.
  • Aggressive crawlers such as MJ12bot, which may ignore crawl-delay rules.
  • Unknown or empty User Agents, which often indicate malicious activity (see the check after this list).
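
Requests with an empty User Agent appear as "-" in the combined log format. A minimal sketch to surface the IPs sending them:

awk -F\" '$6 == "-"' /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10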

Step 7: Block or Rate Limit Bad Bots

Once suspicious bots are identified, there are several ways to mitigate their impact:

  • CSF Firewall:

csf -d <IP>

  • Nginx Blocking: Add rules in your site configuration:

if ($http_user_agent ~* (MJ12bot|AhrefsBot)) { return 403; }
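
After adding the rule, validate the configuration and reload Nginx (the standard commands on systemd-based distributions):

nginx -t && systemctl reload nginx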

  • Rate Limiting:

limit_req zone=botlimit burst=5 nodelay;
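
The limit_req directive only works if the botlimit zone is defined in the http block first. A minimal sketch of that definition (the zone name, 10m size, and 1r/s rate are illustrative values, not Nginx defaults):

limit_req_zone $binary_remote_addr zone=botlimit:10m rate=1r/s;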

  • Robots.txt (for polite bots only):

User-agent: MJ12bot
Disallow: /

Note: Malicious bots often ignore robots.txt, so don’t rely solely on it.

Step 8: Automate Monitoring with Cron

  • For ongoing monitoring, automate daily bot reports using cron:

grep "$(date +"%d/%b/%Y")" /var/log/nginx/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head -10 > /root/bot_report.txt

  • You can also configure it to email the report automatically, ensuring continuous awareness of bot activity without manual intervention.
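
  • A minimal crontab entry along those lines, assuming a working mail command and using admin@example.com as a placeholder address (note that % must be escaped as \% inside crontab):

0 6 * * * grep "$(date +"\%d/\%b/\%Y")" /var/log/nginx/access.log | awk -F\" '{print $6}' | sort | uniq -c | sort -nr | head -10 | mail -s "Daily bot report" admin@example.com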

Tip: Only restrict harmful bots; allow Google, Bing, and other search engines to ensure SEO is not negatively impacted.

Monitoring bot traffic is essential for maintaining server performance and security. By leveraging AWK and standard access logs, administrators can efficiently identify bots, track repeat IPs, and implement effective blocking or rate limiting measures.

For businesses and web administrators, ServerAdminz offers expert solutions in server management, firewall configuration, and proactive monitoring. Our team ensures that servers remain stable, optimized, and secure, allowing organizations to focus on growth without worrying about bot-related issues.

By combining simple tools like AWK with expert guidance from ServerAdminz, managing bot traffic becomes not only manageable but highly efficient.