Urchin 6

Urchin will replace all other web statistics solutions (awstats, webalizer) as a centralized solution. The other statistics tools will be deactivated at the end of 2008.

Installation (Linux)

Auto start the scheduler after every server restart:

cp /usr/local/urchin/util/urchin_daemons /etc/init.d/
chmod +x /etc/init.d/urchin_daemons
chkconfig --add urchin_daemons
service urchin_daemons start

Log Files

FTP logfile crawler

On Linux

FTP user for log download:

useradd -d /tmp/ -c "urchin log crawler" -s /sbin/nologin ftpurchin

Give read access to the httpd log directory:

chmod go+rx /var/log/httpd/
chmod go+r /var/log/httpd/*

IIS

The following metrics have to be logged on IIS:

  • Date [ date ]
  • Time [ time ]
  • Client IP Address [ c-ip ]
  • User Name [ cs-username ]
  • Method [ cs-method ]
  • URI Stem [ cs-uri-stem ]
  • URI Query [ cs-uri-query ]
  • Protocol Status [ sc-status ]
  • Bytes Sent [ sc-bytes ]
  • User Agent [ cs(User-Agent) ]
  • Referer [ cs(Referer) ]
  • Cookie [ cs(Cookie) ] (this field is only required for UTM tracking)
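
Whether a given log file actually contains these fields can be checked against its "#Fields:" header. The following is a minimal sketch, assuming the W3C extended log format, in which IIS writes the header-based fields with parentheses (for example cs(User-Agent)). The function name missing_iis_fields is hypothetical and not part of Urchin.

```shell
#!/bin/bash
# Sketch: print every required field that is missing from the "#Fields:"
# header of an IIS W3C extended log file (one field per line).
# Prints nothing when all fields are present.
missing_iis_fields() {
    local logfile=$1
    local header fields field
    # Use the last "#Fields:" line in case the header appears more than once;
    # strip the carriage return of Windows line endings.
    header=$(grep -a '^#Fields:' "$logfile" | tail -n 1 | tr -d '\r')
    fields=${header#'#Fields:'}
    for field in date time c-ip cs-username cs-method cs-uri-stem \
                 cs-uri-query sc-status sc-bytes \
                 'cs(User-Agent)' 'cs(Referer)' 'cs(Cookie)'; do
        # Pad with spaces so that "time" does not match "time-taken"
        case " $fields " in
            *" $field "*) ;;
            *) echo "$field" ;;
        esac
    done
}

# Usage (the file name is only an example):
#   missing_iis_fields ex090814.log
```

An empty result means the log file is usable as is. Since cs(Cookie) is only needed for UTM tracking, a report that lists only that field can be ignored for other profiles.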

Wildcard for IIS log files (the date placeholders are filled in with the previous day's date):

ex%y%m%d.log
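
To see which file name the wildcard resolves to on a given day, the substitution can be reproduced on the command line. This is only an illustrative sketch: it assumes GNU date (for the -d option), the function name is hypothetical, and it does not show how Urchin performs the substitution internally.

```shell
#!/bin/bash
# Sketch: expand ex%y%m%d.log for the day before a given date.
# The argument is a date in YYYY-MM-DD form and defaults to today.
iis_logfile_for_previous_day() {
    local today=${1:-$(date +%F)}
    # %y = two-digit year, %m = month, %d = day of month
    date -d "$today -1 day" +'ex%y%m%d.log'
}

# Usage:
#   iis_logfile_for_previous_day 2009-03-01    # prints ex090228.log
```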

Apache

If logrotate is active, process the archived logfiles after the logrotate script has run on the web server. The most recently archived file is usually named <logfile name>.1.gz:

access_log.1.gz
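
Because the archive only exists once logrotate has finished, it can be useful to verify it before it is fetched or processed. A minimal sketch, assuming gzip is available on the web server; the function name rotated_log_ready is hypothetical:

```shell
#!/bin/bash
# Sketch: succeed only if the rotated archive exists, is not empty,
# and passes the gzip integrity test.
rotated_log_ready() {
    local archive=$1
    # -s: file exists and has a size greater than zero
    # gzip -t: test the compressed file without extracting it
    [ -s "$archive" ] && gzip -t "$archive" 2>/dev/null
}

# Usage:
#   rotated_log_ready /var/log/httpd/access_log.1.gz && echo "ready"
```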

Commands & Utilities

command                          description
service urchin_daemons start     start all urchin daemons
service urchin_daemons stop      stop all urchin daemons
service urchin_daemons status    display the status of urchin daemons

Urchin Installation Integrity Checker:

/usr/local/urchin/util/inspector

Reverse Proxy

The reverse proxy forwards all HTTP requests to the Urchin web server (192.168.63.35:9999). Here's the Apache configuration:

<VirtualHost *:80 *:443>
    ServerName stats.example.com
    CustomLog logs/stats.example.com-access_log combined
    ErrorLog logs/stats.example.com-error_log
 
    # Redirect all requests to SSL
    RewriteEngine On
    RewriteCond %{HTTPS} off
    RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
 
    SSLEngine on
    SSLProtocol all -SSLv2
    SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    SSLCertificateFile /etc/pki/tls/certs/stats.example.com.crt
    SSLCertificateKeyFile /etc/pki/tls/private/stats.example.com.key
 
    SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0
 
 
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
 
    ProxyPass / http://192.168.63.35:9999/
    ProxyPassReverse / http://192.168.63.35:9999/
    <Location />
        Order allow,deny
        Allow from all
    </Location>
</VirtualHost>

Task Scheduler History Watchdog

To detect problems during the generation of the statistics (e.g. no access to the logs), a bash script scheduled by cron checks for warnings in the Urchin task table (in the MySQL database).

The MySQL user urchin6_watchdog has only read access to the table uprofiles_tasks.

Cron

### run the Urchin Scheduler watchdog
0 7 * * * /bin/bash /usr/local/urchin/util/warnings-watchdog.sh > /dev/null 2>&1

Script

#!/bin/bash
 
## Urchin Scheduler Watchdog, (c) February 2009, Nik Wolfgramm
## Connects to the Urchin MySQL database and checks for entries with warning state in the task scheduler history
## Informs admins by email if there is something wrong with the Urchin Scheduler
 
user='urchin6_watchdog'
db='db_urchin6'
host='localhost'
pass='secret'
 
date=`date`
mail_subj='urchin scheduler warnings'
mail_to='admin@example.com'
 
# connect to the database and check the task scheduler for warnings in the past 24h
warnings=`mysql -h $host -u $user -p$pass -D $db -s -e 'SELECT * FROM uprofiles_tasks WHERE uipt_status = 2 AND utpt_mtime > DATE_SUB(CURDATE(), INTERVAL 1 DAY ); SELECT FOUND_ROWS();' 2> /dev/null | tail -n 1`
 
# no warnings only if the query returned 0; an empty result (e.g. a failed query) also triggers the mail
if [ "$warnings" = "0" ]; then
   echo "no warnings in urchin task scheduler history within last 24 hours"
else
   echo "there are $warnings warning(s) in the urchin task scheduler history within last 24 hours!"
mail -s "$mail_subj" "$mail_to" << END
----------------------------------------------------------
                Urchin Scheduler Watchdog
                $date
----------------------------------------------------------
 
Some Urchin scheduled tasks finished with warnings:
$warnings warning(s) in the urchin task scheduler history within last 24 hours!
Please check the scheduler task history under https://stats.icc.example.com.
END
fi
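
If the MySQL query fails (for example because the database is unreachable or the login is rejected), the variable holding the count is empty rather than 0. A minimal sketch of a numeric check the watchdog could run before comparing the value; the function name is_count is hypothetical and not part of the original script:

```shell
#!/bin/bash
# Sketch: succeed only if the argument is a non-negative integer.
is_count() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

# Usage inside the watchdog:
#   if ! is_count "$warnings"; then
#       echo "could not read the task scheduler history"
#   fi
```

With such a check, a failed query can be reported separately from real scheduler warnings.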

Urchin Monitoring

The urchin slave scheduler seems to go down from time to time, and nobody notices. Missed logfiles then have to be restored manually, which is time-consuming.

To avoid this, use monit to check whether the slave scheduler is running.

This setup sends a mail when the slave scheduler is not running and then tries to restart it. It sends another mail after a successful restart or if problems occur.
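
The check-and-restart behavior described here can be illustrated in plain bash. Note that this is not a monit configuration, only a sketch of the same logic. The function ensure_running and its two callback arguments are hypothetical, and how to detect the slave scheduler process is left open because this page does not name it.

```shell
#!/bin/bash
# Sketch: ensure_running CHECK RESTART
# CHECK and RESTART are names of commands or shell functions.
# Prints what happened; a real setup would mail these messages instead.
ensure_running() {
    local check=$1 restart=$2
    if "$check"; then
        echo "running"
        return 0
    fi
    echo "not running, trying to restart"
    "$restart"
    if "$check"; then
        echo "restart succeeded"
        return 0
    fi
    echo "restart failed"
    return 1
}

# Example wiring (both functions are placeholders to be filled in):
#   scheduler_up()      { ...test for the slave scheduler process...; }
#   scheduler_restart() { service urchin_daemons start >/dev/null 2>&1; }
#   ensure_running scheduler_up scheduler_restart
```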

/srv/wiki.niwos.com/data/pages/web_stats/urchin.txt · Last modified: 2009/08/15 12:14 (external edit)