
Webalizer Configuration Guide (Including the Windows Version for IIS)


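Webalizer is an open-source, high-performance log analysis program. Below is an abridged translation of its configuration file.

Notes on the Webalizer configuration file: the important parts have been translated, and some important configuration changes are included.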

#
# Sample Webalizer configuration file
# Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
# Translation: Che Dong (车东)
#
# Distributed under the GNU General Public License.  See the
# files "Copyright" and "COPYING" provided with the webalizer
# distribution for additional information.
#
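# This is a sample configuration file for Webalizer (version 2.01).
# Lines starting with '#' are comments and are ignored, and blank lines
# are skipped as well.  All other lines are configuration options in the
# form "ConfigOption  Value", where ConfigOption is a valid configuration
# keyword and Value is the value assigned to that option.  Invalid
# keywords/values are ignored, with a corresponding warning.  There must
# be at least one space or tab between the keyword and its value.
#
# As of version 0.98, Webalizer looks for a default configuration file
# named webalizer.conf in the current directory and, if it is not found
# there, uses /etc/webalizer.conf.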


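# LogFile defines the web server log file to use.  If it is not specified
# here and no file name is given on the command line, input defaults to
# STDIN (standard input).  If the log file name ends in '.gz' (a gzip
# compressed file), it is decompressed on the fly as it is read.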

LogFile        /home/apache/log/access_log_yesterday
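
# Illustrative alternative (the path is hypothetical, not from the
# original sample): a gzip-compressed log can be named directly and is
# decompressed while it is read, as described above.
#
#LogFile        /home/apache/log/access_log.1.gz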

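# LogType defines the type of log.  Webalizer is normally used with web
# server logs in CLF or Combined format.  With this option you can also
# process FTP logs (such as the xferlog produced by wu-ftp) and Squid's
# native logs.  Values can be 'clf', 'ftp' or 'squid'; default is 'clf'.
# JNH: the new 'iis' type is designed for IIS.  IIS4 uses the standard log
# format by default, and IIS5 uses the W3C format by default.  Webalizer
# detects the format automatically from the log file name: standard
# format file names start with I, W3C ones with E.  You can keep both
# kinds of log in one directory; webalizer reads them all and produces a
# single report.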

LogType    iis

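# OutputDir is the directory for the generated reports.  It should be a
# full path name, although a relative path might work as well.  If not
# specified, the output directory is the current directory.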

OutputDir      /home/apache/htdocs/usage/

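# HistoryName lets you set the name of the history file that webalizer
# produces.  The history file keeps up to 12 months of data, which is used
# to generate the main HTML page (index.html).  The default file name is
# "webalizer.hist", stored in the specified output directory; an absolute
# path can be used to place it in another directory.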

#HistoryName    webalizer.hist

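# Incremental processing lets you process a large log that has been split
# into several smaller files, which is very useful for large sites that
# rotate their logs weekly or daily.  To continue from the previous run,
# Webalizer saves its current data before exiting and restores that state
# on the next run.  In this mode Webalizer also scans for and ignores
# duplicate records.  See the README file for a more detailed explanation.
# Values can be 'yes' or 'no'; default is 'no'.
# The file 'webalizer.current' holds the current data and is located in
# the output directory set by OutputDir.  Before enabling this option,
# please at least read the section on incremental processing in the README.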

Incremental    yes

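# IncrementalName lets you set the name of the file that holds the current
# data.  As with HistoryName, the file is placed in the default output
# directory unless an absolute path is given.  This option only makes
# sense when Incremental mode is enabled.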

#IncrementalName    webalizer.current

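# ReportTitle is the text displayed as the title.  The host name (unless
# blank) is appended to this string, separated by a space.
# Default is the English "Usage Statistics for".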

#ReportTitle    Usage Statistics for

# HostName defines the host name for the report.  It is used in the report
# title and in the URL statistics, so that clicking on a URL in the report
# goes to the proper location even when the report is for a 'virtual' web
# server, or for a server other than the one the report resides on.
# If not specified here, webalizer tries to obtain the system's host name
# through uname; if that fails, it defaults to "localhost".

HostName       www.chedong.com

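# HTMLExtension lets you set the file name extension of the generated
# report pages.  It normally defaults to "html", but you can change it to
# suit your site (for example, for pages with embedded PHP).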

#HTMLExtension  html

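# PageType tells Webalizer which types of URL you consider a 'page view'.
# Most people count a request for an html or cgi document as a page, but
# not the images and sounds embedded in it.  If nothing is specified, the
# page extensions are 'htm*' and 'cgi' for web logs and 'txt' for ftp
# logs.  Requests without an extension, such as servlets, are also counted
# as pages.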

PageType    htm*
PageType    cgi
PageType    asp
PageType    p*
#PageType    phtml
#PageType    php3
#PageType    pl

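# UseHTTPS: if the analyzed site runs on a secure server, links to URLs
# start with 'https://' instead of the default 'http://'.  Set this to
# 'yes' if needed; default is 'no'.  This setting only affects the links
# in the 'Top URL's' table.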

#UseHTTPS       no

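# DNSCache specifies the DNS cache file used for reverse DNS lookups.  It
# must be specified if you want reverse name lookups for the IP addresses
# found in the log file.  If no absolute path is given (the file name does
# not start with '/'), the file is placed in the output directory.
# See DNS.README for more details.
# JNH: if you use the ListServer option, you must specify the full path
# of DnsCache.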

#DNSCache    dns_cache.db

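# DNSChildren lets you set how many "child" processes are used to perform
# DNS lookups and update the DNS cache file.  If a number is specified,
# Webalizer creates the DNS cache file and updates it on every run; the
# lookups are done by the specified number of child processes before log
# analysis starts.  If used, the DNS cache file name MUST be specified as
# well.  The default value is 0, which disables the DNS cache file.  The
# number of child processes can be from 1 to 100, but a large number may
# affect normal system operation.  Reasonable values are between 5 and 20.
# See DNS.README for more details.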

#DNSChildren    0
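
# Illustrative sketch (not part of the original sample): to turn on
# reverse DNS lookups, both keywords have to be set together.  The child
# count of 10 is just one pick from the 5 to 20 range suggested above,
# and the cache file name is the commented default shown earlier.
# Uncomment both lines to try it:
#
#DNSCache    dns_cache.db
#DNSChildren    10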

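# HTMLPre defines the HTML code inserted at the very beginning of each
# output page.  Default is the DOCTYPE declaration shown below.
# Maximum line length is 80 characters; use multiple HTMLPre lines if you
# need more code.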

#HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

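# HTMLHead defines HTML code inserted between <HEAD></HEAD>, immediately
# after the <TITLE> line.
# Maximum line length is 80 characters; use multiple HTMLHead lines if you
# need more code.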

#HTMLHead <META NAME="author" CONTENT="The Webalizer">

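# HTMLBody defines the HTML code for the opening <BODY> tag line.  The
# default is shown below.
# Maximum line length is 80 characters; use multiple HTMLBody lines if you
# need more code.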


#HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">

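# HTMLPost defines the HTML code to insert immediately before the first
# <HR> on the page, which is just after the title and
# "summary period"-"Generated on:" lines.
# As with HTMLHead, you can define as many of these as you want and
# they will be inserted in the output stream in order of appearance.
# Maximum line length is 80 characters; use multiple HTMLPost lines if you
# need more code.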

#HTMLPost     <BR CLEAR="all">

# HTMLTail defines the HTML code to insert at the bottom of each
# HTML document, usually to include a link back to your home
# page or insert a small graphic.  It is inserted as a table
# data element (ie: <TD> your code here </TD>) and is right
# aligned with the page.  Max string size is 80 characters.

#HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">

# HTMLEnd defines the HTML code to add at the very end of the
# generated files.  It defaults to what is shown below.  If
# used, you MUST specify the </BODY> and </HTML> closing tags
# as the last lines.  Max string length is 80 characters.

#HTMLEnd </BODY></HTML>

# The Quiet option suppresses output messages... Useful when run
# as a cron job to prevent bogus e-mails.  Values can be either
# "yes" or "no".  Default is "no".  Note: this does not suppress
# warnings and errors (which are printed to stderr).

#Quiet        no

# ReallyQuiet will suppress all messages including errors and
# warnings.  Values can be 'yes' or 'no' with 'no' being the
# default.  If 'yes' is used here, it cannot be overridden from
# the command line, so use with caution.  A value of 'no' has
# no effect.

#ReallyQuiet    no

# TimeMe allows you to force the display of timing information
# at the end of processing.  A value of 'yes' will force the
# timing information to be displayed.  A value of 'no' has no
# effect.

#TimeMe        no

# GMTTime allows reports to show GMT (UTC) time instead of local
# time.  Default is to display the time the report was generated
# in the timezone of the local machine, such as EDT or PST.  This
# keyword allows you to have times displayed in UTC instead.  Use
# only if you really have a good reason, since it will probably
# screw up the reporting periods by however many hours your local
# time zone is off of GMT.

#GMTTime        no

# Debug prints additional information for error messages.  This
# will cause webalizer to dump bad records/fields instead of just
# telling you it found a bad one.   As usual, the value can be
# either "yes" or "no".  The default is "no".  It shouldn't be
# needed unless you start getting a lot of Warning or Error
# messages and want to see why.  (Note: warning and error messages
# are printed to stderr, not stdout like normal messages).

#Debug        no

# FoldSeqErr forces the Webalizer to ignore sequence errors.
# This is useful for Netscape and other web servers that cache
# the writing of log records and do not guarantee that they
# will be in chronological order.  The use of the FoldSeqErr
# option will cause out of sequence log records to be treated
# as if they had the same time stamp as the last valid record.
# Default is to ignore out of sequence log records.

#FoldSeqErr    no

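# VisitTimeout sets the timeout for a visit (session).  Default is 30
# minutes.  Visits are determined from the time of the current request and
# the time of the last request from the same site (IP).  If the gap
# between the two exceeds the VisitTimeout value, the request is treated
# as a new visit and the visit total is incremented.
# The value is the timeout in seconds (default = 1800 seconds = 30 minutes).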

#VisitTimeout    1800
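
# Worked example (illustrative, with made-up request times): with the
# default of 1800 seconds, requests from the same IP at 10:00, 10:20 and
# 10:55 count as two visits.  The 20-minute gap (1200 s) is under the
# timeout, so the second request belongs to the first visit; the
# 35-minute gap (2100 s) exceeds it, so the third request starts a new
# visit.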

# IgnoreHist shouldn't be used in a config file, but it is here
# just because it might be useful in certain situations.  If the
# history file is ignored, the main "index.html" file will only
# report on the current log file's contents.  Useful only when you
# want to reproduce the reports from scratch.  USE WITH CAUTION!
# Valid values are "yes" or "no".  Default is "no".

#IgnoreHist    no

# CountryGraph allows the usage by country graph to be disabled.
# Values can be 'yes' or 'no', default is 'yes'.

#CountryGraph    yes

# DailyGraph and DailyStats allow the daily statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#DailyGraph    yes
#DailyStats    yes

# HourlyGraph and HourlyStats allow the hourly statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#HourlyGraph    yes
#HourlyStats    yes

# GraphLegend allows the color coded legends to be turned on or off
# in the graphs.  The default is for them to be displayed.  This only
# toggles the color coded legends, the other legends are not changed.
# If you think they are hideous and ugly, say 'no' here :)

#GraphLegend    yes

# GraphLines allows you to have index lines drawn behind the graphs.
# I personally am not crazy about them, but a lot of people requested
# them and they weren't a big deal to add.  The number represents the
# number of lines you want displayed.  Default is 2, you can disable
# the lines by using a value of zero ('0').  [max is 20]
# Note, due to rounding errors, some values don't work quite right.
# The lower the better, with 1,2,3,4,6 and 10 producing nice results.

#GraphLines    2

# The "Top" options below define the number of entries for each table.
# Defaults are Sites=30, URL's=30, Referrers=30 and Agents=15, and
# Countries=30. TopKSites and TopKURLs (by KByte tables) both default
# to 10, as do the top entry/exit tables (TopEntry/TopExit).  The top
# search strings and usernames default to 20.  Tables may be disabled
# by using zero (0) for the value.

#TopSites        30
#TopKSites       10
#TopURLs         30
#TopKURLs        10
#TopReferrers    30
#TopAgents       15
#TopCountries    30
#TopEntry        10
#TopExit         10
#TopSearch       20
#TopUsers        20

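# The All* keywords allow the display of all URLs, sites (IPs),
# referrers, user agents, search strings and user names.  If enabled, a
# separate HTML page is generated and a link to it is added at the bottom
# of the corresponding table.  Keep two things in mind: first, these
# listings are necessarily much larger than the Top tables; second, they
# are all publicly visible.  Values can be 'yes' or 'no'; the default for
# all of them is 'no'.  For a publicly accessible site, these monthly
# listings can become very large: they need a lot of disk space and, if
# they are viewed often, generate a lot of traffic as well.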
 
#AllSites    no
AllURLs            yes
#AllReferrers    no
#AllAgents    no
AllSearchStr    yes
#AllUsers       no

# The Webalizer normally strips the string 'index.' off the end of
# URL's in order to consolidate URL totals.  For example, the URL
# /somedir/index.html is turned into /somedir/ which is really the
# same URL.  This option allows you to specify additional strings
# to treat in the same way.  You don't need to specify 'index.' as
# it is always scanned for by The Webalizer, this option is just to
# specify _additional_ strings if needed.  If you don't need any,
# don't specify any as each string will be scanned for in EVERY
# log record... A bunch of them will degrade performance.  Also,
# the string is scanned for anywhere in the URL, so a string of
# 'home' would turn the URL /somedir/homepages/brad/home.html into
# just /somedir/ which is probably not what was intended.

#IndexAlias     home.htm
#IndexAlias    homepage.htm

# The Hide*, Group* and Ignore* and Include* keywords allow you to
# change the way Sites, URL's, Referrers, User Agents and Usernames
# are manipulated.  The Ignore* keywords will cause The Webalizer to
# completely ignore records as if they didn't exist (and thus not
# counted in the main site totals).  The Hide* keywords will prevent
# things from being displayed in the 'Top' tables, but will still be
# counted in the main totals.  The Group* keywords allow grouping
# similar objects as if they were one.  Grouped records are displayed
# in the 'Top' tables and can optionally be displayed in BOLD and/or
# shaded. Groups cannot be hidden, and are not counted in the main
# totals. The Group* options do not, by default, hide all the items
# that it matches.  If you want to hide the records that match (so just
# the grouping record is displayed), follow with an identical Hide*
# keyword with the same value.  (see example below)  In addition,
# Group* keywords may have an optional label which will be displayed
# instead of the keyword's value.  The label should be separated from
# the value by at least one 'white-space' character, such as a space
# or tab.
#
# The value can have either a leading or trailing '*' wildcard
# character.  If no wildcard is found, a match can occur anywhere
# in the string. Given a string "www.yourmama.com", the values "your",
# "*mama.com" and "www.your*" will all match.
# Your own site should be hidden

#HideSite    *mrunix.net
#HideSite    localhost

# Your own site gives most referrals
#HideReferrer    mrunix.net/

# This one hides non-referrers ("-" Direct requests)
#HideReferrer    Direct Request

# Usually you want to hide these
HideURL        *.gif
HideURL        *.GIF
HideURL        *.jpg
HideURL        *.JPG
HideURL        *.png
HideURL        *.PNG
HideURL        *.ra
HideURL         *.css

# Hiding agents is kind of futile
#HideAgent    RealPlayer

# You can also hide based on authenticated username
#HideUser    root
#HideUser    admin

# Grouping options
#GroupURL    /cgi-bin/*    CGI Scripts
#GroupURL    /images/*    Images
#GroupSite    *.aol.com
#GroupSite    *.compuserve.com
#GroupReferrer    yahoo.com/    Yahoo!
#GroupReferrer    excite.com/     Excite
#GroupReferrer    infoseek.com/   InfoSeek
#GroupReferrer    webcrawler.com/ WebCrawler

#GroupUser      root            Admin users
#GroupUser      admin           Admin users
#GroupUser      wheel           Admin users

# The following is a great way to get an overall total
# for browsers, and not display all the detail records.
# (You should use MangleAgent to refine further...)

#GroupAgent    MSIE        Micro$oft Internet Exploder
#HideAgent    MSIE
#GroupAgent    Mozilla        Netscape
#HideAgent    Mozilla
#GroupAgent    Lynx*        Lynx
#HideAgent    Lynx*

# HideAllSites allows forcing individual sites to be hidden in the
# report.  This is particularly useful when used in conjunction
# with the "GroupDomain" feature, but could be useful in other
# situations as well, such as when you only want to display grouped
# sites (with the GroupSite keywords...).  The value for this
# keyword can be either 'yes' or 'no', with 'no' the default,
# allowing individual sites to be displayed.

#HideAllSites    no

# The GroupDomains keyword allows you to group individual hostnames
# into their respective domains.  The value specifies the level of
# grouping to perform, and can be thought of as 'the number of dots'
# that will be displayed.  For example, if a visiting host is named
# cust1.tnt.mia.uu.net, a domain grouping of 1 will result in just
# "uu.net" being displayed, while a 2 will result in "mia.uu.net".
# The default value of zero disables this feature.  Domains will only
# be grouped if they do not match any existing "GroupSite" records,
# which allows overriding this feature with your own if desired.

#GroupDomains    0
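
# Illustrative sketch (not part of the original sample): combining the
# two keywords described above so that only the grouped domains (for
# example "uu.net" at level 1) appear in the sites table, with the
# individual hosts hidden:
#
#GroupDomains    1
#HideAllSites    yes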

# The GroupShading allows grouped rows to be shaded in the report.
# Useful if you have lots of groups and individual records that
# intermingle in the report, and you want to differentiate the group
# records a little more.  Value can be 'yes' or 'no', with 'yes'
# being the default.

#GroupShading    yes

# GroupHighlight allows the group record to be displayed in BOLD.
# Can be either 'yes' or 'no' with the default 'yes'.

#GroupHighlight    yes

# The Ignore* keywords allow you to completely ignore log records based
# on hostname, URL, user agent, referrer or username.  I hesitated in
# adding these, since the Webalizer was designed to generate _accurate_
# statistics about a web server's performance.  By choosing to ignore
# records, the accuracy of reports becomes skewed, negating why I wrote
# this program in the first place.  However, due to popular demand, here
# they are.  Use the same as the Hide* keywords, where the value can have
# a leading or trailing wildcard '*'.  Use at your own risk ;)

#IgnoreSite    bad.site.net
#IgnoreURL    /test*
#IgnoreReferrer    file:/*
#IgnoreAgent    RealPlayer
#IgnoreUser     root

# The Include* keywords allow you to force the inclusion of log records
# based on hostname, URL, user agent, referrer or username.  They take
# precedence over the Ignore* keywords.  Note: Using Ignore/Include
# combinations to selectively process parts of a web site is _extremely
# inefficient_!!! Avoid doing so if possible (ie: grep the records to a
# separate file if you really want that kind of report).

# Example: Only show stats on Joe User's pages...
#IgnoreURL    *
#IncludeURL    ~joeuser*

# Or based on an authenticated username
#IgnoreUser     *
#IncludeUser    someuser

# The MangleAgents allows you to specify how much, if any, The Webalizer
# should mangle user agent names.  This allows several levels of detail
# to be produced when reporting user agent statistics.  There are six
# levels that can be specified, which define different levels of detail
# suppression.  Level 5 shows only the browser name (MSIE or Mozilla)
# and the major version number.  Level 4 adds the minor version number
# (single decimal place).  Level 3 displays the minor version to two
# decimal places.  Level 2 will add any sub-level designation (such
# as Mozilla/3.01Gold or MSIE 3.0b).  Level 1 will attempt to also add
# the system type if it is specified.  The default Level 0 displays the
# full user agent field without modification and produces the greatest
# amount of detail.  User agent names that can't be mangled will be
# left unmodified.

#MangleAgents    0

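# The SearchEngine keywords let you specify search engines and the query
# format they use in the URL.  They are used to report which search
# strings were used to find your site.  The first value is a string
# matched against the referrer field of the web log to identify the search
# engine; the second is the name of the URL parameter that carries the
# search terms.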

SearchEngine    yahoo.com    p=
SearchEngine    altavista.com    q=
SearchEngine    google.com    q=
SearchEngine    eureka.com    q=
SearchEngine    lycos.com    query=
SearchEngine    hotbot.com    MT=
SearchEngine    msn.com        MT=
SearchEngine    infoseek.com    qt=
SearchEngine    webcrawler    searchText=
SearchEngine    excite        search=
SearchEngine    netscape.com    search=
SearchEngine    mamma.com    query=
SearchEngine    alltheweb.com    query=
SearchEngine    northernlight.com  qr=
SearchEngine    baidu.com   word=
SearchEngine    sina.com.cn word=
SearchEngine    sohu.com    word=
SearchEngine    163.com q=



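# The Dump* keywords export the statistics to tab-delimited text files,
# which are easy to import into other applications, such as databases and
# statistics software, for further analysis.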

# DumpPath specifies the path to dump the files.  If not specified,
# it will default to the current output directory.  Do not use a
# trailing slash ('/').

#DumpPath    /var/lib/httpd/logs

# The DumpHeader keyword specifies if a header record should be
# written to the file.  A header record is the first record of the
# file, and contains the labels for each field written.  Normally,
# files that are intended to be imported into a database system
# will not need a header record, while spreadsheets usually do.
# Value can be either 'yes' or 'no', with 'no' being the default.

#DumpHeader    no

# DumpExtension allows you to specify the dump filename extension
# to use.  The default is "tab", but some programs are picky about
# the filenames they use, so you may change it here (for example,
# some people may prefer to use "csv").

#DumpExtension    tab

# These control the dumping of each individual category of statistics.
# Values can be 'yes' or 'no'; default is 'no'.

#DumpSites    no
DumpURLs    yes
DumpReferrers    yes
#DumpAgents    no
#DumpUsers    no
DumpSearchStr   yes
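
# Illustrative sketch (the path is a placeholder, not from the original):
# for files meant to be opened in a spreadsheet, the keywords above could
# be combined as follows.  Going by the descriptions above, DumpExtension
# only changes the file name; the contents remain tab-delimited.
#
#DumpPath    /home/apache/dump
#DumpHeader    yes
#DumpExtension    csv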

# End of configuration file...  Have a nice day!

# Begin of JNH modifications
# New entries for the Win32 release (for NT servers)

# Name of the default page on the server.
# Replaces the "Index" file name used on Unix systems with another name.

# IndexPage default

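# Directory that contains all the log files.
# The number of files is limited to 250 per directory; to process more,
# move the files and run webalizer again.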

# FolderLog       C:\JnhDev\WebAlizer32\Exemple de Logs\IIS4.0\Log Standard\
FolderLog C:\WINNT\system32\LogFiles\W3SVC3\
ExtentionLog log

# When log types are mixed in the same folder, webalizer sorts the files
# by name.  If the file name prefixes are mixed and sorting therefore does
# not work, you can disable it.
# Default is no.

# DisableSort yes


# Name of a file listing the servers to process.  Each line has the form:
# Name of Customer<SPACE>Folder of log<SPACE>Folder output<SPACE>Host Name1;Host Name 2
# All options in this configuration file apply to all of the reports.
# New: in the list file you can use quotation marks (") to delimit fields.
# Sample (extract from a production file that has 255 lines):
# wA001 c:\WA001\LogIIS\ c:\wA001\stats wa001.LeRelaisInternet.com;www1.jeanlouisaubert.com
# wA002 c:\WA002\LogIIS\ c:\wA002\stats wa002.LeRelaisInternet.com;www.restotel.fr;www.nordpage.fr
# wA003 c:\WA003\LogIIS\ c:\wA003\stats Wa003.LeRelaisInternet.com;www.autobusavapeur.com

#ServerList c:\jnhdev\webalizer\listeserv.txt

# If your logs are rotated daily by name, a log file can be renamed after
# it has been processed, so that later runs do less unproductive work.
# To use this option you need to use "HistoryName" and "Incremental".

RenameLog yes
NewExtension sav
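
# Illustrative sketch of how these keywords fit together, based on the
# requirement stated above (file names are the defaults shown earlier in
# this file; the exact name a renamed log receives is not documented
# here, so check the result after the first run):
#
#Incremental    yes
#HistoryName    webalizer.hist
#RenameLog    yes
#NewExtension    sav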


# Two new options to optimize DNS resolution: the time to live in the
# cache database for successful DNS resolutions (default is 30 days) and
# for failed resolutions, such as an IP with no reverse entry.  In the
# latter case it is better to store the errors in the database file,
# because failed lookups otherwise consume a lot of time every day
# (default is 7 days).

#TtlDns         30
#TtlDnsError    7

# New option to convert each record's date to local time before it is
# processed.
# Test only.
# Default is no.

ConvertTime yes


# End of JNH modifications.  Have a nice day!

Note: for IIS logs, the two fields bytes sent (sc_size) and referer must be enabled in the IIS logging configuration.

This page contains a single blog entry posted on March 13, 2002 10:03 PM.


Creative Commons License
Entries in this blog are licensed under a Creative Commons license.