Web access statistics
A simple overview of the access load can be seen by examining the number of accesses to the home page per day, shown in the tables below. The monthly links give a more detailed view of who has been looking at the pages and which are most popular. The numbers after the name of each month used to show the number of requests for the home page, but as they gave little useful information, and the graphs at the bottom of this page give a better idea of trends, those numbers are no longer listed.
Data transfer rate
The graph of the total data transfer rate from our main site shows an overall picture of usage. Figures for some months in the early years of the service have unfortunately been lost.
The peak in Aug 2015 seems to have been caused by problems with the Drupal conversion on 14 Aug.
Server requests
The server requests graph shows the average number of successful requests per day. It does not include redirects or failures such as 404s. Requests include HTML pages, images, CSS files, JavaScript etc.
The peak in April 2020 occurred when we transferred to Drupal 8: there was significantly more robot activity, and for a time we were not using consolidated CSS and JavaScript files.
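As a rough illustration of how such a figure could be derived, here is a minimal sketch that counts successful (2xx) requests per day from an Apache combined-format log. The file name and field positions are assumptions for the example, not our actual setup.

    from collections import Counter
    from datetime import datetime

    # A minimal sketch, assuming the Apache "combined" log format.
    daily = Counter()
    with open("access.log") as log:
        for line in log:
            parts = line.split()
            if len(parts) < 9 or not parts[8].isdigit():
                continue
            day = parts[3].lstrip("[").split(":", 1)[0]  # e.g. 14/Aug/2015
            if parts[8].startswith("2"):  # successful (2xx); skips redirects and 404s
                daily[day] += 1

    # Sort chronologically by parsing each date rather than comparing strings.
    for day in sorted(daily, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
        print(day, daily[day])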
Notes
These statistics are based on the web server access logs produced on our main server. Until our move to Drupal in 2015, most of the service was provided by a computer at Manchester University, whilst a few counties were hosted elsewhere. So there was an increase in accesses simply because everything was then based at the same site. The migration into Drupal took some time to achieve and did not take place all at once.
The format of the data used to produce the statistics has also changed over time, as has the analysis software used to produce the reports, and at times no reports were produced at all. They do, though, give an idea of trends as time has passed.
Since 1999 we have used the Apache web server and a program called Analog to analyse the logs. Both have changed over the years, and we have also had different ideas of what reports to produce. All are based on the data format of the Apache logs, each entry of which contains:
- The IP address of the browser requesting the page
- The URL of that page
- The status code, e.g. 200, 301, 404 etc.
- The size of the page sent to the user
- Information provided by the browser, which can include:
  - The URL of the page containing the link to this one
  - The name of the browser or the robot fetching the page
  - The operating system it is running on
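As a concrete illustration, here is a minimal sketch of parsing one such entry, assuming the widely used Apache "combined" log format; the sample line is invented for the example, not taken from our logs.

    import re

    # A minimal sketch: parse one line of an Apache "combined" format log.
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
        r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    sample = ('203.0.113.9 - - [14/Aug/2015:10:12:01 +0100] '
              '"GET /index.html HTTP/1.1" 200 5123 '
              '"http://example.org/links.html" "Mozilla/5.0 (X11; Linux x86_64)"')

    match = LOG_PATTERN.match(sample)
    if match:
        fields = match.groupdict()
        print(fields["ip"], fields["status"], fields["size"], fields["agent"])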
All these items apart from the last seem to hold reliable data. Over time, though, the browser-provided information has changed according to the browser or robot used to fetch the page. Nowadays it often mentions more than one browser, as robots try to make it hard for servers to block them. There also seems to be a trend for browsers to mention all the operating systems on which they can run. So analysis and reports based on this data field can now only show overall trends, depending on how complicated the Analog configuration is. The reports based on the main data items, though, give an accurate view of what is going on, and the two graphs above are based upon those figures.
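For illustration, a typical modern user-agent string looks something like this (an invented example, not one from our logs):

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

A single request from one browser thus names Mozilla, AppleWebKit, Gecko, Chrome and Safari all at once.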
There are, of course, peaks and troughs in the graphs, some where we made changes with unexpected consequences. In the earlier years there tended to be less activity in December, with a big increase in January slowing down to a trough over the summer holidays.
Robots
It is not just real people sitting at a web browser who fetch our pages; there are also many robots fetching them. These consist of search engines fetching copies of our pages to build their indexes, and software checking links to us from other web sites. The search engines tend to put a much heavier load on us than the other robots, but it is simpler to call both types robots. In any analysis of access data it is very useful to be able to separate the real users from the robots: whilst we want users to find us in the search engines, we need to manage robot access so that it does not load the system so heavily that users experience poor performance. This was one of the factors in the periods of poor performance after the move to Drupal 8 in April 2020.
Now the only way we have to help determine that an access is from a robot is that last, unreliable field in the Apache logs. For many years we did not try hard to identify the robots, so the older statistics tend to count them amongst the user data. The names of the robots have also changed over time, and more keep appearing. Robot activity appears in the Operating system report under Known robots, with some further activity hidden under the other operating systems from robots with nothing to identify them in the log files.
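As a rough illustration of the approach, the sketch below classifies requests by looking for common robot tokens in that user-agent field. The token list and file name are assumptions for the example; real robot lists are far longer and change constantly.

    # A minimal sketch: separating robot from user requests by user-agent.
    ROBOT_TOKENS = ("bot", "crawler", "spider", "slurp")

    def is_robot(user_agent: str) -> bool:
        ua = user_agent.lower()
        return any(token in ua for token in ROBOT_TOKENS)

    robots = users = 0
    with open("access.log") as log:
        for line in log:
            # The user-agent is the last quoted field in the combined format.
            agent = line.rsplit('"', 2)[-2] if line.count('"') >= 2 else ""
            if is_robot(agent):
                robots += 1
            else:
                users += 1

    print(f"robot requests: {robots}, user requests: {users}")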
Analog
We have some documentation about configuring Analog.