This document describes how geographic regions were added to the host-linkage data from the JISC UK Web Domain Dataset. The data covers 2000-2010. The process begins by combining the host-host links with a file containing a unique postcode for each host ending in co.uk. An example of the host-host link data is below.
year | origin | destination/links |
---|---|---|
2000 | btclickbus.excite.co.uk | greenwich2000.co.uk 1 |
2000 | btclickfree.excite.co.uk | www.rockvillecenter.com 2 |
2000 | adapthorpe.com | www.adapthorpe.com 1 |
2000 | btclickfam.excite.co.uk | conciergedesk.co.uk 1 |
2000 | formby.wiganmbc.gov.uk | www.charitynet.org 1 |

An example of the postcode file is below.

URL | postcode | year | host | domain |
---|---|---|---|---|
20000609075945/http://altberg.co.uk:80/military_boots.htm | DL10 4XB | 2000 | altberg.co.uk | altberg |
20001003045622/http://www.guest-house.demon.co.uk:80/ | CB2 1AA | 2000 | www.guest-house.demon.co.uk | demon |
20000917204128/http://www.millenniumit.co.uk:80/CV.htm | E3 5AN | 2000 | www.millenniumit.co.uk | millenniumit |
20000312143711/http://www.nova-tech.co.uk:80/page2.html | PR9 9DZ | 2000 | www.nova-tech.co.uk | nova-tech |
20000914061255/http://www.aleontap.co.uk:80/weblinks/ | WR6 6DH | 2000 | www.aleontap.co.uk | aleontap |
The two files were combined by matching on “domains”. If the origin or destination of a link was found in the postcode data, its postcode was added to the file. Host links without a postcode were dropped. This leaves, for each link, the origin host, domain and postcode, the destination host, domain and postcode, and the number of links between them. This is shown below.
X1 | origin.host | orig.domain | origin.pc | dest.host | dest.domain | dest.pc | links |
---|---|---|---|---|---|---|---|
3 | 24carat.co.uk | 24carat | FY4 1RJ | 24carat.co.uk | 24carat | FY4 1RJ | 9201 |
8000 | www.lifestyle.co.uk | lifestyle | WC1A 2AE | www.lupine.demon.co.uk | demon | KT17 2HB | 3 |
1200 | www.barcodes-for-access.beechman-online.co.uk | beechman-online | BR1 1PD | www.ao-plotters.beechman-online.co.uk | beechman-online | BR1 1PD | 3 |
1500 | www.epos-software.beechman-online.co.uk | beechman-online | BR1 1PD | www.axiohm-cognitive-label-printers.beechman-online.co.uk | beechman-online | BR1 1PD | 4 |
2403 | www.bringfrd.demon.co.uk | demon | NN6 6HB | www.bringfrd.demon.co.uk | demon | NN6 6HB | 28 |
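The combining step can be sketched in a few lines of Python. This is only an illustration, not the code that was used: the names `links`, `postcodes`, `domain_of` and `attach_postcodes` are hypothetical, and the inputs are small in-memory stand-ins for the two files. Two assumptions are made. First, the postcode is looked up by host, because the example rows show hosts that share the domain "demon" but have different postcodes. Second, a link is kept only when both its origin and its destination have a postcode.

```python
SUFFIX = ".co.uk"

# Hypothetical stand-in for the host-host link file:
# (year, origin host, destination host, number of links)
links = [
    (2000, "24carat.co.uk", "24carat.co.uk", 9201),
    (2000, "www.lifestyle.co.uk", "www.lupine.demon.co.uk", 3),
    (2000, "adapthorpe.com", "www.adapthorpe.com", 1),
]

# Hypothetical stand-in for the postcode file: host -> postcode
postcodes = {
    "24carat.co.uk": "FY4 1RJ",
    "www.lifestyle.co.uk": "WC1A 2AE",
    "www.lupine.demon.co.uk": "KT17 2HB",
}


def domain_of(host):
    """Return the label directly before '.co.uk', e.g. 'demon'."""
    stem = host[:-len(SUFFIX)] if host.endswith(SUFFIX) else host
    return stem.split(".")[-1]


def attach_postcodes(links, postcodes):
    """Add domain and postcode to both ends of each link."""
    rows = []
    for year, origin, dest, n_links in links:
        # Assumption: drop the link unless both hosts have a postcode.
        if origin not in postcodes or dest not in postcodes:
            continue
        rows.append({
            "origin.host": origin,
            "orig.domain": domain_of(origin),
            "origin.pc": postcodes[origin],
            "dest.host": dest,
            "dest.domain": domain_of(dest),
            "dest.pc": postcodes[dest],
            "links": n_links,
        })
    return rows


merged = attach_postcodes(links, postcodes)
```

Here the `adapthorpe.com` link is dropped because neither host appears in the postcode data, and the two remaining rows carry the same columns as the table above (apart from the row index).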
The next step was to remove websites that linked to themselves (for example, the first and last rows above). These rows are not of interest because we are looking for links between different websites. Therefore, any row where the origin host and destination host were the same was dropped. We now have host-host links between different hosts, each host with an associated unique postcode, and the number of links.
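As a minimal sketch of this filter, assuming the merged rows are held as a list of dictionaries (the variable names `merged` and `between_sites` are illustrative, and only the columns needed here are shown):

```python
# Rows taken from the example table above, reduced to three columns.
merged = [
    {"origin.host": "24carat.co.uk",
     "dest.host": "24carat.co.uk", "links": 9201},
    {"origin.host": "www.lifestyle.co.uk",
     "dest.host": "www.lupine.demon.co.uk", "links": 3},
    {"origin.host": "www.epos-software.beechman-online.co.uk",
     "dest.host": "www.axiohm-cognitive-label-printers.beechman-online.co.uk",
     "links": 4},
    {"origin.host": "www.bringfrd.demon.co.uk",
     "dest.host": "www.bringfrd.demon.co.uk", "links": 28},
]

# Keep only rows whose origin and destination hosts differ.
between_sites = [
    row for row in merged if row["origin.host"] != row["dest.host"]
]
```

The comparison is on hosts, not domains, so the two `beechman-online` hosts still count as a link between different websites.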
The next step was to aggregate to NUTS2 regions. This was done by joining a postcode-to-NUTS2 (2010) lookup file to the data created above. Almost every postcode had an associated NUTS2 code, which was added to both the origin and the destination. The data was then aggregated by summing the number of links over all rows with the same origin NUTS2 code and destination NUTS2 code. This leaves the NUTS2-to-NUTS2 links, an example of which is below. The same process was repeated for every year from 2000 to 2010.
origin | destination | weight |
---|---|---|
UKC1 | UKC1 | 90 |
UKC1 | UKC2 | 1 |
UKC1 | UKE2 | 1 |
UKC1 | UKH3 | 1 |
UKC1 | UKI1 | 3 |
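The aggregation can be sketched as follows. This is an illustration under stated assumptions rather than the original code: the names `nuts2_lookup`, `host_links` and `aggregate_to_nuts2` are hypothetical, the postcodes are made-up placeholders and not real postcode-to-NUTS2 assignments, and links whose postcode has no NUTS2 code are assumed to be skipped.

```python
from collections import Counter

# Placeholder lookup: made-up postcodes mapped to NUTS2 codes.
nuts2_lookup = {
    "X1 1AA": "UKC1",
    "X1 2BB": "UKC1",
    "X2 1CC": "UKC2",
}

# (origin postcode, destination postcode, number of links)
host_links = [
    ("X1 1AA", "X1 2BB", 40),
    ("X1 2BB", "X1 1AA", 50),
    ("X1 1AA", "X2 1CC", 1),
    ("X1 1AA", "X9 9ZZ", 7),  # destination has no NUTS2 code
]


def aggregate_to_nuts2(host_links, lookup):
    """Sum link counts per (origin NUTS2, destination NUTS2) pair."""
    weights = Counter()
    for origin_pc, dest_pc, n_links in host_links:
        origin_nuts = lookup.get(origin_pc)
        dest_nuts = lookup.get(dest_pc)
        # Assumption: skip links where either postcode is unmatched.
        if origin_nuts is None or dest_nuts is None:
            continue
        weights[(origin_nuts, dest_nuts)] += n_links
    return weights


nuts_links = aggregate_to_nuts2(host_links, nuts2_lookup)
for (origin, dest), weight in sorted(nuts_links.items()):
    print(origin, dest, weight)
```

Links within a region are kept, which is why a pair such as UKC1 to UKC1 appears in the output with its own weight.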