This document describes how geographic regions were added to the host-linkage dataset from the JISC UK Web Domain Dataset [1]. The data covers 2000-2010. The process begins by combining the host-host links with a file containing a unique postcode for each host ending in co.uk. An example of each file is shown below.

Host-linkage file
year origin destination links
2000 btclickbus.excite.co.uk greenwich2000.co.uk 1
2000 btclickfree.excite.co.uk www.rockvillecenter.com 2
2000 adapthorpe.com www.adapthorpe.com 1
2000 btclickfam.excite.co.uk conciergedesk.co.uk 1
2000 formby.wiganmbc.gov.uk www.charitynet.org 1

Unique postcodes and hosts
URL postcode year host domain
20000609075945/http://altberg.co.uk:80/military_boots.htm DL10 4XB 2000 altberg.co.uk altberg
20001003045622/http://www.guest-house.demon.co.uk:80/ CB2 1AA 2000 www.guest-house.demon.co.uk demon
20000917204128/http://www.millenniumit.co.uk:80/CV.htm E3 5AN 2000 www.millenniumit.co.uk millenniumit
20000312143711/http://www.nova-tech.co.uk:80/page2.html PR9 9DZ 2000 www.nova-tech.co.uk nova-tech
20000914061255/http://www.aleontap.co.uk:80/weblinks/ WR6 6DH 2000 www.aleontap.co.uk aleontap

The data was combined by matching “domains”. If an origin or destination was found in the postcode data, its postcode was added to the file. Host-links without a postcode were dropped. This leaves an origin host, domain and postcode, a destination host, domain and postcode, and the number of links between them. This is shown below.

Combined host and postcode data
X1 origin.host orig.domain origin.pc dest.host dest.domain dest.pc links
3 24carat.co.uk 24carat FY4 1RJ 24carat.co.uk 24carat FY4 1RJ 9201
8000 www.lifestyle.co.uk lifestyle WC1A 2AE www.lupine.demon.co.uk demon KT17 2HB 3
1200 www.barcodes-for-access.beechman-online.co.uk beechman-online BR1 1PD www.ao-plotters.beechman-online.co.uk beechman-online BR1 1PD 3
1500 www.epos-software.beechman-online.co.uk beechman-online BR1 1PD www.axiohm-cognitive-label-printers.beechman-online.co.uk beechman-online BR1 1PD 4
2403 www.bringfrd.demon.co.uk demon NN6 6HB www.bringfrd.demon.co.uk demon NN6 6HB 28
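
The matching step can be sketched roughly as follows. This is a minimal illustration in Python, not the code used to build the dataset. The function name `attach_postcodes` and the in-memory structures are hypothetical. The lookup is keyed by host, because the sample rows show different postcodes for hosts that share the domain "demon", and it is assumed here that a link is kept only when both the origin and the destination have a postcode. The sketch covers a single year.

```python
# Hypothetical sketch: attach postcodes to host-host links for one year.
# Hosts, domains and postcodes are copied from the sample tables above.

# (origin host, destination host, number of links)
links = [
    ("www.lifestyle.co.uk", "www.lupine.demon.co.uk", 3),
    ("24carat.co.uk", "24carat.co.uk", 9201),
    ("adapthorpe.com", "www.adapthorpe.com", 1),  # not a co.uk host
]

# host -> (domain, postcode), mirroring the "Unique postcodes and hosts" file
postcodes = {
    "www.lifestyle.co.uk": ("lifestyle", "WC1A 2AE"),
    "www.lupine.demon.co.uk": ("demon", "KT17 2HB"),
    "24carat.co.uk": ("24carat", "FY4 1RJ"),
}


def attach_postcodes(links, postcodes):
    """Return combined rows for links whose two endpoints both have a postcode."""
    combined = []
    for origin, dest, n_links in links:
        if origin not in postcodes or dest not in postcodes:
            continue  # assumption: drop the link if either end is unmatched
        o_domain, o_pc = postcodes[origin]
        d_domain, d_pc = postcodes[dest]
        combined.append((origin, o_domain, o_pc, dest, d_domain, d_pc, n_links))
    return combined


combined = attach_postcodes(links, postcodes)
for row in combined:
    print(row)
```

In this sketch the adapthorpe.com link is dropped because neither host appears in the postcode lookup, while the other two links are kept.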

The next step was to remove websites that linked to themselves (such as the first row above). These rows are not of interest, because the aim is to study links between different websites. Therefore, if the origin host and the destination host were the same, the row was dropped. This leaves host-host links, each with an associated unique postcode for the origin and the destination, and the number of links.
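
A minimal sketch of this filter, assuming the combined rows are held as tuples in the column order of the table above (without the X1 index); the function name `drop_self_links` is hypothetical:

```python
# Hypothetical sketch: drop rows where a host links to itself.
# Row layout: (origin host, origin domain, origin postcode,
#              destination host, destination domain, destination postcode, links)
rows = [
    ("24carat.co.uk", "24carat", "FY4 1RJ",
     "24carat.co.uk", "24carat", "FY4 1RJ", 9201),
    ("www.lifestyle.co.uk", "lifestyle", "WC1A 2AE",
     "www.lupine.demon.co.uk", "demon", "KT17 2HB", 3),
    ("www.barcodes-for-access.beechman-online.co.uk", "beechman-online", "BR1 1PD",
     "www.ao-plotters.beechman-online.co.uk", "beechman-online", "BR1 1PD", 3),
]


def drop_self_links(rows):
    """Keep only rows whose origin host differs from the destination host."""
    return [row for row in rows if row[0] != row[3]]


filtered = drop_self_links(rows)
for row in filtered:
    print(row[0], "->", row[3], row[6])
```

The comparison is made on hosts, as described above, so links between two different hosts under the same domain (the beechman-online row) are kept.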

The next step was to aggregate to NUTS2 regions. The data created above was combined with a postcode-to-NUTS2 (2010) lookup file, and the NUTS2 code was added for the origin and destination postcodes; almost every postcode had an associated NUTS2 code. The data was then aggregated by summing the link counts of all rows with the same origin and destination NUTS2 codes. This leaves the NUTS2 -> NUTS2 links. The same process was repeated for every year from 2000 to 2010.

Final NUTS2 data
origin destination weight
UKC1 UKC1 90
UKC1 UKC2 1
UKC1 UKE2 1
UKC1 UKH3 1
UKC1 UKI1 3
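
The aggregation to NUTS2 pairs can be sketched as below. This is an illustration only: the function name `aggregate_to_nuts2` is hypothetical, the postcodes are placeholders, and their pairing with NUTS2 codes is invented for the example rather than taken from the real lookup file. The text does not say how postcodes without a NUTS2 code were handled; skipping them is an assumption made here.

```python
from collections import Counter

# Placeholder lookup: postcode -> NUTS2 code (NOT real pairings).
nuts2_lookup = {
    "POSTCODE_A": "UKC1",
    "POSTCODE_B": "UKC1",
    "POSTCODE_C": "UKC2",
}

# (origin postcode, destination postcode, number of links), self-links removed
rows = [
    ("POSTCODE_A", "POSTCODE_B", 4),
    ("POSTCODE_B", "POSTCODE_A", 2),
    ("POSTCODE_A", "POSTCODE_C", 1),
    ("POSTCODE_A", "POSTCODE_X", 7),  # POSTCODE_X has no NUTS2 code
]


def aggregate_to_nuts2(rows, lookup):
    """Sum link counts for each (origin NUTS2, destination NUTS2) pair."""
    weights = Counter()
    for origin_pc, dest_pc, n_links in rows:
        origin_nuts = lookup.get(origin_pc)
        dest_nuts = lookup.get(dest_pc)
        if origin_nuts is None or dest_nuts is None:
            continue  # assumption: skip postcodes missing from the lookup
        weights[(origin_nuts, dest_nuts)] += n_links
    return weights


weights = aggregate_to_nuts2(rows, nuts2_lookup)
for (origin, dest), weight in sorted(weights.items()):
    print(origin, dest, weight)
```

Running the same function once per year would give one origin/destination/weight table for each year, in the same layout as the final NUTS2 data above.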

  1. https://data.webarchive.org.uk/opendata/ukwa.ds.2/geo/