Supplying data

Approach to linking data

We use the student-level data submitted by providers to the Higher Education Statistics Agency (HESA) or the Education and Skills Funding Agency (ESFA) in conjunction with a range of linked, student-level data collected by other organisations.

By linking individualised student data with datasets from other stakeholders we are able to enhance the data we have available, and our understanding of it, without imposing an additional burden on providers.  

Datasets

The linked datasets that the OfS uses are:

Graduate Outcomes is a population survey of almost all graduates of higher education in the UK in a given academic year. HESA collects information about graduates’ outcomes and destinations 15 months after they complete their higher education qualification, including whether they are in employment, studying or undertaking other activities, and to what extent their qualification played a part.

Our uses of this data include understanding and reporting the activities of graduates after they finish their course, and providing information for prospective students to make informed decisions. 

The National Student Survey (NSS) is a UK-wide survey that collects feedback from final year students about their higher education experience. Information is collected about a range of factors including the teaching on their course, assessment and feedback, academic support and how well courses were organised. 

Our uses of this data include providing information for prospective students to make informed decisions, and also for the purposes of regulating quality (including through the Teaching Excellence Framework).

The National Pupil Database (NPD) contains child-level and school-level data on all pupils in state schools in England. Information is held about pupils such as age, gender, ethnicity, attendance and exclusions, special educational needs, free school meal entitlement and educational attainment.

Our uses of this data include exploring and reporting on the relationship between different student characteristics, in particular prior educational attainment and free school meal entitlement, and a range of higher education outcomes.

The Longitudinal Education Outcomes (LEO) dataset contains employment, earnings, study and benefits data. It is based on administrative tax and benefits records for those working in the UK, and we hold it for those who have recently studied in higher education.

Our uses of this data include understanding and reporting economic outcomes for graduates in the years after they finish their course.

The Student Loans Company (SLC) holds data about students in receipt of student support, including tuition fee loans and disabled students’ allowance. 

Our uses of this data include creating a comparison tool that lets providers compare their HESA data with data held by the Student Loans Company, to establish whether the two are inconsistent about which students receive student support. We also use it to explore and report on the relationship between different student characteristics (in particular students’ household residual income and estrangement from their parents) and a range of higher education outcomes.
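As an illustration only, a comparison of this kind can be reduced to checking which student identifiers appear in one dataset, the other, or both. The sketch below is not the OfS comparison tool: the function name, the idea of comparing plain identifier lists and the example identifiers are all assumptions made for the example.

```python
def compare_support_records(provider_ids, slc_ids):
    """Split student identifiers by which dataset(s) they appear in.

    Both arguments are iterables of identifiers for students recorded as
    receiving student support (hypothetical inputs for this sketch).
    """
    provider_ids, slc_ids = set(provider_ids), set(slc_ids)
    return {
        "in_both": sorted(provider_ids & slc_ids),
        "provider_only": sorted(provider_ids - slc_ids),  # possible inconsistency
        "slc_only": sorted(slc_ids - provider_ids),       # possible inconsistency
    }


# Invented identifiers, for illustration only.
result = compare_support_records(["S1", "S2", "S3"], ["S2", "S3", "S4"])
```

Identifiers that appear in only one of the two lists are the candidates a provider would want to investigate.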

Pearson collects data about enrolments on its BTEC Higher National programmes.

Our uses of this data include checking whether students claiming student support from the SLC have registered with Pearson, as required, and whether students claiming support achieved a Pearson qualification when ending their course.

UCAS holds information about placed applicants and acceptances to higher education courses.

Our uses of this data include making comparisons with attendance recorded in the HESA data and the Individualised Learner Record (ILR).

National directories and other classifications

We also link to national directories and other classifications, to derive more information about individual higher education students or the geographical areas they come from:

The ONS produces two key postcode products that link all current and terminated UK postcodes to a wide range of administrative, health and other geographic areas in which each postcode falls.

We use this data to understand which lower layer super output area (LSOA) or middle layer super output area (MSOA) students lived in before entering higher education.
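A postcode lookup of this kind can be sketched as a dictionary keyed on a normalised postcode. This is a minimal illustration, not the ONS product or the OfS's processing: the postcodes, the area codes and the names `normalise_postcode`, `EXAMPLE_LOOKUP` and `area_for` are all invented for the example.

```python
def normalise_postcode(postcode):
    """Remove all whitespace and upper-case, so 'zz1 1aa' and 'ZZ11AA' compare equal."""
    return "".join(postcode.split()).upper()


# Invented rows standing in for a postcode-to-area lookup file.
EXAMPLE_LOOKUP = {
    "ZZ11AA": {"lsoa": "LSOA_EXAMPLE_1", "msoa": "MSOA_EXAMPLE_1"},
    "ZZ22BB": {"lsoa": "LSOA_EXAMPLE_2", "msoa": "MSOA_EXAMPLE_1"},
}


def area_for(postcode, lookup=EXAMPLE_LOOKUP):
    """Return the LSOA and MSOA codes for a postcode, or None if it is not in the lookup."""
    return lookup.get(normalise_postcode(postcode))


home_area = area_for(" zz1 1aa ")
```

Normalising before the lookup matters because the same postcode is often recorded with different spacing or capitalisation in different datasets.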

We use national classifications such as the index of multiple deprivation or standard occupational classification.

  • The index of multiple deprivation (IMD) is the official measure of relative deprivation in England for small areas (lower layer super output areas (LSOAs)) produced by the Government. We use this to provide information about the backgrounds of higher education students.
  • The standard occupational classification (SOC) produced by the Office for National Statistics (ONS) classifies occupations in terms of their skill level and skill content. We use this to provide information about the outcomes of higher education students.

We use OfS classifications such as tracking underrepresentation by area or association between characteristics of students.

  • Tracking underrepresentation by area (TUNDRA) is an area-based measure that uses tracking of state-funded mainstream school pupils in England to calculate young participation. It classifies local areas across England into five equal groups, or quintiles, based on the proportion of 16-year-old state-funded mainstream school pupils who participate in higher education aged 18 or 19 years. Its main objective is to help outreach programmes identify and target areas of low participation more effectively. We may also use this to provide information about the backgrounds of higher education students, including for the purposes of regulating quality and access and participation.
  • Association between characteristics of students (ABCS) is a set of analyses that seeks to better understand how outcomes vary for groups of students holding different sets of characteristics. We define groups of students by looking at a set of characteristics so that we can determine the effect of not just one characteristic on an outcome, but the effect of multiple characteristics. We may use this to provide information about the backgrounds of higher education students, including for the purposes of regulating quality and access and participation.
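The quintile grouping described for TUNDRA can be illustrated by ranking areas on their participation rate and cutting the ranking into five groups. This is a sketch under stated assumptions rather than the published TUNDRA method: it assumes the groups contain equal numbers of areas, numbers them from 1 (lowest participation) to 5 (highest), and uses invented area names and rates.

```python
def assign_quintiles(participation_by_area):
    """Map each area to a quintile from 1 (lowest rate) to 5 (highest rate).

    Areas are ranked by participation rate (ties broken by area name) and the
    ranking is split into five groups of near-equal size. Equal counts of
    areas per group is an assumption of this sketch.
    """
    ranked = sorted(participation_by_area,
                    key=lambda area: (participation_by_area[area], area))
    n = len(ranked)
    return {area: (position * 5) // n + 1 for position, area in enumerate(ranked)}


# Invented participation rates for ten hypothetical areas.
rates = {
    "Area A": 0.12, "Area B": 0.45, "Area C": 0.30, "Area D": 0.22, "Area E": 0.51,
    "Area F": 0.18, "Area G": 0.39, "Area H": 0.27, "Area I": 0.60, "Area J": 0.34,
}
quintiles = assign_quintiles(rates)
```

With ten areas, each quintile receives two areas; the two with the lowest rates fall in quintile 1 and the two with the highest in quintile 5.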

The links between some of these datasets and the HESA and ESFA student collections exist by design (for example, target lists for the Graduate Outcomes and NSS surveys establish unique identifiers between the student data and the survey response). Others rely on an analytical approach to identifying the same individual in different datasets.  
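The two kinds of link described above can be sketched as a two-step match: use a shared identifier where one exists, and otherwise fall back to comparing personal details. This is a simplified illustration, not the OfS's linking method. The field names (`student_id`, `surname`, `date_of_birth`, `postcode`), the choice of matching fields and the rule of accepting only unambiguous matches are assumptions, and the records are invented.

```python
def match_key(record):
    """Build a comparison key from normalised details (an assumed choice of fields)."""
    return (
        record["surname"].strip().lower(),
        record["date_of_birth"],
        "".join(record["postcode"].split()).upper(),
    )


def link_records(students, other_rows):
    """Return {student index: (other index, method)} for every student that can be linked."""
    by_id, by_key = {}, {}
    for j, row in enumerate(other_rows):
        if row.get("student_id"):
            by_id[row["student_id"]] = j
        by_key.setdefault(match_key(row), []).append(j)

    links = {}
    for i, student in enumerate(students):
        sid = student.get("student_id")
        if sid and sid in by_id:
            # Link that exists by design: both datasets carry the same identifier.
            links[i] = (by_id[sid], "identifier")
            continue
        candidates = by_key.get(match_key(student), [])
        if len(candidates) == 1:
            # Analytical link: accept only when exactly one candidate matches.
            links[i] = (candidates[0], "matched on details")
    return links


# Invented records, for illustration only.
students = [
    {"student_id": "S1", "surname": "Patel", "date_of_birth": "2000-03-14", "postcode": "ZZ1 1AA"},
    {"student_id": None, "surname": "O'Neill ", "date_of_birth": "1999-11-02", "postcode": "zz2 2bb"},
    {"student_id": None, "surname": "Smith", "date_of_birth": "2001-01-01", "postcode": "ZZ3 3CC"},
]
other_rows = [
    {"student_id": None, "surname": "o'neill", "date_of_birth": "1999-11-02", "postcode": "ZZ22BB"},
    {"student_id": "S1", "surname": "Patel", "date_of_birth": "2000-03-14", "postcode": "ZZ1 1AA"},
]
links = link_records(students, other_rows)
```

In this example the first student is linked by identifier, the second by matching details after normalisation, and the third is left unlinked because no candidate record exists.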

The OfS may change or extend the list of linked datasets in future, for example if additional data sources become available or existing datasets are replaced.

Published 18 January 2022
Last updated 01 February 2022
01 February 2022: Added document detailing the OfS's approach to data linkage.
