Job Quality Extractor

Overview

This package extracts and analyses job quality measures from job adverts using natural language processing techniques. It identifies job quality aspects based on sentence similarity and pre-defined target phrases, helping to classify and quantify job quality indicators in large datasets of job descriptions. This work was funded by the Economic Statistics Centre of Excellence.

What dimensions of job quality do you extract?

The term "job quality" refers to aspects of a job that affect worker wellbeing, for example how much the job pays and whether the contract is permanent. Most research on job quality rightly focuses on data from the employee's point of view, using surveys, interviews or, more recently, online reviews.

We took as our starting point CIPD's seven dimensions of job quality:

pay and benefits
contract (elsewhere called terms of employment)
work-life balance
job design and the nature of work
relationships at work
employee voice
health and wellbeing

We added a further category, "barriers to access", to our taxonomy, so that dimensions of job quality that directly impact marginalised groups might be gathered together. We made one more addition, "atmosphere, culture and environment", which fits under "Social support and cohesion" and which we took from Sleeman 2024. Our taxonomy of job quality can be seen here.

Installation

To install the package, run

pip install git+https://github.com/nestauk/dap_job_quality.git

Quickstart

To extract dimensions of job quality from one or more job adverts, you can use the extract_job_quality() method of the JobQuality class. This method takes a dataframe of job adverts as input and returns two outputs:

A dataframe with the job adverts split into sentences; each sentence is labelled 0 or 1 according to whether it is related to job quality, and sentences labelled 1 are also matched to the taxonomy.
A concise dict which just contains the ID of each advert, and the target phrases that it was matched to.

Example usage:

from dap_job_quality.pipeline.find_job_quality import JobQuality
import pandas as pd

# Initialize JobQuality class
job_quality = JobQuality()
job_quality.load()

# Example job adverts dataframe
job_adverts = pd.DataFrame(
    [
        {'id': 123, 'description': '[This is a job advert. It has many benefits such as a pension scheme and a cycle to work scheme.]'},
        {'id': 234, 'description': '[This is a job advert for a bank job. There are free childcare vouchers. We also offer a yearly bonus and generous salary.]'}
    ]
)

# Extract job quality
jq_df_filtered, job_id_to_target_phrase = job_quality.extract_job_quality(
    job_adverts, id_col="id", text_col="description"
)

The output dataframe jq_df_filtered should look like this:

id	description	clean_description	job_quality_label	sentences_split	ngrams	target_phrase	cosine_similarity	subcategory
123	This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	a cycle to work	Cycle to work	0.965111	PERKS
123	This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	many benefits such as	benefits	0.874949	PERKS
123	This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	such as a pension	pension	0.821573	COMP
123	This is a job advert. It has many benefits su...	This is a job advert. It has many benefits suc...	LABEL_1	It has many benefits such as a pension scheme ...	a pension scheme and	pension scheme	0.964935	COMP
234	This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	There are free childcare vouchers.	There are free childcare vouchers.	childcare vouchers	0.838904	CARING
234	This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	We also offer a yearly bonus and generous salary.	bonus and generous salary.	compensation	0.576268	COMP
234	This is a job advert for a bank job. There ar...	This is a job advert for a bank job. There are...	LABEL_1	We also offer a yearly bonus and generous salary.	a yearly bonus and	performance bonus	0.618560	COMP
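The cosine_similarity column gives a rough sense of how close each chunk is to its matched target phrase, so weaker matches can be dropped after extraction if needed. The snippet below is a minimal sketch of that idea using rows copied from the sample output above. The MIN_SIMILARITY cut-off and the keep_strong_matches helper are hypothetical and are not part of the package.

```python
# Rows copied from the sample output above, reduced to the columns used here.
# With pandas, equivalent records could be produced with
# jq_df_filtered.to_dict("records").
rows = [
    {"id": 123, "target_phrase": "Cycle to work", "cosine_similarity": 0.965111},
    {"id": 123, "target_phrase": "benefits", "cosine_similarity": 0.874949},
    {"id": 123, "target_phrase": "pension", "cosine_similarity": 0.821573},
    {"id": 123, "target_phrase": "pension scheme", "cosine_similarity": 0.964935},
    {"id": 234, "target_phrase": "childcare vouchers", "cosine_similarity": 0.838904},
    {"id": 234, "target_phrase": "compensation", "cosine_similarity": 0.576268},
    {"id": 234, "target_phrase": "performance bonus", "cosine_similarity": 0.618560},
]

# Hypothetical cut-off chosen for illustration; it is not a package default.
MIN_SIMILARITY = 0.8


def keep_strong_matches(records, threshold=MIN_SIMILARITY):
    """Return only the records whose similarity meets the threshold."""
    return [r for r in records if r["cosine_similarity"] >= threshold]


strong = keep_strong_matches(rows)
print([r["target_phrase"] for r in strong])
```

A suitable threshold depends on the use case, so it is worth inspecting a sample of matches before settling on one.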

Meanwhile, the more concise output, job_id_to_target_phrase, should look like this:

{
    123: ['Cycle to work', 'benefits', 'pension', 'pension scheme'],
    234: ['childcare vouchers', 'compensation', 'performance bonus']
}
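Because the concise output is a plain dict, it can be summarised with the standard library alone. The following sketch assumes the dict has the shape shown above, and counts how many adverts mention each target phrase and how many phrases each advert matched. The variable names phrase_counts and matches_per_advert are illustrative only.

```python
from collections import Counter

# The concise output from the example above.
job_id_to_target_phrase = {
    123: ["Cycle to work", "benefits", "pension", "pension scheme"],
    234: ["childcare vouchers", "compensation", "performance bonus"],
}

# Number of adverts mentioning each target phrase. Phrases are lower-cased
# and de-duplicated per advert so that each advert counts at most once.
phrase_counts = Counter(
    phrase
    for phrases in job_id_to_target_phrase.values()
    for phrase in {p.lower() for p in phrases}
)

# Number of matched target phrases per advert.
matches_per_advert = {
    job_id: len(phrases) for job_id, phrases in job_id_to_target_phrase.items()
}

print(phrase_counts.most_common(3))
print(matches_per_advert)
```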

How does it work?

The pipeline comprises four basic steps:

1. Clean the text minimally, then separate the advert into sentences
2. Classify the sentences as either relating to job quality (e.g. "We are a friendly supportive team") or not relating to job quality (e.g. "You must have a friendly supportive demeanour"). More detail on the classifier here and in the README.
3. Chunk up the sentences
4. Match the sentence chunks to the taxonomy (more detail on steps 3 and 4 here)
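
The four steps can be illustrated with a toy, standard-library-only sketch. It is not the package's implementation, and it makes these simplifying assumptions: a keyword rule stands in for the trained sentence classifier, word-count vectors stand in for sentence embeddings, the four-word window is inferred from the ngrams column in the sample output, and TARGET_PHRASES is a hypothetical mini-taxonomy. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical mini-taxonomy: target phrase -> subcategory.
TARGET_PHRASES = {
    "pension scheme": "COMP",
    "cycle to work": "PERKS",
    "childcare vouchers": "CARING",
}


def split_sentences(text):
    # Step 1: collapse whitespace, then split after sentence-final punctuation.
    text = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def looks_like_job_quality(sentence):
    # Step 2 (stand-in): a keyword rule instead of a trained classifier.
    keywords = ("benefit", "pension", "bonus", "salary", "voucher", "scheme")
    return any(k in sentence.lower() for k in keywords)


def chunk(sentence, n=4):
    # Step 3: sliding window of n words; short sentences are kept whole.
    words = sentence.split()
    if len(words) <= n:
        return [sentence]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def bag_of_words(text):
    # Word-count vector, used here in place of a sentence embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match(text, threshold=0.5):
    # Step 4: keep the best-scoring chunk for each target phrase.
    matches = {}
    for sentence in split_sentences(text):
        if not looks_like_job_quality(sentence):
            continue
        for piece in chunk(sentence):
            vec = bag_of_words(piece)
            for phrase, subcategory in TARGET_PHRASES.items():
                score = cosine(vec, bag_of_words(phrase))
                best_so_far = matches.get(phrase, (0.0, ""))[0]
                if score >= threshold and score > best_so_far:
                    matches[phrase] = (score, subcategory)
    return matches


advert = (
    "This is a job advert. It has many benefits such as a pension scheme "
    "and a cycle to work scheme."
)
result = match(advert)
for phrase, (score, subcategory) in result.items():
    print(f"{phrase}: {score:.3f} ({subcategory})")
```

Word-count vectors only reward exact word overlap, so this sketch cannot match paraphrases such as "generous salary" to "compensation" in the way an embedding-based similarity can.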