OJD DAPS Company Description Classifier
OJD DAPS Company Description Classifier is a Python library powered by a fine-tuned BERT model, designed to determine whether a given sentence describes what a company does. It's particularly useful in parsing job advertisements to extract sentences that outline company descriptions, providing a score from 0 to 1 that reflects the model's confidence in the identification.
Installation
This library can be installed using pip. Ensure you have Python 3.10 or newer installed on your system before proceeding.
pip install git+https://github.com/nestauk/ojd_daps_company_descriptions.git
Usage
Basic Usage
To use the extract_company_description function, pass it a sentence. The function returns a list containing one dictionary, which holds a label and a score indicating how likely the sentence is to describe a company.
from ojd_daps_company_descriptions import extract_company_description
sentence = "We are a manufacturing company specializing in innovative solutions."
result = extract_company_description(sentence)
print(result)
[{'label': 'LABEL_1', 'score': 0.9953641891479492}]
The output is a list containing a single dictionary, whose score ranges from 0 to 1. A higher score suggests a higher likelihood that the sentence is a company description.
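As a minimal sketch of reading that output, the helper below turns a result into a single probability that the sentence is a company description. It assumes, which the example output suggests but does not confirm, that LABEL_1 marks the positive class and that the score is the confidence in the returned label. If the score always refers to the company-description class, it can be used directly. The name company_probability is hypothetical and not part of the library.

```python
def company_probability(result, positive_label="LABEL_1"):
    """Return the probability that the sentence is a company description.

    Assumes `result` looks like [{'label': 'LABEL_1', 'score': 0.99}] and
    that `score` is the confidence in the returned label (an assumption).
    """
    top = result[0]
    if top["label"] == positive_label:
        return top["score"]
    # Assumed binary classifier: the other label's probability is the complement.
    return 1.0 - top["score"]


# Result copied from the example output above.
example = [{"label": "LABEL_1", "score": 0.9953641891479492}]
print(company_probability(example))
```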
Working with Job Adverts
When dealing with job adverts, which typically consist of multiple sentences with only some referring to the company description, you can utilize the library in conjunction with pandas to efficiently process and extract relevant descriptions.
Assuming you have a pandas DataFrame job_ads with a column description that contains the text of the job adverts, you can apply the extract_company_description function to each row to identify and score sentences related to company descriptions.
import pandas as pd
from ojd_daps_company_descriptions import extract_company_description

# Assuming `job_ads` is your DataFrame and `description` is the column with job descriptions
def extract_description_scores(description_text):
    # Naive sentence split on full stops; skip empty fragments
    sentences = [s.strip() for s in description_text.split('.') if s.strip()]
    scores = []
    for sentence in sentences:
        # The function returns a list with one dictionary; take its score
        score = extract_company_description(sentence)[0]['score']
        scores.append((sentence, score))
    return scores
job_ads['description_scores'] = job_ads['description'].apply(extract_description_scores)
print(job_ads['description_scores'])
This approach splits the job description into sentences and applies the extract_company_description function to each, collecting the scores. You can then use these scores to filter or highlight descriptions that are likely to be about the company.
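The filtering step could be sketched as follows. The helper name likely_company_sentences and the 0.5 threshold are illustrative assumptions, not part of the library, and the scores in the example are made up rather than model output.

```python
def likely_company_sentences(scored_sentences, threshold=0.5):
    """Keep sentences whose score meets the threshold, highest score first."""
    kept = [
        (sentence.strip(), score)
        for sentence, score in scored_sentences
        if sentence.strip() and score >= threshold
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Illustrative (sentence, score) pairs in the shape produced per job advert.
scores = [
    ("We are a manufacturing company", 0.99),
    (" You will manage a small team", 0.03),
    ("", 0.40),
]
filtered = likely_company_sentences(scores, threshold=0.5)
print(filtered)
```

Applied to the DataFrame, this would be job_ads['description_scores'].apply(likely_company_sentences). The right threshold depends on how many false positives you can tolerate, so it is worth checking against a sample of your own adverts.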
Methodology
To see how we trained our model, how it performs, and the code used to produce our training set, please refer to our documentation here.
License
This project is licensed under the MIT License - see the LICENSE file for details.