Applying NLP to sustainable investing

Platforms like Hugging Face and pre-trained models make it easy to experiment with NLP

Jun 05, 2023

Natural Language Processing (NLP) can be deployed at scale to analyze companies' sustainability reports for a variety of purposes, such as text classification and sentiment analysis, which can help gauge a company's commitment to sustainability goals.

With platforms like Hugging Face, it is now easy to deploy a model that has been pre-trained on sustainability-related text.

For this article, we will use a pre-trained model called ESG-BERT. For details on how it was trained, you can read this Medium article by the model's creator, Mukut Mukherjee. Rather than training a model ourselves, we will focus on the application.

I will also refer to this Jupyter notebook by Hannah Morgan, which walks through the steps to parse PDFs of companies' sustainability reports and apply a classifier to them.

Let’s dive in!

The ESG-BERT model classifies a sentence by assigning it a score for each of the 25 labels below:

If a sentence receives a high score on one of the labels, you can think of it as being strongly associated with that label. You can head over to the hosted Inference API on Hugging Face to play with it:

You can type anything into the text box and see how the scores change from this:

To this:

So yes, we should probably also include sentiment analysis to distinguish positive statements from negative ones, but for the purpose of this exercise, let us just focus on text classification for now.
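Outside the browser widget, a rough Python equivalent of scoring a single sentence might look like the sketch below. It assumes the transformers library is installed and that the model is published on the Hugging Face Hub under the ID `nbroad/ESG-BERT` (check the model page for the exact name); `load_classifier` and `rank_labels` are hypothetical helpers, not part of any library. Passing `top_k=None` to the pipeline asks for the score of every label rather than only the best one.

```python
from typing import Callable, Dict, List, Tuple


def load_classifier(model_id: str = "nbroad/ESG-BERT") -> Callable:
    """Build a Hugging Face text-classification pipeline for ESG-BERT.

    Assumption: the checkpoint is available on the Hub under `model_id`.
    """
    from transformers import pipeline  # lazy import: rank_labels works without transformers
    return pipeline("text-classification", model=model_id)


def rank_labels(results: List[Dict]) -> List[Tuple[str, float]]:
    """Turn pipeline output ([{"label": ..., "score": ...}, ...]) into (label, score) pairs, highest first."""
    pairs = [(r["label"], float(r["score"])) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Usage (downloads the model on first run):
#   classifier = load_classifier()
#   results = classifier("We cut water withdrawal at our plants this year.", top_k=None)
#   for label, score in rank_labels(results)[:5]:
#       print(f"{label:45s} {score:.3f}")
```

Sorting the full list, rather than keeping only the top prediction, makes it easy to see both the labels a sentence is strongly associated with and the ones it barely touches.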

With that in mind, we can now move on to applying this model to companies' responsibility reports hosted on responsibilityreports.com.

We will use a Python library called Tika to extract the text from the documents so that it can be split into sentences, and then apply the ESG-BERT model to each sentence via the Hugging Face transformers library. Finally, I group the results by label and calculate the mean score for each label.
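The steps in the paragraph above can be sketched roughly as follows. This is not the code from the referenced notebook: it assumes the `tika` Python package (which needs Java, because it talks to a local Apache Tika server), `pandas`, and a text-classification callable such as a transformers pipeline. It also uses a deliberately naive regular-expression sentence splitter, and one reasonable reading of "group by label and take the mean", namely averaging every label's score over all sentences; the original notebook may aggregate differently. All function names are hypothetical.

```python
import re
from typing import Callable, Dict, List

import pandas as pd


def pdf_to_text(path: str) -> str:
    """Extract raw text from a PDF with Apache Tika (requires Java; starts a local Tika server)."""
    from tika import parser  # lazy import: only needed when parsing real PDFs
    parsed = parser.from_file(path)
    return parsed.get("content") or ""


def split_sentences(text: str, min_words: int = 5) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace, and drop short fragments."""
    text = re.sub(r"\s+", " ", text)  # collapse line breaks and repeated spaces left by PDF extraction
    candidates = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in candidates if len(s.split()) >= min_words]


def mean_label_scores(sentences: List[str], classify: Callable[[str], List[Dict]]) -> pd.Series:
    """Score every sentence on every label, then average each label's score across all sentences."""
    rows = []
    for sentence in sentences:
        for result in classify(sentence):  # expects [{"label": ..., "score": ...}, ...]
            rows.append({"label": result["label"], "score": result["score"]})
    df = pd.DataFrame(rows, columns=["label", "score"])
    return df.groupby("label")["score"].mean().sort_values(ascending=False)


# Usage (assumes a locally downloaded report saved as "report.pdf" and the Hub ID "nbroad/ESG-BERT"):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="nbroad/ESG-BERT")
#   sentences = split_sentences(pdf_to_text("report.pdf"))
#   scores = mean_label_scores(sentences, lambda s: clf(s, top_k=None, truncation=True))
#   print(scores.head(10))
```

Passing the classifier in as a callable keeps the aggregation step independent of the model, and `truncation=True` guards against sentences longer than the model's maximum input length, which PDF extraction can easily produce.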

I applied the ESG-BERT model to three companies: McDonald's, Amazon and Newmont Mining. For McDonald's, the top issues highlighted in its sustainability report are Waste And Hazardous Materials Management, Critical Incident Risk Management and Customer Privacy. Data Security, on the other hand, has a low score, which may suggest that the company has not written much about it in the report.

Scores for McDonald’s:

Amazon, by contrast, scores higher on Data Security, which is probably to be expected given how many online transactions it handles.

Scores for Amazon:

Interestingly, though, Water and Wastewater Management appears to be a top issue for both companies. More alarmingly, Selling Practices And Product Labeling has a low score for an e-commerce company like Amazon.
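Spotting labels that rank highly for more than one company, such as Water and Wastewater Management here, or labels that score unexpectedly low, can be automated once each company's mean scores are held in a pandas Series indexed by label. The sketch below assumes that layout and an arbitrary cut-off of the top or bottom `k` labels; `top_labels`, `bottom_labels` and `shared_top_labels` are hypothetical helpers, not part of the original analysis.

```python
from typing import Dict, Set

import pandas as pd


def top_labels(scores: pd.Series, k: int = 3) -> Set[str]:
    """Return the k labels with the highest mean score."""
    return set(scores.nlargest(k).index)


def bottom_labels(scores: pd.Series, k: int = 3) -> Set[str]:
    """Return the k labels with the lowest mean score (topics a report says little about)."""
    return set(scores.nsmallest(k).index)


def shared_top_labels(company_scores: Dict[str, pd.Series], k: int = 3) -> Set[str]:
    """Return the labels that appear in the top k for every company passed in."""
    tops = [top_labels(scores, k) for scores in company_scores.values()]
    return set.intersection(*tops) if tops else set()


# Usage (mcd_scores and amzn_scores are hypothetical Series of mean scores indexed by label):
#   shared_top_labels({"McDonald's": mcd_scores, "Amazon": amzn_scores}, k=3)
#   bottom_labels(amzn_scores, k=3)
```

A fixed `k` is a blunt instrument; comparing each company's scores against the average across many reports would be a more robust way to decide what counts as "high" or "low".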

Moving on to Newmont Mining: besides the familiar Water and Wastewater Management label, other prominent issues include Employee Health and Safety, Physical Impacts Of Climate Change, and Human Rights and Community Relations. These issues are probably more relevant to a mining company, so it is not surprising to see them covered more frequently in its sustainability report.

Scores for Newmont Mining:

So what did I learn from this experiment? A few things:

[1] We are indeed in the golden age of large language models, and getting started with NLP is far easier today than it was a few years ago. If you just want to dip your toes into the field, plenty of pre-trained models are available; with a domain-specific one like ESG-BERT, you can start experimenting quickly and fine-tune the model later.

[2] Following from point 1, I can imagine this approach being deployed at scale across companies' sustainability reports to help determine their ESG ratings.

[3] Text classification with NLP is a powerful tool: you can get summary scores like the ones above in minutes, derived from hundreds or thousands of pages of reports. These reports probably took their authors months to prepare, and in the past a team of analysts would also have needed a few days to fully digest them.

[4] These techniques are not foolproof, but they are a good starting point for understanding a company's sustainability policy.

If you are interested in applying these techniques yourself, you can check out the Jupyter Notebook here.

Quantifying ESG
