Extract entities from text in Excel
Perform Named Entity Recognition (NER) on text in individual cells using GPT for Excel. This guide walks you through extracting entities from text in Excel and optimizing results.
Step 1: Get started
Prerequisites
- You have installed GPT for Excel Word.
- You have opened a workbook containing data for extraction in a column.
Open the GPT for Excel Word add-in and run the Extract entities bulk tool on a few rows of your text to see how it works.
If you find the answers satisfactory, go ahead and launch your bulk entity extraction! Otherwise, check how to improve your results in Step 2.
Step 2: Improve your results (optional)
If the initial results are not satisfactory, refine your extraction instructions to fix both missing and incorrect extractions. All solutions below use the Extract field and require gpt-4o, which follows instructions more accurately.
Issue | Possible cause | Solution |
---|---|---|
You don't know what to extract | Text contains entities you are not aware of | Use unsupervised extraction: Instruct the model to extract all entities. This will give you an overview of the entities in the text. |
Too many irrelevant results | Extraction instructions are too broad | Provide context: Define in which context the entities should be extracted, for example extract Persons only if they are CEOs or CFOs. |
Output is not normalized | Entities are extracted as they appear in the text | Normalize the extraction: Provide a normalized form for the output entities, for example you may want to extract 'Advil' and 'Nurofen' as 'Ibuprofen', their USAN form. |
Same entity extracted several times from one cell | Cell text contains multiple forms of the same entity | Define an output format: Specify an output format to ensure each entity is extracted only once, under this format. For example 'John Doe' and 'Mr Doe' are extracted once, as 'John Doe', if they appear in the same text. |
Generic terms extracted as entities | Ambiguous extraction instructions | Disambiguate the instructions: Make the instructions more specific to prevent the extraction of generic terms. For example, 'Drugs' extracts both drug names and synonyms of 'drug'. |
Entities in specific format not extracted | Entity is unknown to the model | Define the entity form: Specify the form of the entities to be extracted. For example, provide your Product ID format. |
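To make the normalization, output-format, and entity-form solutions more concrete, here is a minimal Python sketch of the behavior you are asking the model for. It is not how GPT for Excel works internally, and it is not part of the add-in. The alias table, the `PRD-` plus six digits Product ID format, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical alias table: surface forms mapped to the normalized form
# you would describe in the Extract field (illustrative values only).
CANONICAL = {
    "advil": "Ibuprofen",
    "nurofen": "Ibuprofen",
    "mr doe": "John Doe",
    "john doe": "John Doe",
}

# Hypothetical Product ID format: "PRD-" followed by exactly six digits.
PRODUCT_ID = re.compile(r"\bPRD-\d{6}\b")


def normalize(entities):
    """Map each entity to its canonical form and keep one copy of each."""
    seen = []
    for entity in entities:
        cleaned = entity.strip()
        # Unknown entities pass through unchanged.
        canonical = CANONICAL.get(cleaned.lower(), cleaned)
        if canonical not in seen:
            seen.append(canonical)
    return seen


def find_product_ids(text):
    """Return every substring matching the assumed Product ID format."""
    return PRODUCT_ID.findall(text)


if __name__ == "__main__":
    print(normalize(["Advil", "Nurofen", "Mr Doe", "John Doe"]))
    print(find_product_ids("Order PRD-004217 replaced PRD-12."))
```

Writing the rules down this precisely, even informally, can help you phrase the instructions: state which forms collapse into which normalized name, and spell out the exact shape of identifiers such as Product IDs.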
Once you have refined your extraction method and are satisfied with the results from the initial rows, you are ready to launch your entity extraction in bulk. Select more cells, or even all cells, and click Run rows. GPT for Excel handles the rest of the extractions.