spaCy Named Entity Recognizer

Manivannan Murugavel
Mar 29, 2019

What is spaCy (v2):

spaCy is an open-source software library for advanced Natural Language Processing, written in Python and Cython. The library is published under the MIT license and, at the time of writing, offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian, Dutch and multi-language NER, as well as tokenization for various other languages.
spaCy v2.0 features new neural models for tagging, parsing and entity recognition. The models were designed and implemented from scratch specifically for spaCy, to balance speed, size and accuracy.
They use convolutional layers with residual connections, layer normalization and maxout non-linearity, which is considerably more efficient than the standard BiLSTM approach.
According to the spaCy team, the v2.0 models are 10× smaller, 20% more accurate, and even cheaper to run than the previous generation.
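To make the architecture described above a little more concrete, here is a minimal NumPy sketch of one convolutional block of this kind: each token vector is concatenated with its left and right neighbours, passed through a maxout layer, layer-normalized, and added back to the input through a residual connection. This is an illustrative simplification, not spaCy's own implementation (which is built on the Thinc library); the window size, the number of maxout pieces, the widths and the random weights are assumptions chosen for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector (row) to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def maxout(x, W, b):
    # W has shape (pieces, width_in, width_out): compute every linear "piece"
    # for every token, then keep the element-wise maximum across the pieces
    pieces = np.einsum("ni,pio->npo", x, W) + b  # (n_tokens, pieces, width_out)
    return pieces.max(axis=1)


def cnn_block(X, W, b):
    # X has shape (n_tokens, width). Pad one zero vector on each side so that
    # every token has a left and a right neighbour (a window of 3 tokens)
    padded = np.pad(X, ((1, 1), (0, 0)), mode="constant")
    windows = np.concatenate([padded[:-2], padded[1:-1], padded[2:]], axis=1)
    hidden = layer_norm(maxout(windows, W, b))
    # Residual connection: the block learns a correction to its input
    return X + hidden


# Illustrative sizes (assumptions): 5 tokens, width 8, 3 maxout pieces
rng = np.random.RandomState(0)
n_tokens, width, n_pieces = 5, 8, 3
X = rng.randn(n_tokens, width)
W = rng.randn(n_pieces, 3 * width, width) * 0.1
b = np.zeros((n_pieces, width))

out = cnn_block(X, W, b)
print(out.shape)  # (5, 8)
```

Because the output has the same shape as the input, several such blocks can be stacked, and each extra block widens the context a token sees by one word on each side.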

In Deep Learning:
spaCy is well suited to preparing text for deep learning. It interoperates with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python’s AI ecosystem, and makes it straightforward to build linguistically sophisticated statistical models for a variety of NLP problems.

Features

  • Non-destructive tokenization
  • Named entity recognition
  • Support for 49+ languages
  • 16 statistical models for 9 languages
  • Pre-trained word vectors
  • Easy deep learning integration
  • Part-of-speech tagging
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built-in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • State-of-the-art speed
  • Robust, rigorously evaluated accuracy
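A few of these features can be tried without downloading a trained model. The sketch below assumes only that spaCy is installed and uses spacy.blank("en"), a tokenizer-only English pipeline; the sentence is the opening of the example text used later in this article. It shows that tokenization is non-destructive (the original text, whitespace included, can be rebuilt from the tokens), how strings map to hashes in the StringStore, and how token attributes are exported to a NumPy array with Doc.to_array.

```python
import spacy
from spacy.attrs import LOWER, ORTH

# Tokenizer-only English pipeline: no trained model needs to be downloaded
nlp = spacy.blank("en")
text = "But Google is starting from behind."
doc = nlp(text)

# Non-destructive tokenization: every token remembers its trailing whitespace,
# so joining text_with_ws reproduces the input exactly
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)  # True

# String-to-hash mapping: add() returns the 64-bit hash, and the
# StringStore maps that hash back to the original string
google_hash = nlp.vocab.strings.add("Google")
print(nlp.vocab.strings[google_hash])  # Google
# Token.orth is the hash of the token's verbatim text
print(doc[1].orth == google_hash)  # True

# Export to NumPy: one row per token, one column per requested attribute
array = doc.to_array([ORTH, LOWER])
print(array.shape)  # (len(doc), 2)
```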

Installation Steps:

spaCy is compatible with 64-bit CPython 2.7 / 3.5+ and runs on Unix/Linux, macOS/OS X and Windows. The latest spaCy releases are available over pip and conda.

$ pip install -U spacy
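Since the supported interpreters are 64-bit builds of CPython, it can be worth checking the interpreter before installing. A minimal sketch using only the standard library; the helper name python_build_info is hypothetical.

```python
import platform
import struct
import sys


def python_build_info():
    # Hypothetical helper: report the interpreter details that matter
    # for spaCy's compatibility requirements
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "version": "%d.%d" % sys.version_info[:2],
        # A pointer ("P") is 8 bytes, i.e. 64 bits, on a 64-bit build
        "is_64bit": struct.calcsize("P") * 8 == 64,
    }


info = python_build_info()
print(info)
if info["implementation"] != "CPython" or not info["is_64bit"]:
    print("Warning: spaCy is built for 64-bit CPython interpreters.")
```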

Once the installation is complete, you can download a pre-trained model for spaCy. The available models are listed in the spaCy models documentation.

Download the Model:

$ python -m spacy download en_core_web_sm
or, with the spaCy v2 shortcut (shortcut links were removed in spaCy v3):
$ python -m spacy download en

How to Use:

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
or, with the spaCy v2 shortcut link:
>>> nlp = spacy.load("en")
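If the model might not be installed, for example when the same script runs on several machines, one option is to catch the error spacy.load raises for a missing package (OSError, or IOError on Python 2) and fall back to a blank pipeline. This is a sketch under that assumption; the helper name load_english is hypothetical, and a blank pipeline only tokenizes, so it has no named entity recognizer.

```python
import spacy


def load_english(model_name="en_core_web_sm"):
    # Hypothetical helper: try the trained pipeline first; spacy.load raises
    # OSError (IOError on Python 2) when the package cannot be found
    try:
        return spacy.load(model_name)
    except (IOError, OSError):
        # Fallback assumption: a tokenizer-only English pipeline,
        # which has no NER component
        return spacy.blank("en")


nlp = load_english()
# Names of the pipeline components that were loaded
print(nlp.pipe_names)
print("NER available:", "ner" in nlp.pipe_names)
```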

What is Named-entity recognition:

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate named entity mentions in unstructured text and classify them into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

The spaCy pretrained model predicts entities from a fixed list of entity classes. The classes and their descriptions are listed below.
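The descriptions can also be looked up from Python with spacy.explain, which works without loading a model. The label list below is the OntoNotes 5 scheme used by the English pretrained models; treat it as an assumption for other languages or models, and check the labels of the model you actually load (for example with nlp.get_pipe("ner").labels).

```python
import spacy

# Entity labels of the English pretrained models (OntoNotes 5 scheme)
ENGLISH_NER_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain returns a short description of a label (None if unknown)
for label in ENGLISH_NER_LABELS:
    print(label, "-", spacy.explain(label))
```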

Example:

$ python
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> text = "But Google is starting from behind. The company made a late push\ninto hardware, and Apple’s Siri, available on iPhones, and Amazon’s Alexa\nsoftware, which runs on its Echo and Dot devices, have clear leads in\nconsumer adoption."
>>> doc = nlp(text)
>>> for ent in doc.ents:
...     print(ent.text, ent.start_char, ent.end_char, ent.label_)
...
Google 4 10 ORG
Apple’s Siri 84 96 ORG
iPhones 111 118 ORG
Amazon 124 130 ORG
Echo and Dot 167 179 ORG
>>>
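start_char and end_char are character offsets into the original string, with the end offset exclusive, so slicing text[start_char:end_char] returns the entity text. The check below replays the printed output against the same input string using only plain Python (no model is needed, since the entities are copied from the output).

```python
text = ("But Google is starting from behind. The company made a late push\n"
        "into hardware, and Apple’s Siri, available on iPhones, and Amazon’s Alexa\n"
        "software, which runs on its Echo and Dot devices, have clear leads in\n"
        "consumer adoption.")

# (entity text, start_char, end_char, label) as printed by the loop above
entities = [
    ("Google", 4, 10, "ORG"),
    ("Apple’s Siri", 84, 96, "ORG"),
    ("iPhones", 111, 118, "ORG"),
    ("Amazon", 124, 130, "ORG"),
    ("Echo and Dot", 167, 179, "ORG"),
]

for ent_text, start, end, label in entities:
    # end_char is exclusive, so the slice length equals the entity length
    assert text[start:end] == ent_text
    print(repr(text[start:end]), label)
```

Note that the "\n" characters inside the string count as one character each when computing the offsets.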

Visualizing named entities:

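If you want to visualize the entities, you can call the displacy.serve() function, which starts a local web server (port 5000 by default) that renders the entities in your browser.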

import spacy
from spacy import displacy

text = """But Google is starting from behind. The company made a late push into hardware, and Apple’s Siri, available on iPhones, and Amazon’s Alexa software, which runs on its Echo and Dot devices, have clear leads in consumer adoption."""
nlp = spacy.load("en_core_web_sm")  # the "en" shortcut only works in spaCy v2
doc = nlp(text)
# Starts a local web server (port 5000 by default); open it in a browser
displacy.serve(doc, style="ent")
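displacy.serve() keeps running until you stop it. In a script, or on a machine without a browser, displacy.render() can return the same visualization as an HTML string instead. In the sketch below the entity span is added by hand to a blank pipeline so that the example runs without a trained model; with a loaded model you would pass the predicted doc directly. The output file name entities.html is an assumption.

```python
from pathlib import Path

import spacy
from spacy import displacy

nlp = spacy.blank("en")
doc = nlp("But Google is starting from behind.")

# Hand-labelled entity for illustration only: characters 4-10 are "Google".
# char_span returns None if the offsets do not fall on token boundaries.
span = doc.char_span(4, 10, label="ORG")
doc.ents = [span]

# jupyter=False makes render() return the markup instead of displaying it
html = displacy.render(doc, style="ent", page=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```

Opening entities.html in any browser then shows the highlighted entities without a running server.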
