HuggingFace - NER, Summarization, and more

We provide out-of-the-box integration with HuggingFace.

Named Entity Recognition

Extract named entities, such as people, organizations, and locations, from text.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load a BERT model fine-tuned for NER, together with its tokenizer
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

# Build the NER pipeline from the loaded model and tokenizer
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Run the pipeline on the "review" field and write the output to "ner"
ds.apply_transformers_pipeline(
    text_fields=["review"], pipeline=nlp, output_field="ner"
)
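
An NER pipeline built without an aggregation option returns one prediction per token, tagged `B-` (beginning) or `I-` (inside), so a multi-word name comes back in pieces. The sketch below shows one way to merge those pieces into whole entities. The helper `group_entities` and the sample input are hypothetical and hand-written for illustration (not real model output); only the dictionary keys follow the shape the `transformers` token-classification pipeline returns.

```python
# Hand-written sample mimicking token-level NER output (not real model output).
text = "Ada Lovelace lived in London."
sample_tokens = [
    {"entity": "B-PER", "score": 0.99, "word": "Ada", "start": 0, "end": 3},
    {"entity": "I-PER", "score": 0.98, "word": "Lovelace", "start": 4, "end": 12},
    {"entity": "B-LOC", "score": 0.97, "word": "London", "start": 22, "end": 28},
]


def group_entities(text, tokens):
    """Hypothetical helper: merge B-/I- tagged tokens into entity spans."""
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        last = spans[-1] if spans else None
        if prefix == "I" and last is not None and last["label"] == label:
            # Continuation of the current entity: extend its span.
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            # Any other tag starts a new entity.
            spans.append(
                {
                    "label": label,
                    "start": tok["start"],
                    "end": tok["end"],
                    "scores": [tok["score"]],
                }
            )
    # Report the surface text and the lowest token score for each entity.
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "score": min(s["scores"]),
        }
        for s in spans
    ]


entities = group_entities(text, sample_tokens)
for entity in entities:
    print(entity["label"], entity["text"])
```

In practice, passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` asks `transformers` to do this grouping itself, which is likely the simpler route; the sketch only illustrates what that grouping involves.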

Summarization

Summarize long text.

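from transformers import pipeline

# Build a summarization pipeline backed by the Pegasus XSum model
nlp = pipeline("summarization", model="google/pegasus-xsum", tokenizer="google/pegasus-xsum")

# Run the pipeline on the "review" field and write the output to "summary"
ds.apply_transformers_pipeline(
    text_fields=["review"], pipeline=nlp, output_field="summary"
)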

Translation

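Translate text from English into other languages. The example below translates English to French.

from transformers import pipeline

# Build an English-to-French translation pipeline backed by T5
nlp = pipeline("translation_en_to_fr", model="t5-small", tokenizer="t5-small")

# Run the pipeline on the "review" field and write the output to "translation"
ds.apply_transformers_pipeline(
    text_fields=["review"], pipeline=nlp, output_field="translation"
)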
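
All three examples follow the same pattern: build a pipeline, name the text fields to read, and name the field to write to. As a rough mental model only, the sketch below applies a callable to each document's text fields and stores the results under an output field. `apply_pipeline` and `fake_summarizer` are hypothetical stand-ins written for this illustration; this is not the implementation of `apply_transformers_pipeline`, and the way the real method lays out its stored output may differ.

```python
def apply_pipeline(documents, text_fields, pipeline, output_field):
    """Hypothetical stand-in: run `pipeline` on each text field of each document."""
    for doc in documents:
        # Assumed layout: one result per input field, keyed by field name.
        doc[output_field] = {
            field: pipeline(doc[field]) for field in text_fields if field in doc
        }
    return documents


def fake_summarizer(text):
    """Stub returning the same shape as a summarization pipeline:
    a list holding one dict keyed by "summary_text"."""
    first_sentence = text.split(".")[0].strip() + "."
    return [{"summary_text": first_sentence}]


docs = [
    {"review": "Great phone. Battery lasts two days."},
    {"title": "No review field here"},
]
apply_pipeline(docs, ["review"], fake_summarizer, "summary")
print(docs[0]["summary"])
```

A stub is used in place of a real model so the sketch runs without downloading anything; a real summarization pipeline returns the same list-of-dicts shape, and a translation pipeline uses the key `translation_text` instead.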

HuggingFace provides many other NLP methods to help you enrich your unstructured data: https://huggingface.co.