Tokenizer
Publisher
Boğaziçi University
Description
Tokenization is the process of segmenting a text into tokens. Given a text, the tokenizer identifies the tokens it contains (words, punctuation marks, etc.) and outputs each token separately. This step is necessary for applications that operate on a per-token basis.
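
The idea described above can be illustrated with a minimal sketch. This is not the tool's actual implementation, which the record does not describe; the function name `tokenize` and the pattern `TOKEN_PATTERN` are hypothetical, and the rule used here (a run of word characters is one token, any other non-space character is its own token) is a simplifying assumption.

```python
import re
from typing import List

# Assumed rule, for illustration only: a run of letters, digits, or
# underscores forms one token; every other non-whitespace character
# (e.g. a punctuation mark) is a token by itself.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Return the tokens of `text` in order, dropping whitespace."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # Output the tokens separately, one per line.
    for token in tokenize("Hello, world! It works."):
        print(token)
```

With this rule, `tokenize("Hello, world!")` yields `Hello`, `,`, `world`, `!` as four separate tokens. A rule this simple also splits contractions such as "don't" into three tokens, so a practical tokenizer would need language-specific handling beyond this sketch.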