Results
Found 75 publication records. Showing all 75 according to the selection in the facets.
Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
113 | Stefan Klatt, Bernd Bohnet |
You Don't Have to Think Twice if You Carefully Tokenize. |
IJCNLP |
2004 |
DBLP DOI BibTeX RDF |
|
93 | Robert Bernecky |
An SPMD/SIMD parallel tokenizer for APL. |
APL |
2003 |
DBLP DOI BibTeX RDF |
|
55 | Bin Ma 0001, Haizhou Li 0001 |
A phonotactic-semantic paradigm for automatic spoken document classification. |
SIGIR |
2005 |
DBLP DOI BibTeX RDF |
acoustic words, phonotactic-semantic, semantic domain, spoken document classification, voice tokenizer, n-gram |
50 | Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li 0007 |
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
47 | Cody Boisclair |
Developing a tokenizer and morphological parser for English text in C#. |
ACM Southeast Regional Conference |
2008 |
DBLP DOI BibTeX RDF |
|
33 | Amir Shahab Shahabi, Mohammad Reza Kangavari |
A Fuzzy Approach for Persian Text Segmentation Based on Semantic Similarity of Sentences. |
Intelligent Information Processing |
2006 |
DBLP DOI BibTeX RDF |
Fuzzy Similarity Relation, Fuzzy Proximity Relation, Lemma, Fuzzy Relations Composition, Anti-Redundancy, Syntax Parser, Meta Variable, Meta Rule, Paradigmatic, Tokenizer, Multi-Document Summarizer, Lemmatizer |
25 | Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo |
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
25 | Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter |
Greed is All You Need: An Evaluation of Tokenizer Inference Methods. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
25 | Gautier Dagan, Gabriel Synnaeve, Baptiste Rozière |
Getting the most out of your tokenizer for pre-training and domain adaptation. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
25 | Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu |
ε-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer. |
WACV (Workshops) |
2024 |
DBLP DOI BibTeX RDF |
|
25 | Goodwill Erasmo Ndomba, Young-Seob Jeong |
Effects of Swahili Monolingual Tokenizer on Downstream Tasks. |
BigComp |
2024 |
DBLP DOI BibTeX RDF |
|
25 | Sanghyun Choo, Wonjoon Kim |
A study on the evaluation of tokenizer performance in natural language processing. |
Appl. Artif. Intell. |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Jungeun Kim, Ha Young Kim |
CSLT-AK: Convolutional-embedded transformer with an action tokenizer and keypoint emphasizer for sign language translation. |
Pattern Recognit. Lett. |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Zhiwei Deng, Ting Chen, Yang Li |
Perceptual Group Tokenizer: Building Perception with Iterative Grouping. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, Cornelia Caragea |
Semantic Tokenizer for Enhanced Natural Language Processing. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Zipeng Xu, Enver Sangineto, Nicu Sebe |
StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang 0001, Irfan Essa, David A. Ross, Lu Jiang 0004 |
Language Model Beats Diffusion - Tokenizer is Key to Visual Generation. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu |
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan |
Making LLaMA SEE and Draw with SEED Tokenizer. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Zhiyuan Liu, Yaorui Shi, An Zhang 0003, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang 0010, Tat-Seng Chua |
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Felix Stollenwerk |
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu |
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Miao Fan, Chen Hu, Shuchang Zhou 0001 |
Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim |
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Christopher Meaney, Therese A. Stukel, Peter C. Austin, Michael D. Escobar |
Comparing Variation in Tokenizer Outputs Using a Series of Problematic and Challenging Biomedical Sentences. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Mehdi Ali, Michael Fromm 0001, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr |
Tokenizer Choice For LLM Training: Negligible or Crucial? |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Tatsuya Hiraoka, Tomoya Iwakura |
Downstream Task-Oriented Neural Tokenizer Optimization with Vocabulary Restriction as Post Processing. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Wenhao Li, Mengyuan Liu, Hong Liu 0009, Pichao Wang, Jialun Cai, Nicu Sebe |
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Zipeng Xu, Enver Sangineto, Nicu Sebe |
StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model. |
ICCV |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig |
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models. |
EACL (Findings) |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim |
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding. |
ICCV (Workshops) |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Tuan Aqeel Bohoran, Polydoros N. Kampaktsis, Laura McLaughlin, Jay Leb, Serafeim P. Moustakidis, Gerry P. McCann, Archontis Giannakidis |
Right Ventricular Volume Prediction by Feature Tokenizer Transformer-Based Regression of 2D Echocardiography Small-Scale Tabular Data. |
FIMH |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Zhiyuan Liu, Yaorui Shi, An Zhang 0003, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua |
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules. |
NeurIPS |
2023 |
DBLP BibTeX RDF |
|
25 | Adhiraj Banerjee, Vipul Arora 0001 |
wav2tok: Deep Sequence Tokenizer for Audio Retrieval. |
ICLR |
2023 |
DBLP BibTeX RDF |
|
25 | Rinka Kiriyama, Akio Sashima, Ikuko Shimizu |
Robust Tokenizer for Vision Transformer. |
GCCE |
2023 |
DBLP DOI BibTeX RDF |
|
25 | Eugene Bagdasaryan, Congzheng Song, Rogier C. van Dalen, Matt Seigel, Áine Cahill |
Training a Tokenizer for Free with Private Federated Learning. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo |
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal 0002 |
TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Jimin Sun, Patrick Fernandes, Xinyi Wang, Graham Neubig |
A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models. |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán |
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? |
CoRR |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal 0002 |
TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer. |
EMNLP (Findings) |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Md Mofijul Islam, Gustavo Aguilar, Pragaash Ponnusamy, Clint Solomon Mathialagan, Chengyuan Ma, Chenlei Guo |
A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning. |
RepL4NLP@ACL |
2022 |
DBLP DOI BibTeX RDF |
|
25 | Jinghao Zhou, Chen Wei 0005, Huiyu Wang, Wei Shen 0002, Cihang Xie, Alan L. Yuille, Tao Kong |
Image BERT Pre-training with Online Tokenizer. |
ICLR |
2022 |
DBLP BibTeX RDF |
|
25 | Pavel Rychlý, Samuel Spalek |
Utok: The Fast Rule-based Tokenizer. |
RASLAN |
2022 |
DBLP BibTeX RDF |
|
25 | Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán |
How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? |
AMTA |
2022 |
DBLP BibTeX RDF |
|
25 | Jinghao Zhou, Chen Wei 0005, Huiyu Wang, Wei Shen 0002, Cihang Xie, Alan L. Yuille, Tao Kong |
iBOT: Image BERT Pre-Training with Online Tokenizer. |
CoRR |
2021 |
DBLP BibTeX RDF |
|
25 | Sangah Lee, Hyopil Shin |
The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts. |
W-NUT |
2021 |
DBLP DOI BibTeX RDF |
|
25 | Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych |
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. |
ACL/IJCNLP (1) |
2021 |
DBLP DOI BibTeX RDF |
|
25 | Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych |
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
25 | Daniele Mazzei, Giacomo Baldi, Gualtiero Fantoni, Gabriele Montelisciani, Antonio Pitasi, Laura Ricci, Lorenzo Rizzello |
A Blockchain Tokenizer for Industrial IOT trustless applications. |
Future Gener. Comput. Syst. |
2020 |
DBLP DOI BibTeX RDF |
|
25 | Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant |
Bridging the Gap for Tokenizer-Free Language Models. |
CoRR |
2019 |
DBLP BibTeX RDF |
|
25 | Kazuhisa Nakasho |
Development of a Flexible Mizar Tokenizer and Parser for Information Retrieval System. |
FedCSIS |
2019 |
DBLP DOI BibTeX RDF |
|
25 | Taku Kudo, John Richardson |
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. |
CoRR |
2018 |
DBLP BibTeX RDF |
|
25 | Taku Kudo, John Richardson |
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. |
EMNLP (Demonstration) |
2018 |
DBLP DOI BibTeX RDF |
|
25 | Johannes Graën, Mara Bertamini, Martin Volk 0001 |
Cutter - a Universal Multilingual Tokenizer. |
SwissText |
2018 |
DBLP BibTeX RDF |
|
25 | Matthieu Jimenez, Maxime Cordy, Yves Le Traon, Mike Papadakis |
On the Impact of Tokenizer and Parameters on N-Gram Based Code Analysis. |
ICSME |
2018 |
DBLP DOI BibTeX RDF |
|
25 | Kazuma Takaoka, Sorami Hisamoto, Noriko Kawahara, Miho Sakamoto, Yoshitaka Uchida, Yuji Matsumoto 0001 |
Sudachi: a Japanese Tokenizer for Business. |
LREC |
2018 |
DBLP BibTeX RDF |
|
25 | Luz Marina Sierra, Carlos Alberto Cobos Lozada, Juan Carlos Corrales |
Tokenizer Adapted for Nasa Yuwe Language. |
Computación y Sistemas |
2016 |
DBLP DOI BibTeX RDF |
|
25 | K. Divyavarma, M. Remya, G. Deepa |
An Enhanced Bug Mining for Identifying Frequent Bug Pattern Using Word Tokenizer and FP-Growth. |
FICTA (1) |
2016 |
DBLP DOI BibTeX RDF |
|
25 | György Szaszák, Máté Ákos Tündik, András Beke |
Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer. |
KDIR |
2016 |
DBLP DOI BibTeX RDF |
|
25 | Juhaida Abu Bakar, Khairuddin Omar, Mohammad Faidzul Nasrudin, Mohd Zamri Murah |
Tokenizer for the Malay language using pattern matching. |
ISDA |
2014 |
DBLP DOI BibTeX RDF |
|
25 | Arianna Pipitone, Maria Carmela Campisi, Roberto Pirrone |
An A* Based Semantic Tokenizer for Increasing the Performance of Semantic Applications. |
ICSC |
2013 |
DBLP DOI BibTeX RDF |
|
25 | Jirí Marsík, Ondrej Bojar |
TrTok: A Fast and Trainable Tokenizer for Natural Languages. |
Prague Bull. Math. Linguistics |
2012 |
DBLP BibTeX RDF |
|
25 | Neil Barrett, Jens H. Weber-Jahnke |
Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm. |
BMC Bioinform. |
2011 |
DBLP DOI BibTeX RDF |
|
25 | Neil Barrett, Jens H. Weber-Jahnke |
Building a Biomedical Tokenizer Using the Token Lattice Design Pattern and the Adapted Viterbi Algorithm. |
ICMLA |
2010 |
DBLP DOI BibTeX RDF |
|
25 | Aasish Pappu, Ratna Sanyal |
Vaakkriti: Sanskrit Tokenizer. |
IJCNLP |
2008 |
DBLP BibTeX RDF |
|
25 | Chengguo Jin, Seung-Hoon Na, Dong-Il Kim, Jong-Hyeok Lee |
Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer. |
IJCNLP |
2008 |
DBLP BibTeX RDF |
|
25 | Oana Frunza |
A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization. |
LREC |
2008 |
DBLP BibTeX RDF |
|
25 | Zhi-Jie Chang, Hsiao-Chuan Wang |
以高斯混合模型表徵器與語言模型為基礎之語言辨認 (Language Identification based on Gaussian Mixture Model Tokenizer and Language Model) [In Chinese]. |
ROCLING |
2005 |
DBLP BibTeX RDF |
|
23 | Rong Tong, Bin Ma 0001, Haizhou Li 0001, Chng Eng Siong |
A Target-Oriented Phonotactic Front-End for Spoken Language Recognition. |
IEEE Trans. Speech Audio Process. |
2009 |
DBLP DOI BibTeX RDF |
|
23 | Yu-Chieh Wu, Jie-Chi Yang |
A Robust Passage Retrieval Algorithm for Video Question Answering. |
IEEE Trans. Circuits Syst. Video Technol. |
2008 |
DBLP DOI BibTeX RDF |
|
23 | Rong Tong, Bin Ma 0001, Haizhou Li 0001, Engsiong Chng |
Target-oriented phone tokenizers for spoken language recognition. |
ICASSP |
2008 |
DBLP DOI BibTeX RDF |
|
23 | Hong Phuong Le, Nguyên Thi Minh Huyên, Azim Roussanaly, Hô Tuòng Vinh |
A Hybrid Approach to Word Segmentation of Vietnamese Texts. |
LATA |
2008 |
DBLP DOI BibTeX RDF |
|
23 | Francisco-Mario Barcala, Jesús Vilares Ferro, Miguel A. Alonso 0001, Jorge Graña Gil, Manuel Vilares Ferro |
Tokenization and Proper Noun Recognition for Information Retrieval. |
DEXA Workshops |
2002 |
DBLP DOI BibTeX RDF |
|
23 | Jesús Vilares Ferro, Francisco-Mario Barcala, Miguel A. Alonso 0001, Jorge Graña Gil, Manuel Vilares Ferro |
Practical NLP-Based Text Indexing. |
IBERAMIA |
2002 |
DBLP DOI BibTeX RDF |
|