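Introduction to Natural Language Processing Starting from Document Classification: From the Basics to BERT (文書分類からはじめる自然言語処理入門－基本からBERTまで－)

Authors:	Hiroyuki Shinnou (Ibaraki University), Kanako Komiya (Tokyo University of Agriculture and Technology)
Price:	2,970 yen (2,700 yen + tax)
Format:	B5 variant
Pages:	206
ISBN:	978-4-910558-14-1
Release date:	2022/7/20
Catalog No.:	109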

The programs used in the book are available here.

Chapter 1
Chapter1.ipynb
Chapter 2
Chapter2.ipynb
Chapter 3
Chapter3.ipynb
Chapter3-2 多クラス分類編.ipynb (multi-class classification)
Chapter 4
Chapter4-3-HMM.ipynb
Chapter4-3-CRF.ipynb
Chapter4-3-LSTM-1.ipynb
Chapter4-3-LSTM-2.ipynb
Chapter 5
Chapter5-2-BERT.ipynb
Chapter5-4-BertForSequenceClassification.ipynb
Chapter5-5-BertForTokenClassification.ipynb
Chapter5-6-pipeline.ipynb

Updated 2024/7/29


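[Table of Contents]

Chapter 1 Document Vectorization

1.1 Document Classification and Its Input
1.2 Word Segmentation
1.3 N-gram
1.4 Bag-of-words
1.5 TF-IDF
1.6 Latent Semantic Analysis

Chapter 2 Distributed Representations

2.1 What Are Distributed Representations?
2.2 Cosine Similarity
2.3 word2vec
2.4 doc2vec

Chapter 3 Classification Problems

3.1 What Is a Classification Problem?
3.2 Classification Problems and Supervised Learning
3.3 Naive Bayes
3.4 Evaluating Document Classification
3.5 Logistic Regression
3.6 Support Vector Machine
3.7 Neural Networks and Deep Learning
3.8 Semi-supervised Learning

Chapter 4 Sequence Labeling Problems

4.1 What Is a Sequence Labeling Problem?
4.2 Sequence Labeling Tasks
    4.2.1 Word Segmentation
    4.2.2 Named Entity Recognition
4.3 Solving Sequence Labeling Problems
    4.3.1 HMM
    4.3.2 CRF
    4.3.3 LSTM

Chapter 5 BERT

5.1 What Is a Pre-trained Model?
5.2 BERT Inputs and Outputs
5.3 Processing Inside BERT
    5.3.1 Transformer
    5.3.2 Position Embeddings
    5.3.3 BertLayer
    5.3.4 Multi-Head Attention
5.4 Document Classification with BERT
5.5 Sequence Labeling with BERT
5.6 Task Inference with Pipeline
    5.6.1 Sentiment Analysis
    5.6.2 Named Entity Recognition
    5.6.3 Summarization
    5.6.4 Question Answering
    5.6.5 Text Generation
    5.6.6 Zero-shot Document Classification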

[References]

https://chokkan.github.io/python/?s=09
https://nlp100.github.io/ja/
https://taku910.github.io/mecab/
Taku Kudo (author), The Association for Natural Language Processing (ed.): 形態素解析の理論と実装 (Theory and Implementation of Morphological Analysis), 実践・自然言語処理シリーズ (2018)
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
https://scikit-learn.org/stable/modules/decomposition.html#lsa
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013. pp. 1–12 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013. pp. 1–9 (2013)
Mikolov, T., Yih, W.-t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL 2013. pp. 746–751 (2013)
https://radimrehurek.com/gensim/models/word2vec.html
https://github.com/WorksApplications/Sudachi
https://www.gsk.or.jp/catalog/gsk2020-d/
https://cl.asahi.com/api_data/wordembedding.html
http://www.cl.ecei.tohoku.ac.jp/~m-suzuki/jawiki_vector/
Quoc V. Le, Tomas Mikolov, Distributed Representations of Sentences and Documents, Proceedings of the 31st International Conference on Machine Learning, pp.1188–1196, (2014).
https://radimrehurek.com/gensim/models/doc2vec.html
https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, (2002)
Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004)

