A Complete Guide to Python × Cheminformatics: From Data Collection to Prediction and Generation (まるっと解説 Python×ケモインフォマティクス データ収集から予測・生成まで)

Design Technology Series (設計技術シリーズ)

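Authors:	Tsuyoshi Esaki (江崎剛史, Shiga University); Kazuyoshi Ikeda (池田和由) and Yugo Shimizu (清水祐吾) (RIKEN)
Price:	4,950 yen (4,500 yen + tax)
Format:	A5
Pages:	328
ISBN:	978-4-910558-49-3
Release date:	2025/12/12
Control No.:	147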

Pre-orders are being accepted ahead of release.

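【Table of Contents】

Chapter 1: Getting Started with Cheminformatics

1.1 Informatics in chemistry
1.1.1 Conventional research in chemistry and its challenges
1.1.2 Cheminformatics in practice
1.1.3 Toward cheminformatics-supported materials development
1.2 Using Google Colab
1.2.1 Setting up the environment
1.2.2 Simple calculations
1.2.3 Variable types
1.2.4 Data structures
1.2.5 Programming basics
1.2.6 Data visualization
1.2.7 RDKit
1.3 Python code used in this chapter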

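Chapter 2: Compound Notation

2.1 Representing compound structures
2.1.1 SMILES notation
2.1.2 InChI notation
2.1.3 MOL notation (SDF notation)
2.1.4 Notation for multiple compounds
2.1.5 Saving and loading structure data
2.1.6 Applied research on compound notation
2.2 Compound descriptor information
2.2.1 Fingerprints
2.3 Physicochemical properties
2.4 Python code used in this chapter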

Chapter 3: Using Compound Databases

3.1 Representative compound databases
3.2 Accessing databases
3.2.1 Access via websites
3.2.2 Using APIs
3.2.3 Using a local PC
3.3 Applications
3.3.1 Compound similarity search with the PubChem API
3.3.2 Points to note when using compound databases
3.4 Python code used in this chapter

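Chapter 4: Compound Similarity Search

4.1 Calculating compound similarity
4.1.1 Calculating similarity
4.1.2 Calculating distance
4.1.3 Comparing similarity and distance
4.1.4 How similarity varies with the fingerprint
4.2 Chemical space
4.2.1 Dimensionality reduction methods
4.2.2 Principal component analysis (PCA)
4.2.3 t-SNE
4.2.4 UMAP
4.3 Clustering
4.3.1 Hierarchical clustering
4.3.2 Non-hierarchical clustering
4.4 Python code used in this chapter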

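Chapter 5: Property Prediction Using Descriptors

5.1 Predicting properties
5.2 Data preprocessing
5.3 Regression models
5.3.1 Evaluating regression models
5.3.2 Multiple regression model
5.3.3 Regularized regression models
5.3.4 Random forest regression model
5.4 Classification models
5.4.1 Evaluating classification models
5.4.2 Logistic regression model
5.4.3 Neural network model
5.5 Interpreting results
5.5.1 Coefficients of linear models
5.5.2 Decision-tree feature importance
5.5.3 SHAP values
5.6 Python code used in this chapter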

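Chapter 6: Generating Compound Structures

6.1 Inverse analysis
6.2 Generating SMILES
6.2.1 Structure generation with LSTM
6.2.2 Structure generation with autoencoders
6.2.3 Other structure generation methods
6.2.4 SELFIES: a flexible compound notation
6.3 Python code used in this chapter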

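Chapter 7: Searching for Optimal Experimental Conditions

7.1 Condition search by inverse analysis
7.2 Response surface methodology
7.3 Bayesian optimization with Gaussian process regression
7.3.1 Gaussian process regression
7.3.2 Bayesian optimization
7.3.3 Hyperparameter optimization
7.4 Python code used in this chapter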

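Chapter 8: Property Prediction Using Structures (Graphs)

8.1 Graph convolutional network
8.2 Predicting properties
8.3 Tuning hyperparameters
8.4 Compound visualization and XAI
8.5 Commands and configuration files used in this chapter
8.5.1 kMoL installation commands used
8.5.2 Main kMoL commands and configuration files used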

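Chapter 9: Toward Further Study

9.1 Data science in general
9.1.1 Data utilization
9.2 Statistics
9.2.1 General statistics
9.2.2 Multivariate analysis
9.2.3 Statistical models
9.3 Programming
9.3.1 Python
9.3.2 PyTorch
9.3.3 Cheminformatics with R
9.4 Machine learning and deep learning
9.4.1 Machine learning
9.4.2 Bayesian optimization
9.4.3 Deep learning
9.5 Cheminformatics
9.5.1 Cheminformatics in general
9.5.2 Property prediction
9.5.3 Structure generation
9.5.4 Design of experiments
9.5.5 Academic societies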
