Deep Learning

Title: Deep Learning (Adaptive Computation and Machine Learning series)
Authors: Ian Goodfellow / Yoshua Bengio / Aaron Courville
Translator:
ISBN: 9780262035613
Publisher: The MIT Press
Publication date: 2016-11-11
Formats: epub / mobi / azw3 / pdf
Pages: 800
Douban rating: 9.2

Book Description:

"Written by three experts in the field, Deep Learning is the only comprehensive book on the subject." — Elon Musk, co-chair of OpenAI; co-founder and CEO of Tesla and SpaceX Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

About the Authors:

Ian Goodfellow is Research Scientist at OpenAI. Yoshua Bengio is Professor of Computer Science at the Université de Montréal. Aaron Courville is Assistant Professor of Computer Science at the Université de Montréal.

Reader Comments:

@ 3点一直线: A good book, but it reads like a reference work and is not suited to beginners studying on their own. Part I's fundamentals can be read through quickly; Parts II and III are not for newcomers to tackle directly. Stanford's CS231n is the best deep learning material.
@ 阿伯次得分高: a great list of papers
@ Marvin: Read the Chinese translation: https://github.com/exacity/deeplearningbook-chinese. Still haven't gotten through Part III; I feel my math falls short. Extremely high value per page, and chapters 7, 8, and 11 are truly the hard-won experience of people who tune models for a living.
@ 浮生会少: A very good book, offering both fine detail and high-level description; it covers the needs of understanding deep learning at every level.
@ 这是坠吼的: The bible of deep learning.
@ Ru1dongzZ: The "flower book," a classic among classics. Bought it the moment it was published; still haven't finished it, and now I hardly get the chance to read it.
@ Rex: Textbook used in the course "Introduction to Machine Learning" at CMU in the fourth year of my PhD. Quit studying ML for several months; might come back in the future.
@ 焚琴煮茶: An unquestionably good textbook at the algorithmic level; it rewards every rereading.
@ C2H5OHlife: the bible

Table of Contents

Acknowledgments xv
Notation xix
1 Introduction 1
1.1 Who Should Read This Book? 8
1.2 Historical Trends in Deep Learning 12
I Applied Math and Machine Learning Basics 27
2 Linear Algebra 29
2.1 Scalars, Vectors, Matrices and Tensors 29
2.2 Multiplying Matrices and Vectors 32
2.3 Identity and Inverse Matrices 34
2.4 Linear Dependence and Span 35
2.5 Norms 36
2.6 Special Kinds of Matrices and Vectors 38
2.7 Eigendecomposition 39
2.8 Singular Value Decomposition 42
2.9 The Moore-Penrose Pseudoinverse 43
2.10 The Trace Operator 44
2.11 The Determinant 45
2.12 Example: Principal Components Analysis 45
3 Probability and Information Theory 51
3.1 Why Probability? 52
3.2 Random Variables 54
3.3 Probability Distributions 54
3.4 Marginal Probability 56
3.5 Conditional Probability 57
3.6 The Chain Rule of Conditional Probabilities 57
3.7 Independence and Conditional Independence 58
3.8 Expectation, Variance and Covariance 58
3.9 Common Probability Distributions 60
3.10 Useful Properties of Common Functions 65
3.11 Bayes' Rule 68
3.12 Technical Details of Continuous Variables 68
3.13 Information Theory 70
3.14 Structured Probabilistic Models 74
4 Numerical Computation 77
4.1 Overflow and Underflow 77
4.2 Poor Conditioning 79
4.3 Gradient-Based Optimization 79
4.4 Constrained Optimization 89
4.5 Example: Linear Least Squares 92
5 Machine Learning Basics 95
5.1 Learning Algorithms 96
5.2 Capacity, Overfitting and Underfitting 107
5.3 Hyperparameters and Validation Sets 117
5.4 Estimators, Bias and Variance 119
5.5 Maximum Likelihood Estimation 128
5.6 Bayesian Statistics 132
5.7 Supervised Learning Algorithms 136
5.8 Unsupervised Learning Algorithms 142
5.9 Stochastic Gradient Descent 147
5.10 Building a Machine Learning Algorithm 149
5.11 Challenges Motivating Deep Learning 151
II Deep Networks: Modern Practices 161
6 Deep Feedforward Networks 163
6.1 Example: Learning XOR 166
6.2 Gradient-Based Learning 171
6.3 Hidden Units 185
6.4 Architecture Design 191
6.5 Back-Propagation and Other Differentiation Algorithms 197
6.6 Historical Notes 217
7 Regularization for Deep Learning 221
7.1 Parameter Norm Penalties 223
7.2 Norm Penalties as Constrained Optimization 230
7.3 Regularization and Under-Constrained Problems 232
7.4 Dataset Augmentation 233
7.5 Noise Robustness 235
7.6 Semi-Supervised Learning 236
7.7 Multitask Learning 237
7.8 Early Stopping 239
7.9 Parameter Tying and Parameter Sharing 246
7.10 Sparse Representations 247
7.11 Bagging and Other Ensemble Methods 249
7.12 Dropout 251
7.13 Adversarial Training 261
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier 263
8 Optimization for Training Deep Models 267
8.1 How Learning Differs from Pure Optimization 268
8.2 Challenges in Neural Network Optimization 275
8.3 Basic Algorithms 286
8.4 Parameter Initialization Strategies 292
8.5 Algorithms with Adaptive Learning Rates 298
8.6 Approximate Second-Order Methods 302
8.7 Optimization Strategies and Meta-Algorithms 309
9 Convolutional Networks 321
9.1 The Convolution Operation 322
9.2 Motivation 324
9.3 Pooling 330
9.4 Convolution and Pooling as an Infinitely Strong Prior 334
9.5 Variants of the Basic Convolution Function 337
9.6 Structured Outputs 347
9.7 Data Types 348
9.8 Efficient Convolution Algorithms 350
9.9 Random or Unsupervised Features 351
9.10 The Neuroscientific Basis for Convolutional Networks 353
9.11 Convolutional Networks and the History of Deep Learning 359
10 Sequence Modeling: Recurrent and Recursive Nets 363
10.1 Unfolding Computational Graphs 365
10.2 Recurrent Neural Networks 368
10.3 Bidirectional RNNs 383
10.4 Encoder-Decoder Sequence-to-Sequence Architectures 385
10.5 Deep Recurrent Networks 387
10.6 Recursive Neural Networks 388
10.7 The Challenge of Long-Term Dependencies 390
10.8 Echo State Networks 392
10.9 Leaky Units and Other Strategies for Multiple Time Scales 395
10.10 The Long Short-Term Memory and Other Gated RNNs 397
10.11 Optimization for Long-Term Dependencies 401
10.12 Explicit Memory 405
11 Practical Methodology 409
11.1 Performance Metrics 410
11.2 Default Baseline Models 413
11.3 Determining Whether to Gather More Data 414
11.4 Selecting Hyperparameters 415
11.5 Debugging Strategies 424
11.6 Example: Multi-Digit Number Recognition 428
12 Applications 431
12.1 Large-Scale Deep Learning 431
12.2 Computer Vision 440
12.3 Speech Recognition 446
12.4 Natural Language Processing 448
12.5 Other Applications 465
III Deep Learning Research 475
13 Linear Factor Models 479
13.1 Probabilistic PCA and Factor Analysis 480
13.2 Independent Component Analysis (ICA) 481
13.3 Slow Feature Analysis 484
13.4 Sparse Coding 486
13.5 Manifold Interpretation of PCA 489
14 Autoencoders 493
14.1 Undercomplete Autoencoders 494
14.2 Regularized Autoencoders 495
14.3 Representational Power, Layer Size and Depth 499
14.4 Stochastic Encoders and Decoders 500
14.5 Denoising Autoencoders 501
14.6 Learning Manifolds with Autoencoders 506
14.7 Contractive Autoencoders 510
14.8 Predictive Sparse Decomposition 514
14.9 Applications of Autoencoders 515
15 Representation Learning 517
15.1 Greedy Layer-Wise Unsupervised Pretraining 519
15.2 Transfer Learning and Domain Adaptation 526
15.3 Semi-Supervised Disentangling of Causal Factors 532
15.4 Distributed Representation 536
15.5 Exponential Gains from Depth 543
15.6 Providing Clues to Discover Underlying Causes 544
16 Structured Probabilistic Models for Deep Learning 549
16.1 The Challenge of Unstructured Modeling 550
16.2 Using Graphs to Describe Model Structure 554
16.3 Sampling from Graphical Models 570
16.4 Advantages of Structured Modeling 572
16.5 Learning about Dependencies 572
16.6 Inference and Approximate Inference 573
16.7 The Deep Learning Approach to Structured Probabilistic Models 575
17 Monte Carlo Methods 581
17.1 Sampling and Monte Carlo Methods 581
17.2 Importance Sampling 583
17.3 Markov Chain Monte Carlo Methods 586
17.4 Gibbs Sampling 590
17.5 The Challenge of Mixing between Separated Modes 591
18 Confronting the Partition Function 597
18.1 The Log-Likelihood Gradient 598
18.2 Stochastic Maximum Likelihood and Contrastive Divergence 599
18.3 Pseudolikelihood 607
18.4 Score Matching and Ratio Matching 609
18.5 Denoising Score Matching 611
18.6 Noise-Contrastive Estimation 612
18.7 Estimating the Partition Function 614
19 Approximate Inference 623
19.1 Inference as Optimization 624
19.2 Expectation Maximization 626
19.3 MAP Inference and Sparse Coding 627
19.4 Variational Inference and Learning 629
19.5 Learned Approximate Inference 642
20 Deep Generative Models 645
20.1 Boltzmann Machines 645
20.2 Restricted Boltzmann Machines 647
20.3 Deep Belief Networks 651
20.4 Deep Boltzmann Machines 654
20.5 Boltzmann Machines for Real-Valued Data 667
20.6 Convolutional Boltzmann Machines 673
20.7 Boltzmann Machines for Structured or Sequential Outputs 675
20.8 Other Boltzmann Machines 677
20.9 Back-Propagation through Random Operations 678
20.10 Directed Generative Nets 682
20.11 Drawing Samples from Autoencoders 701
20.12 Generative Stochastic Networks 704
20.13 Other Generation Schemes 706
20.14 Evaluating Generative Models 707
20.15 Conclusion 710
Bibliography 711
Index 767

  • Noise injection also works when the noise is applied to the hidden units, which can be seen as doing dataset augmentation at multiple levels of abstraction. Poole et al. (2014) recently showed that this approach can be highly effective provided that the magnitude of the noise is carefully tuned. Dropout, a powerful regularization strategy that will be described in section 7.12, can be seen as a process of constructing new inputs by multiplying by noise. (A code sketch of this multiplicative-noise view appears after these excerpts.)
    —— Quoted from page 234
  • Hierarchy: let computers learn from experience and understand the world through a hierarchy of concepts, with each concept defined through its relations to simpler ones. The performance of these machine learning algorithms depends heavily on the representation of the data they are given, yet for many tasks it is hard to know which features should be extracted. One way around this is to use machine learning to discover the representation itself, rather than merely mapping a representation to an output; this approach is called representation learning. Deep learning solves the central problem of representation learning by expressing complex representations in terms of other, simpler ones. Just as there is no single correct length for a computer program, there is no single correct depth for an architecture. Cybernetics, connectionism, deep learning. Linear models cannot learn the XOR function (see the worked derivation after these excerpts), which led to the first major decline of the neural network wave. The central idea of connectionism is that intelligent behavior can emerge when a network wires together a large number of simple computational units. At the time, deep networks were widely believed to be hard to train. We now know that algorithms that existed as early as the 1980s work very well, but this did not become apparent until around 2006, perhaps simply because their computational cost was too high for sufficient experimentation on the hardware then available. The most important development is that we now have the resources these algorithms need to be trained successfully. A supervised deep learning algorithm generally reaches acceptable performance with roughly 5,000 labeled examples per class, and matches or exceeds human performance when trained on a dataset of at least 10 million labeled examples. For decades, the number of connections per neuron in our machine learning models has been within the same order of magnitude as in mammalian brains. Neural Turing machines can learn to read from memory cells and write arbitrary content to them; this self-programming technology is in its infancy, but in principle it could be applied to almost any task in the future. In reinforcement learning, as with DeepMind's AlphaGo, an autonomous agent must learn to perform a task by trial and error, without guidance from a human operator.
    —— Quoted from page 1
  •   Adaptive Computation and Machine Learning (a 28-volume series), which also includes Machine Learning, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), Foundations of Machine Learning, Probabilistic Machine Learning, Introduction to Machine Learning, and others.
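The first excerpt above describes noise injection on hidden units and dropout as multiplication by noise. Below is a minimal NumPy sketch of that multiplicative-noise view; it is an illustration only, not code from the book, and the layer shapes, keep probability, and helper name dropout_hidden are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_hidden(h, p_keep=0.8, train=True):
    """Multiply hidden activations by Bernoulli noise (inverted dropout).

    h: hidden activations, shape (batch, units)
    p_keep: probability that a unit is kept; 1 - p_keep is the drop rate
    """
    if not train:
        return h  # scaling by 1/p_keep during training keeps E[h] unchanged at test time
    mask = rng.binomial(1, p_keep, size=h.shape) / p_keep
    return h * mask  # "constructing new inputs by multiplying by noise"

# Applying the noise to a hidden layer rather than the raw input acts as
# dataset augmentation at a higher level of abstraction.
x = rng.normal(size=(4, 8))           # a small batch of inputs (assumed shapes)
W = rng.normal(size=(8, 16)) * 0.1    # hypothetical first-layer weights
h = np.maximum(0.0, x @ W)            # ReLU hidden activations
h_noisy = dropout_hidden(h, train=True)
```

As the excerpt notes, the magnitude of the noise (here, the drop rate) must be tuned carefully for the technique to help.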
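The second excerpt asserts that linear models cannot learn the XOR function. A short worked argument makes this concrete (a standard linear-separability derivation, not text from the book). Suppose a linear model f(x) = w_1 x_1 + w_2 x_2 + b had to score positive exactly on the inputs where XOR is 1:

```latex
% XOR requires: f(0,0) \le 0,\; f(1,1) \le 0,\; f(0,1) > 0,\; f(1,0) > 0.
\begin{aligned}
f(0,1) + f(1,0) &= w_1 + w_2 + 2b > 0,\\
f(0,0) + f(1,1) &= w_1 + w_2 + 2b \le 0,
\end{aligned}
% a contradiction, so no choice of (w_1, w_2, b) realizes XOR. A model
% with a hidden layer does; see section 6.1, "Example: Learning XOR".
```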
