Paper: 《A Few Useful Things to Know About Machine Learning》 Translation and Commentary (Part 2)


Intuition Fails in High Dimensions


After overfitting, the biggest problem in machine learning is the curse of dimensionality. This expression was coined by Bellman in 1961 to refer to the fact that many algorithms that work fine in low dimensions become intractable when the input is high-dimensional. But in machine learning it refers to much more. Generalizing correctly becomes exponentially harder as the dimensionality (number of features) of the examples grows, because a fixed-size training set covers a dwindling fraction of the input space. Even with a moderate dimension of 100 and a huge training set of a trillion examples, the latter covers only a fraction of about 10^-18 of the input space. This is what makes machine learning both necessary and hard.

More seriously, the similarity-based reasoning that machine learning algorithms depend on (explicitly or implicitly) breaks down in high dimensions. Consider a nearest-neighbor classifier with Hamming distance as the similarity measure, and suppose the class is just x1 ∧ x2. If there are no other features, this is an easy problem. But if there are 98 irrelevant features x3, ..., x100, the noise from them completely swamps the signal in x1 and x2, and nearest neighbor effectively makes random predictions.
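To make this concrete, here is a minimal simulation sketch (my own illustration, not from the paper; the sample sizes and random seed are arbitrary assumptions): 1-nearest-neighbor under Hamming distance learns x1 ∧ x2 almost perfectly from two relevant features, and collapses once 98 irrelevant random bits are added.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_accuracy(n_features, n_train=200, n_test=500):
    # Random Boolean examples; the class depends only on the first two bits.
    X_train = rng.integers(0, 2, size=(n_train, n_features))
    X_test = rng.integers(0, 2, size=(n_test, n_features))
    y_train = X_train[:, 0] & X_train[:, 1]
    y_test = X_test[:, 0] & X_test[:, 1]
    # Hamming distance from each test example to every training example.
    dists = (X_test[:, None, :] != X_train[None, :, :]).sum(axis=2)
    preds = y_train[dists.argmin(axis=1)]
    return (preds == y_test).mean()

print("2 relevant features:", nn_accuracy(2))    # essentially perfect
print("plus 98 irrelevant:", nn_accuracy(100))   # barely better than guessing
```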

Even more disturbing is that nearest neighbor still has a problem even if all 100 features are relevant! This is because in high dimensions all examples look alike. Suppose, for instance, that examples are laid out on a regular grid, and consider a test example xt. If the grid is d-dimensional, xt's 2d nearest examples are all at the same distance from it. So as the dimensionality increases, more and more examples become nearest neighbors of xt, until the choice of nearest neighbor (and therefore of class) is effectively random.
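The "all examples look alike" effect is easy to observe numerically. This sketch (uniform random points rather than the paper's grid construction, an assumption made for simplicity) shows the nearest neighbor becoming barely nearer than the farthest as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for d in [2, 10, 100, 1000]:
    X = rng.random((n, d))   # n uniform random points in the unit hypercube
    q = rng.random(d)        # a query point
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:4d}  nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")
# The ratio climbs toward 1: in high dimensions the "nearest" neighbor is
# hardly any closer than the most distant point.
```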


This is only one instance of a more general problem with high dimensions: our intuitions, which come from a three-dimensional world, often do not apply in high-dimensional ones. In high dimensions, most of the mass of a multivariate Gaussian distribution is not near the mean, but in an increasingly distant "shell" around it; and most of the volume of a high-dimensional orange is in the skin, not the pulp. If a constant number of examples is distributed uniformly in a high-dimensional hypercube, beyond some dimensionality most examples are closer to a face of the hypercube than to their nearest neighbor. And if we approximate a hypersphere by inscribing it in a hypercube, in high dimensions almost all the volume of the hypercube is outside the hypersphere. This is bad news for machine learning, where shapes of one type are often approximated by shapes of another.
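The hypersphere claim can be checked with a quick Monte Carlo estimate (an illustrative sketch; the dimensions and sample count are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
for d in [2, 3, 5, 10, 20]:
    X = rng.random((n, d)) - 0.5   # uniform points in a centered unit hypercube
    inside = (np.linalg.norm(X, axis=1) <= 0.5).mean()  # inscribed sphere, radius 1/2
    print(f"d={d:2d}  fraction of cube inside sphere ≈ {inside:.4f}")
# d=2 gives ~0.785 (pi/4); by d=10 only ~0.25% of the cube is inside the
# sphere, and by d=20 the Monte Carlo estimate is essentially zero.
```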

Building a classifier in two or three dimensions is easy; we can find a reasonable frontier between examples of different classes just by visual inspection. (It has even been said that if people could see in high dimensions machine learning would not be necessary.) But in high dimensions it is difficult to understand what is happening. This in turn makes it difficult to design a good classifier. Naively, one might think that gathering more features never hurts, since at worst they provide no new information about the class. But in fact their benefits may be outweighed by the curse of dimensionality.

Fortunately, there is an effect that partly counteracts the curse, which might be called the "blessing of nonuniformity." In most applications examples are not spread uniformly throughout the instance space, but are concentrated on or near a lower-dimensional manifold. For example, k-nearest neighbor works quite well for handwritten digit recognition even though images of digits have one dimension per pixel, because the space of digit images is much smaller than the space of all possible images. Learners can implicitly take advantage of this lower effective dimension, or algorithms for explicitly reducing the dimensionality can be used (for example, Tenenbaum [22]).




Theoretical Guarantees Are Not What They Seem


One of the major developments of recent decades has been the realization that we can have guarantees on the results of induction, particularly if we are willing to settle for probabilistic guarantees.


Machine learning papers are full of theoretical guarantees. The most common type is a bound on the number of examples needed to ensure good generalization. What should you make of these guarantees? First of all, it is remarkable that they are even possible. Induction is traditionally contrasted with deduction: in deduction you can guarantee that the conclusions are correct; in induction all bets are off. Or such was the conventional wisdom for many centuries. One of the major developments of recent decades has been the realization that in fact we can have guarantees on the results of induction, particularly if we are willing to settle for probabilistic guarantees.

The basic argument is remarkably simple [5]. Let's say a classifier is bad if its true error rate is greater than ε. Then the probability that a bad classifier is consistent with n random, independent training examples is less than (1 − ε)^n. Let b be the number of bad classifiers in the learner's hypothesis space H. The probability that at least one of them is consistent is less than b(1 − ε)^n, by the union bound. Assuming the learner always returns a consistent classifier, the probability that this classifier is bad is then less than |H|(1 − ε)^n, where we have used the fact that b ≤ |H|. So if we want this probability to be less than δ, it suffices to make n > ln(δ/|H|)/ln(1 − ε) ≥ (1/ε)(ln |H| + ln(1/δ)).
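In code, the sufficient condition n ≥ (1/ε)(ln |H| + ln(1/δ)) is a one-liner; this sketch (my illustration, with an arbitrary example hypothesis-space size) shows how mildly the requirement grows with |H| and 1/δ:

```python
import math

def examples_needed(ln_H, eps, delta):
    """Sufficient n from the union bound, where ln_H = ln |H|."""
    return math.ceil((ln_H + math.log(1 / delta)) / eps)

# e.g., |H| = 2^20 hypotheses with eps = delta = 0.01:
print(examples_needed(20 * math.log(2), eps=0.01, delta=0.01))  # 1847
```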


Unfortunately, guarantees of this type have to be taken with a large grain of salt. This is because the bounds obtained in this way are usually extremely loose. The wonderful feature of the bound above is that the required number of examples only grows logarithmically with |H| and 1/δ. Unfortunately, most interesting hypothesis spaces are doubly exponential in the number of features d, which still leaves us needing a number of examples exponential in d. For example, consider the space of Boolean functions of d Boolean variables. If there are e possible different examples, there are 2^e possible different functions, so since there are 2^d possible examples, the total number of functions is 2^(2^d). And even for hypothesis spaces that are "merely" exponential, the bound is still very loose, because the union bound is very pessimistic. For example, if there are 100 Boolean features and the hypothesis space is decision trees with up to 10 levels, to guarantee δ = ε = 1% in the bound above we need half a million examples. But in practice a small fraction of this suffices for accurate learning.
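The half-a-million figure can be reproduced with a back-of-the-envelope count (an assumption on my part: treat the space as complete depth-10 trees, with 2^10 − 1 internal nodes each testing one of the 100 features and 2^10 binary leaves):

```python
import math

internal, leaves, n_features = 2**10 - 1, 2**10, 100
ln_H = internal * math.log(n_features) + leaves * math.log(2)
eps = delta = 0.01
n = (ln_H + math.log(1 / delta)) / eps
print(f"ln|H| ≈ {ln_H:.0f}, required n ≈ {n:,.0f}")  # n ≈ 542,548, roughly half a million
```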

Further, we have to be careful about what a bound like this means. For instance, it does not say that, if your learner returned a hypothesis consistent with a particular training set, then this hypothesis probably generalizes well. What it says is that, given a large enough training set, with high probability your learner will either return a hypothesis that generalizes well or be unable to find a consistent hypothesis. The bound also says nothing about how to select a good hypothesis space. It only tells us that, if the hypothesis space contains the true classifier, then the probability that the learner outputs a bad classifier decreases with training set size. If we shrink the hypothesis space, the bound improves, but the chances that it contains the true classifier shrink also. (There are bounds for the case where the true classifier is not in the hypothesis space, but similar considerations apply to them.)

Another common type of theoretical guarantee is asymptotic: given infinite data, the learner is guaranteed to output the correct classifier. This is reassuring, but it would be rash to choose one learner over another because of its asymptotic guarantees. In practice, we are seldom in the asymptotic regime (also known as "asymptopia"). And, because of the bias-variance trade-off I discussed earlier, if learner A is better than learner B given infinite data, B is often better than A given finite data.

The main role of theoretical guarantees in machine learning is not as a criterion for practical decisions, but as a source of understanding and driving force for algorithm design. In this capacity, they are quite useful; indeed, the close interplay of theory and practice is one of the main reasons machine learning has made so much progress over the years. But caveat emptor: learning is a complex phenomenon, and just because a learner has a theoretical justification and works in practice does not mean the former is the reason for the latter.




Feature Engineering Is the Key


A dumb algorithm with lots and lots of data beats a clever one with modest amounts of it.

At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used. Learning is easy if you have many independent features that each correlate well with the class. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. This is typically where most of the effort in a machine learning project goes. It is often also one of the most interesting parts, where intuition, creativity and "black art" are as important as the technical stuff.

First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and preprocess it, and how much trial and error can go into feature design. Also, machine learning is not a one-shot process of building a dataset and running a learner, but rather an iterative process of running the learner, analyzing the results, modifying the data and/or the learner, and repeating. Learning is often the quickest part of this, but that is because we have already mastered it pretty well! Feature engineering is more difficult because it is domain-specific, while learners can be largely general purpose. However, there is no sharp frontier between the two, and this is another reason the most useful learners are those that facilitate incorporating knowledge.

Of course, one of the holy grails of machine learning is to automate more and more of the feature engineering process. One way this is often done today is by automatically generating large numbers of candidate features and selecting the best by (say) their information gain with respect to the class. But bear in mind that features that look irrelevant in isolation may be relevant in combination. For example, if the class is an XOR of k input features, each of them by itself carries no information about the class. (If you want to annoy machine learners, bring up XOR.) On the other hand, running a learner with a very large number of features to find out which ones are useful in combination may be too time-consuming, or cause overfitting. So there is ultimately no replacement for the smarts you put into feature engineering.
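The XOR point is easy to verify numerically. In this sketch (an illustration with k = 3 and synthetic data, my assumptions), each feature's individual information gain about the class is essentially zero, even though the three together determine it exactly:

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10_000, 3))
y = X[:, 0] ^ X[:, 1] ^ X[:, 2]   # class = XOR of all three features

for j in range(3):
    # Information gain = H(y) - H(y | x_j)
    cond = sum((X[:, j] == v).mean() * entropy(y[X[:, j] == v]) for v in (0, 1))
    print(f"information gain of x{j+1} alone: {entropy(y) - cond:.4f}")  # ~0.0000
```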



More Data Beats a Cleverer Algorithm


Suppose you have constructed the best set of features you can, but the classifiers you receive are still not accurate enough. What can you do now? There are two main choices: design a better learning algorithm, or gather more data (more examples, and possibly more raw features, subject to the curse of dimensionality). Machine learning researchers are mainly concerned with the former, but pragmatically the quickest path to success is often to just get more data. As a rule of thumb, a dumb algorithm with lots and lots of data beats a clever one with modest amounts of it. (After all, machine learning is all about letting data do the heavy lifting.)

This does bring up another problem, however: scalability. In most of computer science, the two main limited resources are time and memory. In machine learning, there is a third one: training data. Which one is the bottleneck has changed from decade to decade. In the 1980s it tended to be data. Today it is often time. Enormous mountains of data are available, but there is not enough time to process it, so it goes unused. This leads to a paradox: even though in principle more data means that more complex classifiers can be learned, in practice simpler classifiers wind up being used, because complex ones take too long to learn. Part of the answer is to come up with fast ways to learn complex classifiers, and indeed there has been remarkable progress in this direction (for example, Hulten and Domingos [11]).

Part of the reason using cleverer algorithms has a smaller payoff than you might expect is that, to a first approximation, they all do the same. This is surprising when you consider representations as different as, say, sets of rules and neural networks. But in fact propositional rules are readily encoded as neural networks, and similar relationships hold between other representations. All learners essentially work by grouping nearby examples into the same class; the key difference is in the meaning of "nearby." With nonuniformly distributed data, learners can produce widely different frontiers while still making the same predictions in the regions that matter (those with a substantial number of training examples, and therefore also where most test examples are likely to appear). This also helps explain why powerful learners can be unstable but still accurate. Figure 3 illustrates this in 2D; the effect is much stronger in high dimensions.


As a rule, it pays to try the simplest learners first (for example, naïve Bayes before logistic regression, k-nearest neighbor before support vector machines). More sophisticated learners are seductive, but they are usually harder to use, because they have more knobs you need to turn to get good results, and because their internals are more opaque.
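A minimal sketch of what "simplest first" can look like in practice (scikit-learn and a synthetic dataset are assumed here; the ordering, not the specific libraries, is the point):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Ordered roughly from fewest knobs to most:
for model in [GaussianNB(), LogisticRegression(max_iter=1000),
              KNeighborsClassifier(), SVC()]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{type(model).__name__:22s} {score:.3f}")
# Reach for the more opaque, knob-heavy learners only if the simple ones fall short.
```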

Learners can be divided into two major types: those whose representation has a fixed size, like linear classifiers, and those whose representation can grow with the data, like decision trees. (The latter are sometimes called nonparametric learners, but this is somewhat unfortunate, since they usually wind up learning many more parameters than parametric ones.) Fixed-size learners can only take advantage of so much data. (Notice how the accuracy of naive Bayes asymptotes at around 70% in Figure 2.) Variable-size learners can in principle learn any function given sufficient data, but in practice they may not, because of limitations of the algorithm (for example, greedy search falls into local optima) or computational cost. Also, because of the curse of dimensionality, no existing amount of data may be enough. For these reasons, clever algorithms (those that make the most of the data and computing resources available) often pay off in the end, provided you are willing to put in the effort. There is no sharp frontier between designing learners and learning classifiers; rather, any given piece of knowledge could be encoded in the learner or learned from data. So machine learning projects often wind up having a significant component of learner design, and practitioners need to have some expertise in it [12].
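A toy illustration of the fixed-size versus variable-size distinction (synthetic XOR-like data of my own devising, not the paper's Figure 2): the linear model cannot use extra data at all, while the tree keeps improving with it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60_000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # quadrant parity: not linearly separable
X_tr, y_tr, X_te, y_te = X[:50_000], y[:50_000], X[50_000:], y[50_000:]

for n in [100, 1_000, 10_000, 50_000]:
    linear = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[:n], y_tr[:n])
    print(f"n={n:6d}  linear={linear.score(X_te, y_te):.3f}"
          f"  tree={tree.score(X_te, y_te):.3f}")
# The fixed-size linear classifier stays near 0.5 however much data it sees;
# the tree, whose representation grows with the data, climbs toward 1.0.
```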

In the end, the biggest bottleneck is not data or CPU cycles, but human cycles. In research papers, learners are typically compared on measures of accuracy and computational cost. But human effort saved and insight gained, although harder to measure, are often more important. This favors learners that produce human-understandable output (for example, rule sets). And the organizations that make the most of machine learning are those that have in place an infrastructure that makes experimenting with many different learners, data sources, and learning problems easy and efficient, and where there is a close collaboration between machine learning experts and application domain ones.




