Lewis D. Eigen
Every day billions, if not trillions, of decisions are made by governments, corporations, and other organizations that affect the lives of most Americans. Who gets a loan? Who gets a job? Who is called and pitched by a telephone agent? Who gets what treatment for a medical condition? Whose car is stopped by the police? Who is given a security clearance? What ads do you see on the Internet? Who is placed on the “No Fly” List? There are myriad others. This extensive decision making is in the nature of any sophisticated society and economy. However, tomorrow will be a little different from a decade ago in one interesting respect: millions of those decisions will be made by computers and not people. These millions of decisions will not be made by computers following rules programmed by humans; the computers will create their own rules. “Millions,” however, is still a small percentage of all of society’s decisions. But the proportion of decisions that are made by computers is increasing extremely rapidly. Machine Learning has arrived!
Why is this happening now? How does it happen? Is this good for society and its individuals? Is it consistent with democracy?
The last two questions cannot be answered for several years, but the first two can be answered now. With those answers we can start a process of inquiry into the last two. We can either let this field evolve on its own, and a few years from today complain that yet another technology is out of hand and taking over our lives; or we can be a part of its development and shape its evolution to optimize its utility and minimize the potential damage. This article offers a first step in the process at a time when the economic vested interests are not frozen and almost all of the current players are amenable to public participation; indeed, many of them seek it.
Archeology & Science Fiction
Some of the ancient Egyptian edifices, Stonehenge in England and other ancient structures were actually crude astronomical calculators by which seasons, floods, and the quality of harvests might be predicted. These were not very accurate by modern standards, but they were accurate enough to give those who understood them enormous power within their societies.
Science fiction authors for 150 years have written about robots and computers that could make high quality decisions in all sorts of areas. In the most dramatic of these, the computers not only can calculate and make decisions faster than humans but make better quality decisions. During WWII the first automated devices were designed by British mathematician Alan Turing to perform tasks of which humans were incapable in a professional lifetime—cracking the “unbreakable” German Enigma Cypher system. The field of artificial intelligence (AI) was born and although it has advanced steadily in the almost 80 years of its development, it has not reached the hopes and theoretical prowess that information scientists and others had predicted.
There were some dramatic achievements in artificial intelligence decision-making. Many banks created programs that would examine the file of a loan applicant and make a “recommendation” which was almost always followed. In another example, computers were programmed to take a medical history questionnaire and predict, or “advise a physician,” which diseases the patient would most likely have. The results were very impressive in that they were obtained with no clinical data, like blood pressure or cholesterol level, in the input set.
There was a huge amount of hype and capital investment in people and companies who were likely to make the great leap forward in artificial intelligence. No such breakthroughs came about. Useful tools were developed, which would normally have been considered a successful advance in science, but measured against the degree of hype and the huge investments made, there was general disappointment. AI became the first large, high technology area that neither made major scientific breakthroughs nor made much if any money for investors. Along the way different approaches like “neural networking” and “expert systems” were hyped, worked with and essentially relegated to narrow utility. Even so, most people perceive these terms as successful techniques of artificial intelligence. There were many who had reasons to hype them, but few had any vested interest in disseminating the fact that they were, so far, of very limited use.
Many great mathematical, scientific, and technological innovations are the product of an individual with a great insight or experiment. Others are the result of many small increments reaching a critical mass and contributed to by many. Machine learning is of the latter type. No one really knows who thought of the major ideas first. However, the key difference between machine learning and the earlier, less successful AI efforts was in who made the rules that the computer followed. In the expert systems and other earlier efforts, humans made and programmed the decision rules, and the computer followed them. With machine learning, the computer makes its own rules. The mathematicians have even come up with the term “unsupervised machine learning” to describe the system where the computer calculates and creates the rules by which it will then make the decisions. There is also “supervised machine learning” where humans either control all of the factors of decision making or a substantial proportion of them. This article is concerned with the “unsupervised” mode, because that is where the recent, astounding breakthroughs have been made.
In 1959, Arthur Samuel defined machine learning as a “Field of study that gives computers the ability to learn without being explicitly programmed.” He was thinking about unsupervised machine learning.
Machine Learning & Universities
At the University of Washington, Professor Carlos Guestrin teaches a set of courses in machine learning. Typically many of the major digital breakthroughs have their roots in university faculties and advanced students’ exploratory and development activities. Google’s innovative search technology was initially developed by the two Google founders when they were graduate students at Stanford. Facebook came from Harvard students. Almost always the further development of the digital breakthrough takes place in corporations – large, medium, and small. When usable products, tools or techniques become sufficiently popular and successful, the universities again step in and add courses in the new technology. One adage amongst the digerati is “By the time universities teach courses in a technology, the technology is old hat and soon will be obsolete and eclipsed by something newer and better”. The technology which may have started in the university, often does not return until almost a decade later. There are thousands of new technologies and universities cannot possibly have courses in even 10 percent of the many thousands. By the time a technology is generally acknowledged to be a winner, it is no longer at the cutting edge. A number of years ago, I advised the software engineers in my organization that, “if you are working with a computer technology that you fully understand, you can be certain that it is obsolete.”
However, machine learning has developed very differently. Although most Americans have not yet heard of machine learning and there are no software products in machine learning that are selling in the millions, virtually every major university in America has 1 or more courses in the subject and a few, like Princeton, have developed an entire center: The Center for Statistics and Machine Learning (CSML) which was founded in July 2014. This brings us back to Professor Carlos Guestrin at the University of Washington. In academia, one of the most prestigious appointments that a Professor can be named to is an endowed chair. The university has received the funds to hire a very distinguished scholar and often to provide equipment and funds for assistance in his/her research. In almost all cases the “Chair” is named for the funder. Professor Guestrin is the “Amazon Professor of Machine Learning”. Amazon happens to be one of the largest and most successful users of machine learning in the world. In addition to supporting their state university, Amazon is interested in even further pushing back the frontiers of knowledge of this field.
Machine learning research and development is ongoing at universities, corporations of all sizes, government agencies, and think tanks, and even unaffiliated individuals are making major contributions. One reason that the field has attracted so many enthusiasts has been the early results. Another, and perhaps more important, is that research and development in machine learning is very inexpensive. All it takes is a $3000 computer, a data set, and a mathematician or a computer software scientist/engineer who is interested. Many if not most of the existing machine learning techniques have been developed by people with no special funding, budgets or grants. It is an ideal field of inquiry for doctoral candidates who have scant resources.
Machine Learning Today
What kinds of problems can be tackled and what decisions can be made with machine learning today? One of the best examples of a machine learning problem is given by IBM Scientist Ryan Anderson. Instead of using a real example he uses a mythical one to which most of us can relate—the Harry Potter story. “As any Potterhead can tell you, the house that the Sorting Hat chooses for each of the students determines their entire experience at Hogwarts – J.K. Rowling’s legendary School of Witchcraft and Wizardry. For those not familiar with Harry Potter, the students are each placed into one of the school’s four houses – Gryffindor, Ravenclaw, Hufflepuff, and Slytherin – which is a decision determined by the Sorting Hat. Depending on the character traits – such as bold and brave for Gryffindor – the hat places the wizard or witch into whichever house it determines he or she belongs. Even for Harry Potter, the Sorting Hat had to contemplate about which house he belonged to. Harry had ‘plenty of courage’, ‘not a bad mind,’ and ‘a thirst to prove [him] self,’ all characteristics of Gryffindor and Slytherin, and so the Sorting Hat had a difficult choice to make – the sorting hat ultimately decided on the most confident option.”
The “big data” set consists of the personality and character traits that previous students possessed, which presumably have been collected over the years in conjunction with an assessment of how well the assignment to the particular house worked out. That is the “training set” that was used by the sorting hat to run its machine learning program. The sorting hat does double duty in that it has the ability, when placed over the head of each student, to discern the particular set of characteristics possessed by that student. Then the sorting hat makes a prediction as to which house would constitute the best fit for that student.
In the sorting hat example, the object was to take all instances and predict which of 4 alternatives would make the best fit, but it could be 2—“give the loan or do not” or 3—“treat with surgery, treat with radiation, treat with chemotherapy” or any number of classes. The most general case where machine learning excels is to assign a score to each individual or entity. The credit score is our most familiar example. A sales manager would generally love to be able to have his computer assign a score to each potential customer as to the likelihood of making a reorder in that month so he/she can tell the sales staff to concentrate on the most likely sales prospects.
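The idea of learning from a training set and then assigning a class or a score can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the trait scores, the three traits chosen, and the function names (learn_centroids, score_houses, sort_student) are all hypothetical, and the technique used here, a simple nearest-centroid classifier, is a stand-in chosen for brevity rather than the method described in the IBM example.

```python
import math

# Hypothetical training set, invented for illustration:
# (courage, intellect, ambition) on a 0-10 scale, plus the house that worked out.
TRAINING_SET = [
    ((9, 5, 6), "Gryffindor"),
    ((8, 6, 5), "Gryffindor"),
    ((4, 9, 5), "Ravenclaw"),
    ((5, 10, 4), "Ravenclaw"),
    ((5, 5, 3), "Hufflepuff"),
    ((4, 4, 2), "Hufflepuff"),
    ((6, 7, 10), "Slytherin"),
    ((5, 6, 9), "Slytherin"),
]


def learn_centroids(training_set):
    """Learn one 'typical student' (the average trait vector) per house."""
    sums, counts = {}, {}
    for traits, house in training_set:
        acc = sums.setdefault(house, [0.0] * len(traits))
        for i, value in enumerate(traits):
            acc[i] += value
        counts[house] = counts.get(house, 0) + 1
    return {house: tuple(v / counts[house] for v in acc)
            for house, acc in sums.items()}


def score_houses(centroids, traits):
    """Score every house for one student; a higher score means a closer fit."""
    return {house: -math.dist(traits, centroid)
            for house, centroid in centroids.items()}


def sort_student(centroids, traits):
    """Pick the house with the best score."""
    scores = score_houses(centroids, traits)
    return max(scores, key=scores.get)


centroids = learn_centroids(TRAINING_SET)
new_student = (8, 6, 8)  # brave, but also ambitious: a close call
print(score_houses(centroids, new_student))
print(sort_student(centroids, new_student))
```

The same scores that drive the choice among four houses could just as easily be used for a two-way or three-way decision, or reported directly as a ranking, which is the scoring case described above.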
Today, machine learning is being used in a wide variety of decision-making settings. The number and kinds of decisions being made by machines are increasing daily. Ever since the 1990s there have been scientific journal articles published about aspects of machine learning in some of the mathematics and computer science journals. Today there are many, and there are currently even two scholarly, scientific journals devoted specifically to machine learning.
There are already over 15 commercial machine learning software programs available and almost twice that number of “open source” programs produced by scholars and by profit and non-profit institutions that people are welcome to use for free. IBM, for example, has committed its machine learning platform, SystemML, to the open source community. SystemML enables developers who don’t have expertise in machine learning to embed it in their applications once and use it in industry-specific or other application-specific scenarios on a wide variety of computing platforms, from mainframes to smartphones.
Government is also researching and using machine learning for an increasing number of decisions. It is hard to obtain official specific confirmation in many cases on the grounds of secrecy and national security. As we shall see later there is another reason that many government agencies will neither confirm nor deny that they are actually using machine learning to conduct government business.
The growth of the use of machine learning has recently prompted the prestigious British periodical, The Economist, to ask, “How has artificial intelligence, associated with hubris and disappointment since its earliest days, suddenly become the hottest field in technology?”
Although there is currently explosive growth of machine learning, the science is by no means perfect. Not all the people who are using the technology in their own organizations are aware of the imperfections. Very few of the people whose lives are being affected by machine learning decisions know or understand them, and almost none realize that it is computers that are making the decisions that affect them. However, the public and private sector mathematicians, computer scientists and sophisticated executives using machine learning understand these imperfections very well. They are hard at work devising ways to ameliorate, if not eliminate, the potential negative and undesirable consequences of these shortcomings. Most of these people realize that it will become ever more important for the public to understand them, lest there be a Luddite, anti-scientific movement for the banning of all such techniques, which would include the many wonderful, positive applications as well as the problematic ones.
The Five Potential Machine Learning Problems
Machine learning, as exciting and useful as it is, like all other profound technologies, past and present, not only provides benefits to society, but has potential weaknesses above and beyond the disruption that significant change always brings. Few of these are inevitable and most depend on how the technology is incorporated into society—the legal, ethical and regulatory framework within which the technology is used. In most cases, our ethical, legal and regulatory mechanisms have lagged far behind technological development, often causing pain that could have been otherwise avoided and often diminishing and/or delaying many of the benefits.
Machine learning is still relatively new, and there is still time for us to influence the rapid and profound growth that while significant now, will become ubiquitous—affecting many aspects of most of our lives in just a few years.
Accuracy is Better for Groups Than for Individuals
Most of us think of accuracy monolithically. A technique of prediction is either accurate or it is not. There are some areas where we are more confident of our predictive ability with individuals than with groups. Many of us are fairly confident that we can predict whether each of our relatives or friends is likely to vote for or against a current political candidate. However, we are much less confident of our ability to predict how the entire electorate is going to vote. There are a few situations which are just the opposite: we can make better predictions about groups than individuals. Consider trying to predict whether a person drawing a card from a deck will choose a red or a black card. If he does this 10 times, we have a hard time predicting whether he will get 5 black and 5 red, 6 and 4, 7 and 3, etc. We can estimate probabilities but are not likely to be accurate very often. Our average error will be significant. But if we had 1 million people each draw 10 cards, we can predict with very high precision that ½ of all the 10 million cards will be red. Our average error will be, percentage-wise, very small. That is the nature of the branch of mathematics called Probability and Statistics. Small samples produce larger potential errors, and we can have less confidence in the results. The situations where group predictions are more accurate than individual predictions are not very familiar to most of us.
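The card example is easy to try on a computer. The short Python sketch below makes two simplifying assumptions that are not in the text above: each draw is treated as an independent 50/50 chance of red (as if the card were returned to the deck each time), and 100,000 people are simulated rather than 1 million so that it runs quickly. The function names are made up for the illustration. For a single person, a guess of “exactly 5 red” is right only about a quarter of the time (252 chances in 1,024), while the share of red cards across the whole group lands very close to ½.

```python
import random


def draw_reds(n_cards, rng):
    """Count the red cards in n_cards draws, each red with probability 1/2."""
    return sum(rng.random() < 0.5 for _ in range(n_cards))


def simulate(n_people=100_000, cards_each=10, seed=0):
    """Compare an individual-level prediction with a group-level one."""
    rng = random.Random(seed)
    reds = [draw_reds(cards_each, rng) for _ in range(n_people)]
    # Individual level: how often is "exactly half red" the right call?
    exact_hits = sum(r == cards_each // 2 for r in reds) / n_people
    # Group level: what share of all cards drawn by everyone was red?
    group_share = sum(reds) / (n_people * cards_each)
    return exact_hits, group_share


exact_hits, group_share = simulate()
print(f"people who drew exactly 5 red out of 10: {exact_hits:.1%}")
print(f"share of red cards across the whole group: {group_share:.4f}")
```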
In the physics of mechanics, if we drop two different weights from the same height at the same time, we expect that they will both land at the same time. And that is true whether we try this once, twice or a million times. In hard sciences like physics or chemistry, most of us expect that the scientific laws that govern are operable regardless of the number of observations. However, and this will come as a surprise to most readers, science is learning that a growing proportion of this inevitability and predictability of physical phenomena works more and more like playing cards and not inevitably as we learned in studying the physics of Sir Isaac Newton and Albert Einstein’s theory of relativity.
One of the most dramatic and early demonstrations of this occurred in 1922 when a student of Albert Einstein, Otto Stern, and his colleague Walther Gerlach together conducted an experiment that completely changed the way physicists have viewed the world. It took a number of years for many to even accept the possibility of their results, and Einstein was one of those who for a considerable time could not come to grips with the implications. But it was the beginning of the modern physics we now call quantum mechanics.
Stern wanted to shoot silver atoms through a special kind of magnetic field. Gerlach had fashioned a furnace that emitted silver atoms which passed through a slit to form a beam. The silver atoms would be directed at a type of photographic detector plate. Stern wanted to see the path the silver atoms would follow as they passed through the magnetic field. He wanted a “non-homogeneous” field, one where the magnetic field was not uniform throughout. If the magnetic field had no effect on the silver atoms, the atoms would strike right in the middle of the plate. If the field deflected the atoms, they would hit somewhere else on the plate and the researchers could see where and measure how far they had been deflected. From this data, the forces and energy created by the field acting on the silver atoms could be measured. If the atoms all hit in the center, there would be no deflecting force, but Stern, from previous research and some theoretical analysis, was pretty sure that the field would deflect the atoms as they passed through it, and deflect them in a remarkable way.
They turned on the apparatus. When Stern looked at the plate, he knew that they had demonstrated something that most people would not believe. What the plate showed was so astounding that it would excite the world of physics more than anything had since Einstein’s Theory of Relativity, even though it took a few more years to figure it all out.
A beam of silver atoms, traveling at ordinary thermal speeds, leaves a small black circle where it strikes the plate. The circle is created by thousands of individual atoms hitting the plate at that location. If the circle had been in the center of the plate, no deflection of the atoms would have taken place. There was no circle in the center. Deflection had taken place. However, what was visible on the plate was TWO circles: one deflected to the left, the other deflected by a like distance to the right. The same source of silver atoms was firing thousands of silver atoms at the exact same angle, with the exact same force. They all passed through the exact same magnetic field. Yet half the silver atoms were deflected to the left and half to the right. (Actually the original experiment used up and down deflections but the right-left is easier to depict.) The result was exactly like the red and black playing cards. According to everything that was known from Newtonian mechanics, such an outcome was impossible. Even with the Theory of Relativity, there was no reasonable explanation for such an outcome. The same causes should produce the same results. If they do not, what happens to our sense of an ordered, consistent universe operating under immutable physical laws? Philosophically, what happens to the concept of personal responsibility if we can no longer be certain of the results of our actions? If half the time a driver turned the steering wheel right, the car went right, but the other half, at random, the car went left, what would our liability insurance system look like?
Einstein and the other doubters finally came along as it became clear not only that this happens in many situations with very small particles, but also that scientists could explain why it happens. The Holy Grail of Theoretical Physics today is to develop a unified theory that explains both the Newton-Einstein world of large objects and the Quantum Mechanics world of small objects like molecules, atoms and quarks. Einstein was still trying to create such a unified theory on his deathbed in 1955.
What is important for our purposes is that Stern and Gerlach could predict with near absolute certainty what proportion of the silver atoms would go left (half) and what percentage would go right (half), but they were unable to predict for any particular atom which way it would go. Today we know that there are a myriad of physical phenomena where the same causes will not produce the same results. And we have realized that this is not only in physics and chemistry but also in biology which is controlled by the modern laws of physics and chemistry. The scientific monk Gregor Mendel in the middle of the 19th century showed that when peas with different traits, like wrinkled and smooth, are crossbred it is impossible to predict how any individual offspring will turn out, but it was possible to state that ¾ of the offspring would have one trait and ¼ would have the other. That principle holds with humans and many inherited traits such as Lou Gehrig’s disease where ¾ of the offspring of two afflicted parents would contract the disease but ¼ would not. The inability to tell in advance for any given pregnancy complicates the morality and ethics of the abortion issue immensely.
The underlying mathematics of machine learning involves probability, and the results are astoundingly good when applied to a large group, but very problematical when applied to an individual case. One classic political use of machine learning illustrates the problem well. Most political campaigns, if they have the resources, like to have their volunteers or paid staff go door to door visiting prospective voters to try to persuade them to vote for their candidate. In ancient times (from a machine learning point of view, the year 2000), the campaigns would target neighborhoods where the residents were NOT registrants of their party. The object of the door to door visit was to persuade residents who might not vote for your candidate to change their minds. The teams of volunteers would be dropped off at one end of a street of the neighborhood and sequentially go down the street stopping at each home. The Obama campaign of 2008 used a type of machine learning. They started with the same neighborhood but they had collected data on all the households in the neighborhood: income, education, party registration, magazine and newspaper subscriptions, health status, occupation and everything else they could obtain. Then they used a machine learning algorithm and classified each household as: Definitely for Obama, leaning towards Obama, undecided, leaning against Obama, and definitely against Obama. Instead of visiting every house in the neighborhood, they only visited the undecideds and those leaning against Obama. In any large neighborhood the classifications were amazingly accurate, so they knew with great precision in which neighborhoods they should expend their efforts. When they used the data to determine which houses on the street should be visited, the accuracy was not as good. They ended up trying to pitch some diehard Republicans and households that had voted Democratic for the last 3 generations. Despite this shortcoming, the Obama campaign was delighted.
They not only made great decisions about which neighborhoods to work, but their “wasted” efforts were far fewer than they had been when they visited every house on the street. Consider an advertiser who uses machine learning algorithms to decide which individual web users should see their ads. Such an advertiser often finds that 30% or more of the ads are mis-targeted. That part of the ad investment has been wasted. None-the-less they are delighted. Their alternative was to pick a website and play the ad for every user of the website, often resulting in wasting far more than 50% of their ad money. To reduce that error to only 30% means millions of additional advertising dollars to use productively, and hence millions of dollars’ worth of increased sales. Even though the machine learning is not as accurate when applied to individuals, it is most often better than nothing or better than the existing alternatives: better than the status quo.
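The gap between neighborhood-level and household-level accuracy described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the number of households, the range of probabilities, and the function name simulate_neighborhood are hypothetical, and the sketch assumes the model’s probabilities are well calibrated, which is an idealization and not a claim about how any campaign’s actual system behaved.

```python
import random


def simulate_neighborhood(n_households=2000, seed=1):
    """Compare household-level accuracy with the neighborhood-level total."""
    rng = random.Random(seed)
    # Assumption: the model gives each household a calibrated probability
    # of supporting the candidate, somewhere between 20% and 80%.
    probs = [rng.uniform(0.2, 0.8) for _ in range(n_households)]
    # What each household actually does is a chance event with that probability.
    actual = [rng.random() < p for p in probs]
    # Household-level prediction: call it a supporter if the probability is >= 50%.
    predicted = [p >= 0.5 for p in probs]
    household_accuracy = sum(a == g for a, g in zip(actual, predicted)) / n_households
    # Neighborhood-level prediction: the expected share of supporters.
    predicted_share = sum(probs) / n_households
    actual_share = sum(actual) / n_households
    return household_accuracy, predicted_share, actual_share


household_accuracy, predicted_share, actual_share = simulate_neighborhood()
print(f"correct household-by-household calls: {household_accuracy:.1%}")
print(f"predicted share of supporters:        {predicted_share:.1%}")
print(f"actual share of supporters:           {actual_share:.1%}")
```

Under these assumptions a sizable fraction of the individual calls are wrong, yet the predicted and actual shares for the neighborhood as a whole differ by only a small amount, which is the pattern the campaign and the advertiser both live with.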
Let us consider a different type of use of machine learning. Consider the U.S. Border Patrol, which has about 21,000 agents to patrol 7,500 miles of border with Canada and Mexico. If the Border Patrol has a large data set consisting of individuals crossing the borders at different locations (legally and illegally), they could use machine learning to predict the parts of the border that will have the largest number or proportion of illegal entries. Based on that, they can make much more efficient assignments of agents to different sections of the border. The agency however will neither confirm nor deny that they use machine learning for this purpose or any other. Given the ease with which any agency can use existing data as a training set for machine learning, the low cost of doing the machine learning itself, and the common use so many government and private operations make of machine learning, it is hard to believe that they are not using this technology at least experimentally and probably operationally.
Now let us assume that the Border Patrol makes another kind of use of machine learning with this same training set of data. The Border Patrol, under an existing Federal law, has the right to stop traffic, inspect and question passengers in vehicles within 100 miles of any border. That includes Federal, State, County and local roads. Now further assume that Border Patrol agents make decisions regarding which people and vehicles to stop and search based on the same machine learning technique but applying it to individuals. In this case, the accuracy of any one single prediction by the machine learning algorithm is much less than when it was used for agent allocation. In the case of agent allocation, even if an error is made, there is no specific harm done to any American citizens other than the indirect effect of suboptimal allocation of Border Patrol resources. But in deciding whom to stop and search, if the Border Patrol is wrong, it inconveniences citizens, makes them late for appointments and job obligations, embarrasses them and, more important, potentially violates their civil rights. The courts will soon have to decide whether or not the results of a machine learning algorithm applied to an individual may be legally used as “probable cause” to justify searches and seizures. From a policy point of view, there is a huge range of issues where machine learning algorithms could be applied to individuals. Hopefully the legislative and executive branches of the nation will get out in front and consider the implications of this and make rational and desirable policies as opposed to doing nothing and ceding the policy making to the courts, which would much prefer to have the legislative and executive branches make policy, especially technologically-related policies. The innovators and users of these new technologies, most of whom very much want national policies and guidance, are none-the-less afraid to have legislatures interfere.
They worry that most of the legislators are ignorant of machine learning, AI and most other high technology. Even worse, some legislators distrust and disparage scientists of all kinds and are rarely influenced by prevailing scientific knowledge, much less willing to seriously consider expert scientific opinion and even proof. Here are but a few of the kinds of issues that already are or will soon be in front of us where a machine learning algorithm is applied to individuals:
- Corporations or government agencies fire employees.
- Suspected criminals (terrorists or otherwise) are detained and questioned.
- Death penalties and other sentences are given after findings of guilt by a jury.
- Juries can receive, as evidence of guilt, a classification by a machine learning application.
- Internet service providers may deny e-mail rights to individuals and corporations classified as spammers by a machine learning algorithm.
- College admissions are based on machine learning algorithms applied to all applicants.
- Citizens are placed on The National No Fly list based on machine learning.
- People are denied rights to buy arms or drive cars based on machine learning.
- School or University grades are determined by machine learning.
- Companies are banned from advertising based on a low score they received from a machine learning algorithm to detect possible fraud.
There is Always Error
Since part of the underlying mathematics of machine learning is the mathematics of probability and statistics, machine learning methods carry with them the errors inherent in the underlying statistical theorems and procedures. Essentially, the machine learning program, using a training set of data, tries to produce an equation that weighs all of the different variables in the data set and assigns a number to any combination of data points. Typically there are at least 5 different variables and usually many more; there are many cases where there are hundreds and some with thousands. It is extremely difficult for people without a good bit of mathematical training to conceive of, much less understand and use, a 10 or greater dimensional space and equations within it. However, it is possible for most of us to visualize and understand many of the aspects of a simple 2 dimensional set of data. With this you can understand the potential weakness of machine learning methods.
The Relationship Between Height and Shoe Size of Adolescents
Let’s assume that we want to be able to predict the shoe sizes of a large group of adolescents, or of any single adolescent, without going through the effort and expense of measuring each person’s foot. Our computer already has their heights. We want our machine to produce a formula to calculate the best estimate of each person’s shoe size.
The 2 dimensional graph to the left is a scatter-plot of a “training set” of 199 measurements of both variables that we have collected. For each person, we find his/her height in inches on one axis and shoe size (full and half sizes) on the other, and place a blue dot on the graph at that combination of height and shoe size. In the simplest case, we want the machine to come up with a formula of the form shoe size = m times height + b. In algebra this is the familiar equation of a straight line, y = mx + b, with shoe size playing the role of y and height the role of x. So we want the machine to tell us the m and the b, and we can then make each projection with one multiplication and one addition. This is equivalent to finding a line that best fits the data of the training set. Ideally all the points on the graph would be on the line, and then all our projections would be perfect. However, that is exceedingly rare. Typically, mathematicians use a technique that has been in existence for more than two centuries, the method of least squares. For each dot, its error can be thought of as the vertical distance from the dot to the line, that is, the gap between the actual shoe size and the projected one. The mathematical solution finds the line where the average of all the errors is the smallest. (Mathematicians use the average of the squares of the error distances, but the idea is the same.) At that point the m and b can be calculated and we have our formula.
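For readers who want to see the calculation, here is a minimal Python sketch of the line-fitting idea described above, using the standard least squares formulas for m and b. The seven height and shoe size pairs are invented purely for illustration and are not taken from the 199 measurements in the graph; the function names fit_line and mean_squared_error are likewise just labels chosen for this example.

```python
def fit_line(xs, ys):
    """Return the slope m and intercept b of the least squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line always passes through the point of averages.
    b = mean_y - m * mean_x
    return m, b


def mean_squared_error(xs, ys, m, b):
    """Average of the squared vertical gaps between each dot and the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data, invented for illustration only.
heights = [60, 62, 64, 66, 68, 70, 72]           # inches
sizes = [6.0, 7.0, 7.5, 9.0, 9.5, 11.0, 11.5]    # shoe sizes

m, b = fit_line(heights, sizes)
print(f"shoe size = {m:.3f} * height + {b:.3f}")
print(f"projected size for a 65 inch adolescent: {m * 65 + b:.1f}")
print(f"average squared error on the training set: "
      f"{mean_squared_error(heights, sizes, m, b):.3f}")
```

Even on this tiny made-up data set the average squared error comes out greater than zero: the line is the best available compromise among the dots, not a perfect description of any one of them.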
Regardless of what the formula for the line actually turns out to be, remember that the line is determined by minimizing the average error, not by eliminating it. There is almost always an error in the underlying formulas, whether we are dealing with 2 dimensions as in this example, 22 dimensions (fairly common in machine learning instances) or 222 dimensions, which is becoming ever more frequent with the data mining and big data available today.
There is a speech that is given to every engineering student early in the curriculum to explain the difference between physics and engineering. It is always something like, “In physics we have to get everything exactly right; in engineering we only have to be right enough”. Engineering is an art and craft that uses science, but for most of its practitioners, it is not science. In machine learning we have a tool that was developed by mathematicians who have to get everything right, but that, when used in practice, almost always has error. As with the engineer, the use of machine learning only has to be right enough to accomplish some practical objective. In most cases, the standard is to produce a scheme that works better than what existed before, or that works almost as well as the existing method but costs much less. The goal is to automate the process by using a computer and thus accomplish the objective with less resource investment, with procedures that do not require investigation and/or decision making by humans who require payment, support, and training. Computers eat very little and are willing to work 24/7 under almost any conditions. In many of the recent uses of machine learning, we cannot compare it to a human-implemented alternative because that alternative would be so obviously expensive that the assessment could not be made economically without machine learning. Yet if an organization has the data sets, it can obtain the critical open source software at no cost and run it on an average office server. There is no capital investment that has to be made to use machine learning other than hiring or educating a staff person to understand enough of the technology to discern what problems it might help solve and how to do it. This human investment is small. Most of the better universities have added courses in machine learning in the last few years, and there are a number of fine on-line courses.
Many computer-literate programmers can learn on their own given the transparency of the methodology. There is another group of mathematicians and scientists who develop the ever-growing library of machine learning techniques and the software to implement them.
Lack of Transparency
In mathematics and science, transparency is not only a norm; it is incorporated into the ethical standards of the profession. Not only are results expected to be openly shared, but so are the specifics of the methods, materials or whatever else might be necessary for others to understand the work and replicate the experiment, which is the gold standard of maintaining the integrity of science itself.
However, machine learning, by its very nature, creates transparency problems. Science and public policy have already collided over a few of them, and such problems are likely to continue, if not become worse, as machine learning becomes more ubiquitous. To understand the problem better, consider this hypothetical situation:
A university has years of data on students and graduates, including their pre-university performance, test scores, high school grades, and other information including household and parental demographic information. The university also has a great deal of data on the students’ activities while enrolled. Lastly, it has the past data on graduate earnings, publications, advanced degrees, professions achieved and other presumed measures of the success of the university education.
The university faculty, administration and trustees become aware that some of the mathematicians at the university have taken the existing data, used it as a training set for a machine learning implementation, and have been able to assign to undergraduates a Success Predictor Score which is a very good predictor of success after graduation. So they vote to add a new requirement for graduation in addition to the 128 college credits distributed in conformity with the selected major field. It is that the Success Predictor Score must be large enough to indicate a better than 50 percent chance of becoming a success after graduation. Students under the threshold will not be granted a degree, and each semester the machine learning algorithm will be run for every student and the score openly reported to him/her. Generally, the more courses, the better the grades, and the more extracurricular activities a student completes, the higher his/her score. After a number of years with this system it turns out that about 10 percent of the students who would have graduated under the old criteria have their degrees withheld under the new criteria. And half of those cannot achieve a sufficient score even after taking an extra year of college.
Inevitably, some of the students who could not graduate as a result of the new standard sue the university, charging that the system is capricious and arbitrary; that it discriminates against students by using race, ethnicity, economic status, parental occupation and other variables that students had no control over; that the criteria were not in conformity with the U.S. Department of Education rules required for student loans to be granted to students; and a host of other charges. The dissidents are asking for a preliminary injunction requiring the university to give them their diplomas regardless of the machine learning outcomes.
A number of the cases are being tried together, and in an early round of testimony the University Dean admits that the university uses a machine learning technique and applies the algorithm that the computer has developed to all the current students equally, and that the criteria are the same for all. However, when the attorney for the plaintiffs asks him specifically what criteria are being applied, he repeats that it is the score that the computer assigns based on the machine learning results with the training set of all the students for the last decade, and that there were no biases in who was or was not included. All were included. So the attorney tries another approach. “Dean Henderson, would you tell the court what criteria the machine learning computer uses, of all the data that was input in the training set?” The dean admitted that he did not know specifically, as he was not a mathematician and was basing his judgment on the views of the university mathematicians, statisticians and computer scientists, who had assured the dean, the faculty, the trustees and the students that the score is a very accurate and valid predictor of success as they have defined it. Those success criteria had been chosen by the University, and the court had been given them. The judge made a preliminary ruling that the university has the right to select the criteria for success that it wishes and to condition graduation on a high enough score. However, the plaintiffs had a right to know exactly what variables were used by the machine algorithm, the way they were being used, and the rules by which the decisions are made. The dean did not know, but offered to have Professor Grelock, the computer scientist who was in charge of actually running the computers and reporting the assessments, testify.
When Professor Grelock testified, he told the court that all of the variables were used in the equation assigning the scores to the students. When the attorneys and even the judge asked which ones counted the most, he replied that he really did not know. The judge, growing impatient, asked, “Well, who is it that actually made the rules and created the equation that the computer was using?” The reply: “No one, Your Honor, the machine makes its own rules.” The judge persisted, “Well, what are the programming guidelines that control how the computer made its own rules?”
Professor Grelock did not really know. However, Professor Feigenbaum, the head mathematician of the system, who had also come to the court, offered to try to explain. “With machine learning we really do not program the rules or even program the rules by which the computer shall make its own rules. Instead we tell the computer what input variables there are, what our success measurements are, and feed in the entire training set. Then we let the computer do its own thing. The computer, for all practical purposes, tries all the possibilities, testing each possible formula until it finds the one that produces the least error. Then it uses that, whatever it happens to be.” Feigenbaum was challenged by the plaintiffs’ attorney. “If you are using all the input variables for the training set, I believe that there were almost 1000 of them. The number of possible formulas to try would be huge, even for a computer. It would take a very long time, would it not?” “Actually,” the professor replied, “the number is larger than that. Usually, we do not use just a linear combination of all the variables but exponential, logarithmic and various multidimensional polynomial terms. The actual number of possibilities is virtually infinite. So there are a number of techniques that we use, and we are developing more all the time, to select which of the infinite possibilities the computer tries and in what order. I can assure you that the people who develop these have no knowledge of what the input data actually is. They could not bias it even if they wanted to. We happen to use a technique which is a recent variant of what mathematicians call ‘the path of steepest descent’. It was originated by a mathematics professor at Berkeley a few years ago, before we even embarked on this project.” The judge then tried to resolve what was appearing to be a dilemma.
“Professor, do you know the formula that the machine chose as the best one in this case?” When the Professor indicated he did, the judge turned to the white board at the edge of the courtroom and asked the professor to write it down. “I do not remember it by heart; it is a very long equation, with over 1000 different terms. I will be glad to print it on my PC, which is connected with the computer in question, but there is no way that it would fit on a whiteboard or even a full computer screen.” “Well, could we ask you questions about how a particular variable is weighted and used relative to the others?” the judge asked. “I understand what you are looking for, Your Honor, but I just cannot tell. There are too many terms of too many different types for me to keep it in my head. If it were a machine learning problem that involved 4 or 5 variables or maybe even 20 or so, I could probably do that, but I can’t here. There may be someone else who can hold all this conceptually and answer your questions, but I cannot and I do not know anyone who can.” “What about the professor at Berkeley who developed that path of steepest descent? He should be able to do this.” Feigenbaum shook his head in the negative. “He almost surely could not, Your Honor. He even wrote in his paper describing the method that he felt that the entire field of artificial intelligence was being inhibited because it was limited to equations and numbers of variables that humans could understand. The big breakthrough in machine learning was that the computer would create formulas that were much more accurate and predictive than those simple ones that humans could create or even understand. We mathematicians can prove that these equations are valid equations and we can even show how much better they are than others, but we can’t understand this equation or most of the others for that matter.”
The judge thought carefully for a few minutes and then summed up the situation. “So the reality is that the University in this case, and presumably many other institutions in many other cases, are using formulas to make decisions that affect people’s lives, and there is no way those people, the public or the government can review the formulas to determine if any bias laws have been broken or even whether the specific formula that is used is consistent with public policy. Legislators can’t legislate, regulators can’t regulate and people perceiving themselves harmed cannot even have a judicial review. Or am I missing something?” “No, sir,” the mathematician replied, “we can prove that the machine can make the decisions better in most cases, but we cannot explain exactly what the formula means, because we typically do not know ourselves. I realize that this sounds strange to the layman, but we mathematicians often prove theorems about things we know little or nothing about. For example, I have never seen or even tried to imagine a 5,080-sided regular polygon, but I can calculate exactly the angles between the sides, and even the area if we know the length of any one side. One of the great strengths of mathematics is that we can prove relationships about things we have never seen, do not understand, and often can’t even visualize.”
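To make the professor’s “path of steepest descent” a little more concrete, here is a minimal Python sketch of the plain textbook version of the idea, applied to a two-variable problem small enough to inspect. It is not the “recent variant” of the hypothetical testimony, and everything in it is an assumption made for illustration: the function name gradient_descent_line, the learning rate, the number of steps, and the invented (height, shoe size) pairs. The computer starts with an arbitrary formula, measures which small change to the formula reduces the average squared error fastest, takes a small step in that direction, and repeats.

```python
def gradient_descent_line(points, learning_rate=0.1, steps=2000):
    """Fit y = m*x + b by repeatedly stepping downhill on the mean squared error."""
    n = len(points)
    # Rescale x to mean 0 and unit spread so that one step size suits both numbers.
    mean_x = sum(x for x, _ in points) / n
    spread = (sum((x - mean_x) ** 2 for x, _ in points) / n) ** 0.5
    scaled = [((x - mean_x) / spread, y) for x, y in points]

    w, c = 0.0, 0.0  # slope and intercept in the rescaled units; arbitrary start
    for _ in range(steps):
        # Slope of the error surface in each direction (the partial derivatives).
        grad_w = sum(2 * (w * z + c - y) * z for z, y in scaled) / n
        grad_c = sum(2 * (w * z + c - y) for z, y in scaled) / n
        # Step against the gradient: the direction of steepest descent.
        w -= learning_rate * grad_w
        c -= learning_rate * grad_c

    # Convert the answer back to the original units of x.
    m = w / spread
    b = c - m * mean_x
    return m, b


# Hypothetical (height in inches, shoe size) pairs, invented for illustration only.
data = [
    (60.0, 6.0), (62.0, 7.0), (64.0, 7.5), (66.0, 8.5),
    (68.0, 9.0), (70.0, 10.5), (72.0, 11.0), (74.0, 12.5),
]
m, b = gradient_descent_line(data)
print(f"formula found by descent: shoe size = {m:.3f} * height + {b:.2f}")
```

With two numbers to adjust, a person can follow every step. With a thousand variables and nonlinear terms the procedure is the same, but the resulting formula is the kind that the hypothetical professor could print but not explain.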
This has all been a hypothetical case, but it illustrates a major public policy issue that our society will soon face. How extensively can individuals or organizations use techniques that affect people when neither they themselves nor their experts fully understand those techniques or can explain them to anyone? Machine learning maven Darren Remington has observed, “The analysis these systems are capable of performing are so vast and at levels of complication and scales so immense that human verification (by hand or thru traditional methods) is absolutely impossible. With any ML system in which the answer is really important, a big challenge facing ML ‘dataticians’ is finding a way to validate the results”. So in many machine learning operations there can be very different views of the validity of the “benefits”.
Most of us have an instinctive reaction to the situation: prohibit all use of systems that the users cannot understand and explain to the rest of society. However, serious consideration soon shows that this would be a very counter-productive policy. It turns out that we very often use techniques that few if any people understand completely and that no one may be able to explain. This is particularly true, for example, in medicine. The Greek billionaire Aristotle Onassis, who died in 1975, was one of the best-known people to die from a disease called myasthenia gravis. Since then, several very effective treatments have been developed, despite the fact that the disease and its causes are still not well understood, and deaths among treated patients are now rare. Turkey successfully used inoculation with material from smallpox sores (variolation) to prevent severe smallpox long before Jenner of England “discovered” vaccination with cowpox. The Turks had almost no understanding of the disease, vaccines or the mechanism of immunity. But comparatively few inoculated Turks died from smallpox, while the disease ravaged Europe and North Africa.
Both ether and nitrous oxide (laughing gas) were used as anesthetics long before their mechanism of effectiveness was understood. New cancer treatments have been tried recently with minimal understanding of the exact mechanism of why the treatment might work. There are thousands of examples. In government we pass laws and initiate programs where the understanding of why or how they work is minimal and the rationale is often just based on someone’s idea or an ideology. Laws often have unintended and unanticipated consequences. Business tries tactics and strategies based on little but a hunch or as an emulation of some other firm that had been successful. In the Civil War, Abraham Lincoln deployed the still-young telegraph technology throughout the Union Army even though armies had very little previous experience with telegraphy. It turned out to be one of the great Union advantages. Napoleon deployed the process of food canning to feed his troops when there had been almost no history or experience with it, nor had it been proved safe. It is a common adage in Silicon Valley that “if you wait until a new technology is thoroughly understood before you use it, you will certainly be too late”.
The most dramatic example perhaps of all time was the use of the Jonas Salk polio vaccine, made from inactivated (killed) virus, to immunize against polio. At that time there was limited understanding of why the vaccine worked, and critics argued that there was not enough evidence that it was safe. Leading experienced virologists howled in protest, claiming that its use would be tantamount to murder. And indeed, at one point over 200 inoculated youngsters did get polio as a result of a badly manufactured batch. Nonetheless, the Surgeon General, other authorities and the public kept going. The initial results were so much better than the status quo ante that America was willing to risk ignorance of danger to get the benefit.
Modern society, and especially America, has used many technologies not thoroughly understood so long as they appear to do something valuable better than the status quo and any attempt to require understanding before use is bound to do much more harm than good. Of course many attempts do no good at all, and some do harm.
In the case of machine learning, the lack of transparency is usually not due to a venal user wanting to avoid transparency. It is simply a case of inability to be as transparent as most of us would like, because the machine is acting on its own to a great extent. With some government security agencies, however, the secrecy is often intended to minimize public opposition to practices that they think might improve their security mission but would be unpopular.
Backward and Not Forward Looking
One limitation of machine learning solutions is that they reflect the reality of the past and not necessarily the present or the future. Since the training set that is used is, by definition, already existing data, it may or may not predict future results. For example, if we had a machine learning algorithm that would predict whether a person was for or against gay marriage from personality variables, and used a training set of data from the year 2000, we would have a much different predictive formula than if we had used 2010 data, as public attitudes had changed dramatically in that decade. This is not a defect in the mathematics of machine learning, but is a consequence of the methods used. So designers and operators of machine learning systems, as well as the users and interpreters of the results, ought to keep this point in mind. The auto sales manager’s wonderful algorithm for sales follow-up calls might not be so effective the following year when a new model comes out, as the algorithm results are based on the training set of data about customer behavior for last year’s model. Very often, of course, there will be no changes over time, or only insignificant ones.
As Kate Crawford, one of the principal researchers at Microsoft and one of the world’s major AI scientists studying the implications of machine learning, has stated so succinctly, “Predictive programs are only as good as the data they are trained on.”
Some mathematicians are even now trying to develop techniques that will periodically add new data to the training set and recalculate the optimum formula for use. Others are exploring the possibility of turning time itself into a predictive variable that is built into the formula. Indeed, Google and Amazon, for example, have so much data constantly coming in that they might run and improve the formula every day or even more frequently. While this continuous recalculating will doubtless improve the machine learning accuracy and hence the utility, it adds to the problems of transparency and perceived fairness. Two individuals assessed for a loan or some other machine learning application on consecutive days might have been assessed by two different equations. It is hard enough to explain why the machine came to its conclusion where the same formula is used throughout, but when we change the formula frequently as we collect more data, it becomes virtually impossible to explain and/or justify any result other than by saying “the machine did it”.
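The effect of recalculating the formula as new data arrives can be seen even in a toy case. The Python sketch below is an illustration under stated assumptions, not a description of how any lender, Google or Amazon actually works: the records pairing income (in thousands of dollars) with an observed repayment rate are invented, and fit_line is a hypothetical helper that performs an ordinary least-squares line fit. The same applicant is scored with the formula from the first day and again with the formula recalculated after three new records are added.

```python
def fit_line(points):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical records: (income in $1000s, observed repayment rate).
day1_training_set = [(30, 0.20), (45, 0.35), (60, 0.55), (80, 0.70)]
new_records = [(35, 0.40), (50, 0.60), (90, 0.72)]
day2_training_set = day1_training_set + new_records

m1, b1 = fit_line(day1_training_set)   # formula in use on the first day
m2, b2 = fit_line(day2_training_set)   # formula after retraining overnight

applicant_income = 55  # the same hypothetical applicant, assessed on both days
score_day1 = m1 * applicant_income + b1
score_day2 = m2 * applicant_income + b2

print(f"day 1 formula: score = {m1:.4f} * income + {b1:.3f}")
print(f"day 2 formula: score = {m2:.4f} * income + {b2:.3f}")
print(f"same applicant: {score_day1:.3f} on day 1, {score_day2:.3f} on day 2")
```

Nothing about the applicant changed between the two days; only the training set did. If a cutoff happened to fall between the two scores, the day of the assessment would decide the outcome.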
If an algorithm is trained to select who is most likely to respond to ads for different kinds of jobs, and to predict to whom the ads should be directed, the job selection patterns of the past will be reflected in the resulting equations. And Dr. Crawford asks the prescient question, “How would a woman know to apply for a job she never saw advertised?”
We are living in an age where the middle class is being hollowed out, in large part because of the shrinkage of good-paying jobs that do not require much education. Most commonly, we think of robots replacing blue collar workers on a manufacturing assembly line, and there have been many millions of blue collar jobs lost this way, far more than were lost by corporations’ offshoring their manufacturing to low wage countries. Machine learning not only continues this trend but most frequently also displaces white collar workers, who before machine learning were hired for their judgment and ability to make decisions: loan officers at banks; sales and marketing analysts, wherever sales organizations are to be found; advertising employees who made decisions about what ads to place in what media and expose to what people; accountants who would make judgments about which people or documents to audit; police officers who would decide how to deploy their manpower. In many places they are all being replaced by one machine that can be purchased for 1 month’s salary for one of these “deciders”.
With machine learning there is much more demand for data, and with growth, institutions using machine learning will generally need more workers who are data gatherers of various types from telephone survey agents to traffic monitors clicking a counter every time a person seems to glance at a billboard. The commonality unfortunately is that these jobs are all low wage jobs—much lower than the pay of the deciders who have been replaced.
As with most of the bad news about job displacement by automation, there is some good news with machine learning. A person is needed who oversees, codes, and feeds the training sets to the computer and directs the decisions to other computers where the decisions can be implemented. These computer specialist workers earn $80,000 to $150,000 annually, usually more than the “deciders” who have been made obsolete. However, only 1 is needed for most companies, and even in very large operations 2 or 3 can control many machine learning operations.
Most disconcerting to those concerned about the labor force of the near and mid-term future is the fact that machine learning diminishes the importance and hence the value of experience and seniority. The human who makes many of today’s decisions generally does not have an exact formula or procedure for making his/her decisions. With experience, the decider has gotten pretty good at the job; otherwise another would have been assigned instead. If an inexperienced decider will only be right 60% of the time, the experienced worker might be right 80% of the time. But what happens when machine learning is right 90% of the time and costs less than 10% of the money needed to employ an experienced, senior human decider? The senior human employee, who may have been a key person and loyal for 20 years, is expendable.
Ideally those “expendable” human deciders would be retrained to control the machine learning operations and the computers. They could even earn more money. Experience so far has shown this to be very difficult or even impossible unless the decider has had a strong education with a good amount of mathematics and/or science. Carnegie Mellon University is one of the foremost institutions in the world for training people to work in the artificial intelligence / machine learning arena. The dean of the AI Center and Program was recently interviewed by Charlie Rose on his prestigious PBS show. The host asked the dean what kind of people he looked for as students. The dean enumerated the three things that were most important: “Math, Math and More Math”.
However, the National Science Foundation, in its report Science and Engineering Indicators 2012, told us, “The average number of credits earned by high school graduates in advanced mathematics courses increased from 0.9 in 1990 to 1.7 in 2009”. The bad news is that what is needed to be able to learn these new jobs is 4 credits, and then in most cases another 2 years of mathematics at the college or community college level. And these high school courses are not general math courses. Calculus and above is what is needed. Pre-Calculus might do if a person makes up for it in college. Without that, prospects are dim.
The report also tells us, “76% of all [high school] graduates earned a credit for algebra II in 2009 compared to 53% of all graduates in 1990.” That is a great improvement but nowhere near sufficient.
The good news is that things are getting better fast. In the late 1980s, there were NO states that required 4 years of mathematics for high school graduation. Today there are more than 20. But there is great political pushback against requiring 4 years of math for graduation. The fear is that graduation rates will go down, and that the standard by which education has been measured at the local level will show that schools are doing even worse. But that thinking is delusional. This is an excellent example where the high school diploma of the 1980s is insufficient to produce a citizen who has the prospect of earning a decent salary on the job market. Even a 100% graduation rate using the standards of the 1980s will likely produce a very poor and struggling work force. However, if it takes another 20 years to teach most of our young people the amount of math required today, technology will have advanced at an even faster rate than in the last 2 decades. And it is likely that the educational requirements will also have advanced even more. In my opinion, by 2030 no one without 7 or 8 years of high school and higher education mathematics, as well as Physics, Chemistry and Biology AND the competency to code computer programs, will be likely to find a decent-paying job, including in law and medicine. Even history teachers will need this level of education just to use the educational tools of the next generation.
In addition, there are mathematicians and computer scientists who are creating more and better machine learning systems, and many machine learning experts who can advise companies as to which problems and situations machine learning could be applied to. IBM, which has developed one of the major software packages for machine learning, has made it available to all without charge. With the software and the computers already existing in most universities and companies, the only barrier that prevents millions of companies and non-profit institutions from using machine learning tomorrow is the fact that they do not understand how they might use it and what benefit would accrue to them. IBM has created an entire operation to sell consulting services showing organizations how they can deploy machine learning. It is targeting the training of 1 million professionals. The IBM employees and the professionals they advise and train will be the job beneficiaries of machine learning. There are a substantial number of highly educated people with sufficient mathematical and computer background and knowledge of machine learning to have formed a small industry of highly paid sophisticated elites who have already become the big occupational winners as a result of machine learning. A small number of companies will each end up with a small piece of the market, worth millions, for advising organizations that will profit from machine learning. But many more lesser-educated current white collar workers without mathematical background will be driven from the ranks of the middle class to become data gatherers or the unemployed. It is hard to tell how many new white collar jobs will be created by the growth that machine learning will produce in the American economy, but some bet that there will be even more well-paying jobs created than are eliminated.
What there is total agreement about is that the pain of the change will be borne by one socioeconomic subgroup of Americans while the majority of the benefits will inure to another.
What Public Policy Should Be Regarding Machine Learning
With little or no knowledge of machine learning, the vast majority of people are not even considering public policy implications. However, with an initial stimulus, such as that provided in The Economist article and others including this one, most of us will have public policy views that tend to reflect our already existing political and philosophical positions about privacy, inequality, rapid change, role of government, education, civil rights, and the other polarizing issues of our day. With more consideration, many of us will develop ideas that are shaped by the unique challenges of machine learning rather than our general view of globalization and the like.
There is also the danger of regulating too early. Darren Remington has a good analogy:
The danger in regulating too early is comparable to defining airline regulations in the early 1900’s at the time of the Wright Brothers earliest flights: all flights must be limited to 400 yards (since no plane can safely fly further), no faster than 45 mph (faster puts the pilot at too much risk), no higher than 15′ altitude (you can survive a 15′ fall but not a 50′ fall), shall be in a straight line (they are too unstable in turns), requires a catapult launch into the wind and a wind speed of no less than 15 knots (for the safety of the pilot and bystanders), and no more than 1 passenger is allowed (since no engine or airframe can safely support more). How would even a single one of these have impacted the evolution of our airline industry?
An Aristotelian moderation is therefore of great importance. It is perhaps more important that there be public discussion about machine learning in the next few years than that there be laws or regulations. These can come after there is more of a public understanding of the wonderful benefits that are possible with machine learning and of the likely dangers, as opposed to the possible, theoretical dangers.
In my own case, I have gone through this process and have formed a few tentative views that might be a starting place for having a rational public debate on the role of this exploding technology.
I start with some general guidelines for discussion.
I. Private companies and organizations should be able to use machine learning with little or no interference by government for the next few years. Customer and other stakeholder attitudes will and should shape the behavior and policies regarding machine learning. Government regulation and law should refrain from involvement except in very serious as yet unanticipated circumstances.
II. Government and quasi-government agencies should also be free, within the discretion of the existing political structure and management, to utilize machine learning for making management and other decisions that are not applied to individual citizens directly. They should refrain from applying machine learning to decisions involving individual citizens, groups of citizens or businesses, with any exceptions requiring personal Presidential approval.
III. The problem of job displacement, while real and profound, should not be dealt with for machine learning in isolation. Machine learning, even with the most optimistic growth will have only a small effect on the nation’s economic picture as opposed to technology and automation in general. This problem must be dealt with generally and not technology by technology. Therefore, it should not be a consideration of machine learning policy other than perhaps to have specialized retraining programs for workers who are likely to be displaced by machine learning.
IV. In our complex American capitalistic society, many public functions are performed by private as well as public entities: water and waste management, prison operations, economic development, policing and security, education, etc. When such functions affect the lives of individual citizens and groups of citizens, these private entities should follow the same restrictions as government, as stated in II above, until more refined policies are created and accepted by society.
V. There are a number of services provided by private entities that are so critical that, as a practical matter, citizens and businesses need access to them in order to have a reasonably successful existence. E-mail addresses and Internet access, telephone, TV, access to housing, and access to advertising on the Internet are all examples. Private entities that provide these should also be prohibited from using machine learning to make decisions about individual people or businesses if the entity controls 20 percent of the market in an area for that service, or if 5 or fewer companies control more than 80 percent of the market in that area. So an ISP (Internet Service Provider) that dominates the market in an area may not use machine learning results to ban an individual from having e-mail addresses or mailing privileges. If the company can prove a violation of reasonable-use rules, a ban is acceptable. But inferring a violation on the basis of machine learning results, or using machine learning to target individual people or specific businesses for investigation, should not be allowed until the field develops a much better way to have public and judicial review of the methodology. Companies like Google are so sensitive to this issue that when they ban advertisers using machine decisions, they allow appeals to a human panel, although the appeal procedures are not at all transparent. Companies like Google would probably welcome rationally developed norms, while companies like Oracle will likely follow their pattern of secrecy and place little value on transparency.
VI. Medicine and dentistry should not be covered by IV. Instead, machine learning applied to individual cases should be regulated and must be approved through the existing FDA mechanisms used for other medical techniques and pharmaceuticals.
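For readers who think more easily in code, the market-share test in guideline V can be sketched in a few lines of Python. This is only one possible reading, not a proposed legal definition. The function names and the sample figures are hypothetical, "control 20 percent" is assumed to mean 20 percent or more, and the second condition is assumed to cover every provider in a concentrated market rather than only the largest ones.

```python
from typing import Mapping

# Thresholds taken from guideline V, expressed in percentage points.
SINGLE_SHARE_THRESHOLD = 20  # assumption: "control 20 percent" means 20 or more
TOP_FIVE_THRESHOLD = 80      # "5 or fewer ... control more than 80 percent"


def market_is_concentrated(shares: Mapping[str, float]) -> bool:
    """True if the five largest providers together hold more than 80 percent.

    If any group of five or fewer providers exceeds the threshold,
    the five largest providers do as well, so checking them is enough.
    """
    top_five = sorted(shares.values(), reverse=True)[:5]
    return sum(top_five) > TOP_FIVE_THRESHOLD


def guideline_v_applies(provider: str, shares: Mapping[str, float]) -> bool:
    """True if the provider would fall under guideline V in this market.

    `shares` maps each provider of one service in one area to its
    market share in percentage points (hypothetical input format).
    """
    own_share = shares.get(provider, 0)
    return own_share >= SINGLE_SHARE_THRESHOLD or market_is_concentrated(shares)


# Hypothetical markets for illustration only.
fragmented = {f"isp{i}": 10 for i in range(10)}
one_dominant = {"big": 30, **{f"small{i}": 10 for i in range(7)}}
oligopoly = {"a": 18, "b": 18, "c": 18, "d": 18, "e": 18, "f": 10}
```

Under these assumptions, no provider in the fragmented market is covered, only the 30 percent provider is covered in the second market, and every provider in the oligopoly is covered because its five largest firms hold 90 percent between them. Whether the smaller firms in such a market should be covered is exactly the kind of detail the public discussion would need to settle.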
These six guidelines are a starting point for a national dialogue, not an end or objective. They should be fleshed out, refined, curtailed or expanded as the technology changes, our society changes, and we change by understanding more about what will soon be affecting each of us every day.
 Phil Simon (March 18, 2013). Too Big to Ignore: The Business Case for Big Data. Wiley. p. 89. ISBN 978-1-118-63817-0.
 Journal of Machine Learning Research and Machine Learning
 IBM Web Page https://www.ibm.com/analytics/us/en/technology/spark/?S_PKG=&S_TACT=M1610LSW&campaign=Unbranded|Search|Spark|NA|N/A|&group=Machine_Learning-DS%20(Broad)&mkwid=34ea182d-90f5-4a10-b144-d9e1444a1226|509|223062094388&ct=M1610LSW&iio=BANAL&cmp=M1610&ck=advance%20machine%20learning&cs=b&ccy=US&cr=google&cm=k&cn=Machine_Learning-DS%20(Broad)
 The Economist, “From not working to neural networking”, June 25, 2016
 Gerlach, W.; Stern, O. (1922). “The experimental proof of the space quantization in the magnetic field.” Journal of Physics 9: 349-352
 “United States Border Patrol”, Wikipedia, June 25, 2016, https://en.wikipedia.org/wiki/United_States_Border_Patrol
 American Civil Liberties Union, The Constitution and the 100 mile border zone, June 25, 2016, https://www.aclu.org/constitution-100-mile-border-zone
 Darren Remington, Personal E-Mail Communication, July 20, 2016
 Kate Crawford, A.I.’s White Guy Problem, The New York Times, June 26, 2016
 Kate Crawford, A.I.’s White Guy Problem, The New York Times, June 26, 2016
 IBM Web Page, June 27, 2016 https://www.ibm.com/analytics/us/en/technology/spark/?S_PKG=&S_TACT=M1610LSW&campaign=Unbranded|Search|Spark|NA|N/A|&group=Machine_Learning-DS%20(Broad)&mkwid=34ea182d-90f5-4a10-b144-d9e1444a1226|509|223062094388&ct=M1610LSW&iio=BANAL&cmp=M1610&ck=advance%20machine%20learning&cs=b&ccy=US&cr=google&cm=k&cn=Machine_Learning-DS%20(Broad)%20http://www.ibm.com/blogs/think/2016/06/21/watson-sorting-hat/
 Darren Remington, Personal E-Mail Communication, July 20, 2016
Written by Lewis D. Eigen
Christian theological seminaries are currently banned in Turkey. Yet this is a prohibition with which some Westerners and Christians agree, and even those who do not, often understand. The complexity that has resulted from the clash between Islam and modernity is so great that it is almost impossible to tell what is liberal and democratic and what is not. The conflict between Moslem Turkey and Christianity with respect to theological seminaries is a marvelous example of things being in reality very different from what they first appear. This is the story of complexity where up can be down and wrong might be right.