AI: Driven by data
The second in a series of blogs by A.I. expert Dr Jeroen Vendrig
The technology responsible for AI’s recent breakthrough is data-driven machine learning. In this post we lift the veil on what ‘data-driven’ means, and why it’s important to understand even if you are just a user of AI. The terminology used in the AI world is borrowed from everyday language, but be aware that there are subtle differences in the meaning of terms.
Humans can learn by experience, for example through on-the-job training. The concept of learning without an explicit knowledge transfer from man to machine has inspired the data-driven learning approach. Imagine the simple act of catching a thrown ball. There’s a mix of physics formulas needed to predict where the ball is going to go. We give medals to those few high school students who can work out those equations during their three-hour final exams. Yet we expect kids in kindergarten to catch the ball in a split second. These kids probably don’t even understand the concept of gravity, other than “falling is ouch”.
Youngsters don’t hit the books; they learn by example. So does supervised machine learning. A health AI may receive an example of a patient aged 30, with a weight of 60kg, who recovered from surgery in 3 days. Another patient, aged 40 and weighing 80kg, recovered in 4 days, and so on. With enough examples, a machine learning algorithm trains a model that observes the characteristics of a new patient (age 32, weight 72kg) to predict a recovery time of 4.32 days. A prediction (inference) for a particular patient could be wrong, just as a kid won’t catch all balls. But a successful algorithm will be approximately right most of the time.
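To make “learning by example” concrete, here is a minimal sketch in Python. It uses one simple technique, nearest-neighbour averaging, which is just one of many possible choices and not necessarily what a real health AI would use. Only the first two examples come from this post; the other rows, the function name and the choice of three neighbours are assumptions made up for illustration, so the result will not match the 4.32 days quoted above.

```python
import math

# Training examples: (age in years, weight in kg) -> recovery time in days.
# The first two rows come from the post; the rest are hypothetical.
TRAINING = [
    ((30, 60), 3.0),
    ((40, 80), 4.0),
    ((25, 55), 2.5),
    ((50, 90), 5.0),
    ((35, 70), 3.5),
]

def predict_recovery_days(sample, training=TRAINING, k=3):
    """Average the recovery times of the k most similar training patients."""
    # Rank the known patients by how close they are to the new one.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], sample))
    nearest = by_distance[:k]
    return sum(days for _, days in nearest) / len(nearest)

new_patient = (32, 72)  # age 32, weight 72kg
print(predict_recovery_days(new_patient))
```

No physics or medicine was written into the code: the prediction comes entirely from the examples, which is exactly why the quality of those examples matters so much.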
Data-driven machine learning is powerful, but there is a catch: it’s only as good as the data it was fed. A model trained on children’s hospital data may not work well in the geriatric ward. If blood pressure is a key factor, but it’s not recorded in the data set, the model has no choice but to ignore it. If the data set contains many characteristics that are barely or not at all related to recovery time, the model may find spurious correlations by coincidence.
Machine learning methods are designed to cater for the possibility that a new case is not exactly the same as one of the cases in the training data set. However, the better the distribution of characteristics used for training reflects the real world, the more accurately the AI will perform. That’s why “big data” has been one of the foundations for the rise of AI. Just remember that “big” is not all about size, but also about coverage and variety.
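Coverage can be checked in a very rough way before trusting a prediction. The sketch below is a deliberately crude illustration, not a standard method: it only flags a new case whose characteristics fall outside the range seen during training. The children’s hospital numbers and both function names are hypothetical.

```python
def feature_ranges(samples):
    """Return the (min, max) seen in training for each characteristic."""
    columns = list(zip(*samples))
    return [(min(col), max(col)) for col in columns]

def outside_training_range(sample, ranges):
    """List the positions of characteristics not covered by the training data."""
    return [
        i
        for i, (value, (low, high)) in enumerate(zip(sample, ranges))
        if not low <= value <= high
    ]

# Hypothetical children's hospital training data: (age, weight in kg).
children = [(4, 16), (7, 23), (10, 32), (13, 45)]
ranges = feature_ranges(children)

print(outside_training_range((9, 30), ranges))   # child: everything covered
print(outside_training_range((82, 68), ranges))  # geriatric patient: not covered
```

A check like this says nothing about how well the cases inside the range are represented, so it complements, rather than replaces, collecting data with enough variety.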
In the above example, we showed input values (age, weight), which are available both at training time and at inference time, when a new patient is encountered. The set of input values for one patient is called a sample or instance. The output variable (recovery time) is available at training time only, and is known as the target for the sample. At inference time, the challenge is to predict that value.
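In code, the distinction might look like the following sketch. The record layout and the names `INPUT_NAMES`, `TARGET_NAME` and `split_record` are assumptions chosen for this illustration; the two training records are the examples from this post.

```python
# Patient records as they might be stored; the field names are hypothetical.
records = [
    {"age": 30, "weight_kg": 60, "recovery_days": 3},
    {"age": 40, "weight_kg": 80, "recovery_days": 4},
]

INPUT_NAMES = ("age", "weight_kg")  # known at training and inference time
TARGET_NAME = "recovery_days"       # known at training time only

def split_record(record):
    """Separate a record into its sample (inputs) and its target (output)."""
    sample = tuple(record[name] for name in INPUT_NAMES)
    target = record.get(TARGET_NAME)  # None when the outcome is not yet known
    return sample, target

# Training time: every sample comes with its target.
training_pairs = [split_record(r) for r in records]

# Inference time: a new patient has a sample but no target yet.
sample, target = split_record({"age": 32, "weight_kg": 72})
print(sample, target)
```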
Data is the key ingredient for machine learning, but how does an AI know what outcomes we expect it to extract from the data? In the next post, we’ll discuss how problem definitions can go horribly wrong.
Quick quiz
In this post’s example a surgery recovery time is predicted. What type of AI task is that? (See answer in blog post 1.)
The first in a series of blogs on AI by Dr Jeroen Vendrig, from Canon Information Systems Research Australia (CISRA)