What is image recognition? Types, mechanisms, AI development processes, case studies and key considerations.
Only a decade ago, computer-based image recognition could read little more than barcodes and plain text. However, with the advent of AI, particularly deep learning, accuracy has improved dramatically, and today these systems can even recognise details that people might miss with the naked eye.
Still, many people may not have a clear understanding of what image recognition is or how it can be applied. This article introduces the mechanisms of image recognition, common types of AI-based image recognition, and recent examples of how it is used in practice.
In the second half, we also cover methods and key considerations for system development, so that you will have a comprehensive view of the steps needed to incorporate image recognition technology.
Nextremer offers data annotation services to achieve highly accurate AI models. If you are considering outsourcing annotation, free consultation is available. Please feel free to contact us.
1. What is image recognition?
Image recognition (also called image analysis) is a technology that enables machines to identify objects in an image. Recent technological developments have made it possible not only to distinguish between different types of objects and animals, but also to make finer judgements within a category, such as recognising that 'this person is Mr A, and he looks happy'.
Image recognition research has existed for more than 100 years, and until about 10 years ago was mainly used for barcodes and OCR (Optical Character Recognition). However, because computers lacked the ability to learn and remember visual appearances the way humans do, determining what was in the image was a difficult task.
Using AI, systems can now learn the characteristics of people and objects and identify them in images. In particular, accuracy has improved significantly since deep learning was introduced to image recognition in 2012, enabling advanced recognition.
2. Types of image recognition
Image recognition can generally be divided into the following six main categories based on how images are recognised:
① Image Classification: Identifies the type and situation of the entire image
② Object Detection: Recognises specific parts within an image
③ Anomaly Detection: Detects anomalies within an image
④ Image Captioning: Generates descriptions of the state of the image
⑤ Segmentation: Identifies objects on a pixel-by-pixel basis
⑥ Facial Recognition: Identifies individual features of a person
Here is an explanation of each of these types of processing.
① Image Classification
Classification determines the overall situation of the entire image. Unlike object detection, which recognises specific parts within an image, image classification categorises the overall context of the image. For this reason, it is sometimes referred to as scene recognition and may be categorised separately from object recognition.
For example, let's say you have an image where the sky is dark and most of the pedestrians are holding umbrellas. If we categorise this image by weather, it can easily be classified as "raining".
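As a rough illustration of single-label classification, the sketch below runs an off-the-shelf ImageNet classifier from torchvision over one photo and prints its most likely category. A real weather classifier like the one described above would need training on your own labelled images; the model choice, the torchvision version (0.13 or later) and the file name street_scene.jpg are assumptions made for this example.

```python
# A minimal classification sketch: one label for the whole image.
# Assumes torchvision >= 0.13; "street_scene.jpg" is a placeholder file.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()   # ImageNet-trained classifier
preprocess = weights.transforms()          # matching resize/normalisation

image = Image.open("street_scene.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)     # shape: (1, 3, H, W)

with torch.no_grad():
    probabilities = model(batch).softmax(dim=1)[0]

top_prob, top_idx = probabilities.max(dim=0)
print(weights.meta["categories"][top_idx.item()], f"{top_prob.item():.1%}")
```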
② Object Detection
Object detection is the task that recognises and locates specific objects within an image. It not only provides information about the type of object, but also about where the object is located in the image. If there are multiple objects in an image, they can all be detected at the same time.
For example, when object detection is applied to a photograph of an urban landscape, various elements such as people, cars and trees can be detected at once, along with where each object is located within the image. This allows for a more specific and detailed understanding of the information the image contains.
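To make the idea concrete, here is a minimal sketch (not the method used by any of the systems discussed later) that runs a pre-trained Faster R-CNN detector from torchvision and prints each detected object with its bounding box and confidence score. The torchvision version (0.13 or later) and the file name city.jpg are assumptions.

```python
# A minimal object detection sketch: boxes, labels and confidence scores.
# Assumes torchvision >= 0.13; "city.jpg" is a placeholder file.
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("city.jpg").convert("RGB")
with torch.no_grad():
    detections = model([preprocess(image)])[0]   # one result dict per input image

for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score.item() >= 0.8:                      # keep confident detections only
        name = weights.meta["categories"][label.item()]
        print(name, [round(v, 1) for v in box.tolist()], round(score.item(), 2))
```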
③ Anomaly Detection
Anomaly detection is a technique that detects unusual or 'abnormal' situations in an image. The technology learns what is normal and has the ability to detect deviations or anomalies when they occur.
It detects anomalies either when predefined abnormal situations occur or when situations not present in the training data occur.
Commonly used in crime prevention and security management, this technology can effectively capture specific scenarios. For example, a situation in which a suspicious person approaches a safe and attempts to open it can be flagged as an anomaly, triggering an appropriate alarm. For this reason, anomaly detection is often used in conjunction with security cameras and surveillance systems.
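One common way to build such a detector, sketched below under assumptions, is to describe each image with features from a pre-trained network and fit an outlier detector (here scikit-learn's IsolationForest) on normal examples only; anything that looks too different from the normal set is flagged. The file names are placeholders, and a real system would use far more than three normal images.

```python
# A minimal anomaly detection sketch: embed images with a pre-trained CNN and
# fit an outlier detector on normal examples only. File names are placeholders,
# and a real system would use many more than three normal images.
import torch
from PIL import Image
from sklearn.ensemble import IsolationForest
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
backbone = resnet18(weights=weights).eval()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
preprocess = weights.transforms()

def embed(path):
    """Return a feature vector describing one image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return feature_extractor(x).flatten().numpy()

normal_vectors = [embed(p) for p in ["normal_01.jpg", "normal_02.jpg", "normal_03.jpg"]]
detector = IsolationForest(random_state=0).fit(normal_vectors)

verdict = detector.predict([embed("new_frame.jpg")])[0]   # -1 = anomaly, 1 = normal
print("anomaly detected" if verdict == -1 else "looks normal")
```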
④ Image Captioning
Image captioning is a technique that enables AI to recognise the content of an image and convert it into words, or captions (text), that humans can understand. This technique is achieved by combining image recognition techniques to capture image features and natural language processing (NLP) technology to express these features in natural language.
Image captioning has a wide range of applications and is being developed as a tool to explain the content of images and videos, as part of improving accessibility for the visually impaired, and as a means of adding text labels to large numbers of images on the internet. This allows search engines to understand the content of images and provide users with search results based on this.
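As a hedged illustration, the snippet below uses the Hugging Face transformers library with a publicly available BLIP captioning model to produce a one-sentence description of a photo; photo.jpg is a placeholder and the specific model name is an assumption chosen for the example.

```python
# A minimal image captioning sketch using an off-the-shelf model.
# Assumes the Hugging Face transformers library; "photo.jpg" and the model
# name are placeholders chosen for illustration.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
result = captioner("photo.jpg")          # image encoder + text decoder
print(result[0]["generated_text"])       # e.g. a one-sentence description of the photo
```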
⑤ Segmentation
Image segmentation is a technique that identifies each pixel of an image individually and determines what it represents. This ensures that all objects in the image are recognised in detail.
In addition to classification, which identifies the type of object, segmentation can also determine its specific shape and position. Because of this precision, it is used in areas where a high degree of accuracy is required, such as automated driving and medical image analysis.
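The sketch below illustrates semantic segmentation with a pre-trained DeepLabV3 model from torchvision: every pixel is assigned a class, and the set of classes present in the image is printed. It assumes torchvision 0.13 or later and a placeholder file road.jpg; production systems for driving or medical imaging would use models trained on their own data.

```python
# A minimal semantic segmentation sketch: every pixel receives a class label.
# Assumes torchvision >= 0.13; "road.jpg" is a placeholder file.
import torch
from PIL import Image
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("road.jpg").convert("RGB")
with torch.no_grad():
    output = model(preprocess(image).unsqueeze(0))["out"][0]   # (classes, H, W)

pixel_classes = output.argmax(dim=0)                           # (H, W) class map
found = {weights.meta["categories"][i] for i in pixel_classes.unique().tolist()}
print(found)                                                   # classes present in the image
```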
For more detail on segmentation methods and examples, see our separate article, which explains the types, mechanisms and application examples of segmentation; reading the two articles together will deepen your understanding.
⑥ Facial Recognition
Facial recognition is a technology that identifies a person's facial features and determines who they are. It is used in a variety of applications, such as unlocking smartphones and managing access to buildings.
Facial recognition analyses the position of the eyes, mouth, nose and ears, as well as the size and shape of each feature, to extract and identify the unique attributes of each person.
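A minimal sketch of this matching step, assuming the open-source face_recognition library, is shown below: each face is converted into a numeric encoding, and a new photo is compared against a stored reference encoding. The file names are placeholders, and the snippet assumes each image contains at least one detectable face.

```python
# A minimal face identification sketch with the open-source face_recognition
# library. File names are placeholders, and each image is assumed to contain
# at least one detectable face.
import face_recognition

known_image = face_recognition.load_image_file("person_a_reference.jpg")
unknown_image = face_recognition.load_image_file("entrance_camera.jpg")

# Each encoding is a numeric vector summarising facial geometry.
known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encodings = face_recognition.face_encodings(unknown_image)

for encoding in unknown_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("This is person A" if match else "Unknown person")
```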
3. Examples of image recognition application
Here are some examples of image recognition being used in practice.
① Mizuho Bank: Automating form registration with AI handwriting recognition
② Hitachi: AI detection of employee non-use of safety equipment
③ Waon Coffee: Understanding customer demographics with facial recognition during temperature checks
① Mizuho Bank: Automating form registration with AI handwriting recognition
Mizuho Bank has automated 80% of its form registration work by capturing handwritten text as images and converting it into text data using AI-based Optical Character Recognition (OCR).
Mizuho Bank had relied on manual data entry for forms that were both handwritten and non-standard, which made them difficult for machines to read. Introducing AI-based OCR enabled the bank to recognise handwritten text from the captured images and accurately identify what each piece of data represents.
As a result, 80% of form entry tasks can now be automated. The change is also expected to reduce the time taken to register forms to around one-tenth and to cut costs by half.
② Hitachi: AI detection of employee non-use of safety equipment
Hitachi has developed a system that automatically checks workers' safety equipment using image recognition technology it has developed over many years to help prevent accidents.
Previously, the only way to conduct inspections was for managers to visually check workers' safety equipment. However, with this method it is not practical to monitor all workers in the facility at all times. So Hitachi introduced a system that uses video data to determine whether safety equipment is being worn correctly and has made it possible to automate inspections.
In this system, AI checks the safety equipment of workers from camera images installed at the entrance of the workspace and other locations, and determines whether there are any abnormal conditions. The system can also check details such as whether the chinstrap of a helmet has been removed or whether the waistband of trousers is not tucked into boots. This gives the AI the ability to accurately detect potential hazards that would otherwise be missed by the naked eye.
③ Waon Coffee: Understanding customer demographics with facial recognition during temperature checks
Waon, a souvenir store in Tochigi Prefecture, uses facial recognition AI with a temperature check feature to analyse customer demographics for marketing and product placement.
Previously, the company had stocked and arranged products based on years of intuition and experience, assuming that the majority of store visitors were women. However, AI-based customer profile analysis revealed that half of the visitors were men. The store is now using this data to make future product decisions.
In this way, the system not only enables safety management through temperature checks, but also collects a wide range of customer data, such as gender and age, which can be used to improve the store and thereby increase customer satisfaction.
4. How to develop an image recognition system
Developing an image recognition system generally involves the following six steps:
① Identify the problem
② Determine the required data
③ Set the required accuracy
④ Collect image data relevant to the task
⑤ Train the AI and build a system
⑥ Maintain the system as required
Here is an explanation of each procedure:
① Identify the problem
The first step is to identify what you want to solve. In other words, this step clarifies the purpose for which image recognition is to be used.
For example, you can clarify a specific problem, such as the automatic detection of product defects in a factory, the diagnosis of a medical condition from medical images or the detection of obstacles for self-driving vehicles.
Once the purpose is clear, the next step is to determine 'what data needs to be collected' and 'the necessary accuracy'.
② Determine the required data
Secondly, you need to determine what data is required to solve the problem. At this stage, you look for available data sources and develop a strategy for collecting the data you need.
You should also consider securing resources for data labelling and obtaining any specialist knowledge that may be required for accurate labelling.
③ Set the required accuracy
Next, set the minimum accuracy required to solve the problem. For example, in the case of defective product detection, the goal is to detect all defective products while minimising errors.
This accuracy will depend on business requirements and technical constraints and should be discussed in detail with stakeholders.
Of course, depending on the budget, it may not always be possible to develop a system with high accuracy, but if the system cannot deliver the minimum required accuracy, it will be pointless. To avoid such situations, make sure that the 'objective' of the system, the 'data to be collected' and the 'required accuracy' are clear at this stage before moving on to system development.
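As a small illustration of how such a target can be checked, the sketch below computes precision and recall for a hypothetical defect detector with scikit-learn and compares them against example thresholds; the labels and thresholds are invented for the example, not real requirements.

```python
# A small illustration of checking a defect detector against an agreed target.
# The labels and thresholds below are invented for the example.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]   # 1 = defective, 0 = good (ground truth)
y_pred = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]   # model predictions on a test set

recall = recall_score(y_true, y_pred)        # share of real defects that were caught
precision = precision_score(y_true, y_pred)  # share of flagged items that were defects

# Example requirement: catch at least 95% of defects with precision of 80% or more.
print(f"recall={recall:.2f}, precision={precision:.2f}")
print("meets target" if recall >= 0.95 and precision >= 0.80 else "needs more work")
```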
④ Collect image data relevant to the task
When collecting image data for training, the following points need to be considered:
| Things to consider | Details |
| Data variation | If you only collect certain types or conditions of image data, the model may be overly adapted to those specific situations and not perform well in other situations. It is therefore essential to collect a variety of data, including images from different angles and lighting conditions, as well as different backgrounds and contexts, so that the model can adapt to a wide range of situations. |
| Quantity of data | You need a large amount of data if you train an AI model. If there is too little data, the model will be under-trained. Data collection is time consuming and expensive, but having the right amount of data available can improve the performance of the model. |
| Quality of data | The model may not be able to learn properly if the image is blurred or the subject is unclear. |
| Labelling | In supervised learning it is important to label the data correctly. Incorrect labels can affect the model's ability to learn and reduce its performance. |
From these perspectives, proper data collection and organisation are critical. As this requires specialist knowledge, it is recommended that you seek expert advice from data analysts and AI engineers.
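One practical way to increase variation without new photo shoots is data augmentation; the hedged sketch below uses torchvision transforms to generate randomly cropped, flipped and re-lit variants of a single placeholder image, sample.jpg. It supplements, but does not replace, genuinely diverse data collection.

```python
# A minimal data augmentation sketch: random crops, flips, lighting changes and
# small rotations create varied training views of one placeholder image.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                       # vary framing and scale
    transforms.RandomHorizontalFlip(),                       # vary orientation
    transforms.ColorJitter(brightness=0.4, contrast=0.4),    # vary lighting
    transforms.RandomRotation(degrees=10),                   # vary camera angle slightly
])

image = Image.open("sample.jpg").convert("RGB")
variants = [augment(image) for _ in range(5)]                # five synthetic variations
```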
⑤ Train the AI and build a system
Once the data has been collected, you can train the AI and build the system. More specifically, this stage breaks down into the following steps:
Model selection and training
Choose the model best suited to the image recognition task (for example, a convolutional neural network) and train it on the collected data. This process can be time-consuming and resource-intensive.
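For orientation only, the sketch below shows what this step often looks like in PyTorch: load a labelled image folder, pick a convolutional network, and run a short training loop. The folder path dataset/train, the model choice and the training settings are placeholders, not a recommended configuration.

```python
# A rough sketch of model selection and training in PyTorch. The folder path,
# model choice and settings are placeholders, not a recommended configuration.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_data = datasets.ImageFolder("dataset/train", transform=preprocess)  # labelled folders
loader = DataLoader(train_data, batch_size=32, shuffle=True)

model = models.resnet18(weights=None)                        # a CNN chosen for the task
model.fc = nn.Linear(model.fc.in_features, len(train_data.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                                       # short illustrative run
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```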
Performance evaluation and tuning
Once you have trained the initial model, you can evaluate its performance. If the model does not produce the expected results, you can adjust the parameters or modify the architecture of the model.
Iterative improvement
Developing an AI is an iterative process. To improve the performance of a model, you may need to collect additional data, introduce new features, or try different models.
You should not expect to build a perfect model on the first attempt. Continually check the accuracy of the system, adding or removing data as necessary to improve it.
⑥ Maintain the system as required
Once the data and model have been adjusted to achieve the required accuracy, the system can be put into operation as an image recognition system.
Once the system is up and running, you will need to monitor its performance regularly. This will allow you to detect any unexpected behaviour or loss of accuracy at an early stage. As new data or improved measures become available, the system can be updated by retraining with new data or introducing new technologies.
AI systems require regular maintenance: monitor them to ensure they remain accurate and free of anomalies so that they stay as effective as possible.
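A very small sketch of such monitoring is shown below: the model is re-evaluated on recently labelled samples and flagged for retraining if accuracy falls below an agreed floor. The helper functions and the 90% threshold are hypothetical and would be replaced by your own evaluation pipeline.

```python
# A small monitoring sketch: re-check accuracy on newly labelled samples and
# flag the system for retraining if it drops below the agreed floor.
# load_recent_labelled_samples() and evaluate() are hypothetical helpers.
REQUIRED_ACCURACY = 0.90   # minimum accuracy agreed with stakeholders

def check_model_health(model, load_recent_labelled_samples, evaluate):
    images, labels = load_recent_labelled_samples()   # e.g. last month's audited data
    accuracy = evaluate(model, images, labels)        # fraction of correct predictions
    if accuracy < REQUIRED_ACCURACY:
        print(f"Accuracy dropped to {accuracy:.1%} - schedule retraining")
    else:
        print(f"Accuracy {accuracy:.1%} - no action needed")
```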
5. Important considerations for image recognition
While image recognition can offer significant benefits to businesses when used correctly, there are a number of important considerations that must be taken into account to avoid potential problems:
① Avoid privacy violations
② Potential decline in AI model accuracy after development
③ Need for high-quality data in large quantities
The reasons for each of these are explained below.
① Avoid privacy violations
Information obtained from data collection may contain privacy-related information, such as facial images and vehicle registration numbers. If security is breached and the information is leaked, this may result in a violation of personal privacy.
When handling data that may contain private information, personally identifiable details should be removed or altered so that individuals cannot be identified. This is done using anonymisation or pseudonymisation techniques, such as blurring faces or car licence plates.
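As one possible implementation, sketched under assumptions below, OpenCV's bundled Haar cascade can locate faces in a frame and blur them before the image is stored; frame.jpg is a placeholder, and a production system would typically use a stronger face detector and also mask licence plates.

```python
# A minimal anonymisation sketch: detect faces with OpenCV's bundled Haar
# cascade and blur them before the frame is stored. "frame.jpg" is a placeholder;
# production systems usually use a stronger detector and also mask licence plates.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("frame.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    region = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(region, (51, 51), 0)  # blur each face

cv2.imwrite("frame_anonymised.jpg", image)   # only the blurred version is kept
```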
You should also make sure that your company's network security is in place if you process the data in-house, and that of your outsourcing company if you do so externally.
② Potential decline in AI model accuracy after development
The accuracy of AI models can decline after the system has been built. Degradation can occur when the external environment changes or when retraining on new data introduces imbalances.
In practice, some of these factors are to some extent unavoidable. You therefore need to assess the performance of your model on a regular basis and take action if accuracy declines. Plan for the effort required to retrain regularly as new data is collected or the environment changes.
③ Need for high-quality data in large quantities
An AI system will be inaccurate if the underlying data lacks quality and quantity. Even if the quality of each image is high, if the data is unbalanced, the system will have low prediction accuracy (poor generalisation performance) for unknown data.
Collecting large amounts of high-quality data is time-consuming and expensive. To balance budget and accuracy objectives, you may consider data augmentation, which increases the amount of training data by transforming existing images in different ways. You may also consider using pre-trained models that have already been trained on large amounts of data.
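The sketch below illustrates the pre-trained-model option: an ImageNet-trained backbone is frozen and only a small new classification head is trained on your own classes. The three-class head and the learning rate are placeholders; the point is that far less of your own data is needed than when training from scratch.

```python
# A minimal transfer learning sketch: freeze an ImageNet-trained backbone and
# train only a new classification head for your own classes. The three-class
# head and learning rate are placeholders.
import torch
from torch import nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)    # weights learned on ImageNet
for param in model.parameters():
    param.requires_grad = False                        # keep the backbone fixed

model.fc = nn.Linear(model.fc.in_features, 3)          # new head for e.g. three own classes
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# Training then proceeds as usual, but far less of your own data is needed
# because only the small head is learned from scratch.
```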
These considerations and adjustments require a high level of data knowledge and expertise. If you do not have in-house experts, it is advisable to outsource the data collection to a specialist company.
6. Summary
This article has explained the different types of image recognition, examples of its use, and methods of implementation. Image recognition comprises a range of techniques, each suited to different purposes and applied across a wide variety of industries.
Image recognition has a long history, but only recently has it become possible to use deep learning to achieve advanced recognition. To stay competitive, consider consulting with specialised companies to explore the potential applications of image recognition within your own business.
Nextremer offers data annotation services to achieve highly accurate AI models. If you are considering outsourcing annotation, free consultation is available. Please feel free to contact us.
Author
Toshiyuki Kita
Nextremer VP of Engineering
After graduating from the Graduate School of Science at Tohoku University in 2013, he joined Mitsui Knowledge Industry Co., Ltd. As an engineer in the SI and R&D departments, he was involved in time series forecasting, data analysis, and machine learning. Since 2017, he has been involved in system development for a wide range of industries and scales as a machine learning engineer at a group company of a major manufacturer. Since 2019, he has been in his current position as manager of the R&D department, responsible for the development of machine learning systems such as image recognition and dialogue systems.