Advances in AI have made it possible to recognize objects in images with high accuracy. Image recognition AI is already used in many everyday settings, such as security cameras and similar-image search.
AI image recognition methods fall into three main categories, listed below. Among these, "segmentation" is particularly important: it uses machine learning to divide an image into regions that correspond to objects.
| Image Recognition Method | Overview |
| --- | --- |
| Image Classification | Classifies the category or class of the image itself. |
| Object Detection | Identifies the position, count, and type of objects within an image. |
| Segmentation | Divides the image into objects. |
AI image recognition, including segmentation, is developing rapidly. It is already in practical use in areas such as autonomous driving and facial recognition, and is becoming part of our daily lives.
In this article, we introduce the overview, methods, and application examples of "Semantic Segmentation," the most common type of segmentation, so that you can form a concrete picture of how to use it in your business.
In addition, the article "What is image recognition? Types, mechanisms, AI development processes, case studies and key considerations" provides a detailed explanation of image recognition types, application examples, and how to build models. Reading it alongside this article will deepen your understanding.
Semantic segmentation is an image recognition method that assigns a class label to every pixel. It makes it possible to understand in detail which class each object in the image belongs to and where its boundaries are.
For example, as shown in the image above, it is possible to recognize and divide cars, trees, road markings, terrain, shadows, etc., within an image at the pixel level.
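To make "assigning class labels on a pixel-by-pixel basis" concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score per class for every pixel; the class names and score values are made up for illustration, and `label_pixels` is a hypothetical helper, not part of any library.

```python
# Hypothetical class names, chosen only for this illustration.
CLASSES = ["road", "car", "tree"]


def label_pixels(scores):
    """Return a label map: for each pixel, the index of its highest-scoring class.

    scores[y][x] is a list holding one score per class for the pixel at (x, y).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A 2x3 "image" with made-up per-pixel scores in the order (road, car, tree).
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
]

label_map = label_pixels(scores)
for row in label_map:
    print([CLASSES[i] for i in row])
```

The result has the same height and width as the input, with one class index per pixel, which is what lets the boundaries of each region be read off directly.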
The main applications for semantic segmentation include autonomous driving, healthcare, security, and robot vision. In these fields, precise object recognition is important, and there is a need to understand not only the "type" and "position" of an object but also its "shape" and "extent" in detail.
On the other hand, semantic segmentation does not distinguish between individual objects. In particular, when multiple objects of the same class exist in an image and they overlap, each object cannot be identified individually.
For example, in an image of a crowd where multiple people are gathered, it is difficult to identify each person individually with semantic segmentation. To solve such problems, instance segmentation is used.
Accurate annotation is essential for high-precision segmentation. To perform segmentation, it is necessary to teach what each pixel represents (e.g., "dog," "cat," "background," etc.) in advance through annotation work.
Annotating every image at the pixel level is time-consuming and costly. Accuracy is also often required in fine details, such as objects with complex shapes or subtle color differences.
Annotation is therefore a demanding task that requires specialized knowledge and experience. It is a crucial step in segmentation, and its accuracy directly affects the results. Depending on the project, it may be outsourced to a specialized annotation company.
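As a rough illustration of what annotation produces, the sketch below converts an outline drawn around an object into a per-pixel label mask. It assumes the annotator's outline is stored as a polygon of (x, y) vertices, which is one common way to record such annotations but is an assumption here; the function names and the class ID are hypothetical.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test: is the point (px, py) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y0 > py) != (y1 > py):
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height, class_id):
    """Label every pixel whose center lies inside the polygon with class_id."""
    return [
        [class_id if point_in_polygon(x + 0.5, y + 0.5, polygon) else 0
         for x in range(width)]
        for y in range(height)
    ]


DOG = 1  # hypothetical class ID; 0 is treated as "background"
outline = [(1, 1), (4, 1), (4, 3), (1, 3)]  # made-up outline drawn by an annotator
mask = polygon_to_mask(outline, width=5, height=4, class_id=DOG)
for row in mask:
    print(row)
```

Even in this toy case, every pixel ends up with a label, which hints at why careful outlines around complex shapes take so much effort.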
Besides semantic segmentation, there are two other segmentation methods.
① Instance Segmentation
② Panoptic Segmentation
Each method suits different purposes and requirements, and choosing the right one is key to building a high-quality image recognition system. Here, we explain the characteristics of each.
Instance segmentation is a segmentation method that can recognize objects individually. While semantic segmentation treats objects of the same class as one, instance segmentation identifies each object separately even if they belong to the same class.
For example, in driver assistance systems, instance segmentation is used to recognize cars on the road individually. In addition to recognizing all cars in an image under the "car" class, instance segmentation identifies each car as a different element. This makes it possible to track the position, direction, speed, etc., of each car.
Instance segmentation is particularly useful when objects overlap or when you want to count the number of specific objects.
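The difference when counting overlapping objects can be sketched as follows. The two small maps are made-up data: the semantic map only says "car or not," while the instance map gives each car its own ID. `count_regions` is a hypothetical helper that counts connected regions, used here to show what is recoverable from the semantic map alone.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of non-zero pixels in a 2D mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                regions += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


# Semantic map: 1 = "car", 0 = background. Two cars overlap in the image.
semantic = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
# Instance map for the same pixels: 0 = background, 1 and 2 = individual cars.
instances = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
]

semantic_count = count_regions(semantic)  # the touching cars merge into one region
instance_count = len({i for row in instances for i in row if i != 0})
print(semantic_count, instance_count)
```

With the semantic map the two cars collapse into a single "car" region, whereas the instance map keeps them apart and can be counted directly.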
Panoptic segmentation is a segmentation method that labels every pixel in the image while also distinguishing individual instances of the same class. It combines the characteristics of semantic segmentation and instance segmentation.
For example, in an autonomous vehicle's vision system, road objects (cars, pedestrians, bicycles, etc.) must be recognized individually using an instance segmentation-like method. At the same time, background elements such as roads, sidewalks, and trees must be labeled by class using a semantic segmentation-like method. Panoptic segmentation is what can achieve both these tasks at once.
Because panoptic segmentation describes a scene more completely than semantic segmentation or instance segmentation alone, it is well suited to demanding applications such as autonomous driving and healthcare.
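One way to picture the combined output is to merge a semantic map and an instance map into a single ID per pixel. The sketch below assumes an encoding of `class_id * 1000 + instance_id`; this encoding, the class IDs, and the split between countable and background classes are all assumptions made for illustration, not a description of any particular system.

```python
LABEL_DIVISOR = 1000  # assumed encoding: panoptic_id = class_id * 1000 + instance_id
COUNTABLE_CLASSES = {2}  # hypothetical: 2 = "car"; 0 = "road" and 1 = "tree" are background


def to_panoptic(semantic, instances):
    """Combine per-pixel class IDs and instance IDs into one panoptic ID map."""
    panoptic = []
    for sem_row, inst_row in zip(semantic, instances):
        row = []
        for class_id, inst_id in zip(sem_row, inst_row):
            if class_id not in COUNTABLE_CLASSES:
                inst_id = 0  # background classes are labeled by class only
            row.append(class_id * LABEL_DIVISOR + inst_id)
        panoptic.append(row)
    return panoptic


semantic = [[1, 2, 2, 2], [0, 0, 0, 0]]   # tree, car, car, car / road
instances = [[0, 1, 2, 2], [0, 0, 0, 0]]  # two separate cars
panoptic = to_panoptic(semantic, instances)
for row in panoptic:
    print(row)
```

Every pixel carries a class, and pixels of countable classes additionally carry which object they belong to.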
Many methods are used in semantic segmentation. The following are the main methods.
① FCN (Fully Convolutional Network)
② SegNet
③ FPN (Feature Pyramid Network)
④ R-CNN (Region-based Convolutional Neural Network)
⑤ CNN (Convolutional Neural Network)
Those particularly related to semantic segmentation are FCN and SegNet. On the other hand, R-CNN-based methods and FPN are commonly used for instance segmentation and object detection tasks. Here, we briefly explain these methods and their mechanisms.
FCN is an image recognition method that performs segmentation using only "convolutional layers," which extract image features. Its defining characteristic is that it has no "fully connected layers" at all.
As a result, it can perform semantic segmentation for the entire image in a single forward pass, producing a label for every pixel.
Fully connected layers flatten 2D image data into a 1D vector before producing output, which discards where each feature was located. Because FCN does not use fully connected layers, it can output results without losing position information. For these reasons, FCN can be regarded as a CNN adapted for semantic segmentation.
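The idea of keeping position information can be sketched with a 1x1 convolution, which applies the same small classifier at every pixel instead of flattening the image. This is a simplified illustration in plain Python, not the full FCN architecture; the weights and feature values are made up.

```python
def conv1x1(feature_map, weights, biases):
    """Apply the same linear classifier at every pixel (a 1x1 convolution).

    feature_map[y][x] is a list of C features; weights has K rows of C values.
    Returns scores[y][x] with K class scores, at the same height and width.
    """
    return [
        [
            [sum(w * f for w, f in zip(w_row, pixel)) + b
             for w_row, b in zip(weights, biases)]
            for pixel in row
        ]
        for row in feature_map
    ]


weights = [[1.0, 0.0], [0.0, 1.0]]  # made-up: 2 features -> 2 classes
biases = [0.0, 0.0]

small = [[[0.9, 0.1], [0.2, 0.8]]]                        # 1x2 feature map
large = [[[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
         [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]]            # 2x3 feature map

small_scores = conv1x1(small, weights, biases)
large_scores = conv1x1(large, weights, biases)
print(len(large_scores), len(large_scores[0]))  # output keeps the input's height and width
```

Because nothing is flattened, the same weights work on inputs of different sizes, and each output score stays attached to the pixel it came from.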
SegNet is a model that builds on FCN and has an encoder-decoder structure. The encoder extracts image features through "convolutional layers," and the decoder restores the data to its original size while preserving the position information of the extracted features.
The advantage of the encoder-decoder design is that the decoder can reconstruct a high-resolution output. The disadvantage is that precision drops slightly, because some pixel-level detail is lost when the encoder reduces the resolution.
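SegNet is known for having its decoder reuse the positions recorded during max pooling in the encoder. The sketch below shows that mechanism on a single made-up 4x4 feature map in plain Python; the function names are hypothetical, and the input height and width are assumed to be divisible by the pooling size.

```python
def max_pool_with_indices(fmap, size=2):
    """Max-pool a 2D map and also return the (y, x) position of each maximum."""
    pooled, indices = [], []
    for y in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for x in range(0, len(fmap[0]), size):
            window = [(fmap[yy][xx], (yy, xx))
                      for yy in range(y, y + size)
                      for xx in range(x, x + size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; fill the rest with 0."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (y, x) in zip(pooled_row, index_row):
            out[y][x] = value
    return out


fmap = [
    [1.0, 5.0, 2.0, 0.0],
    [3.0, 4.0, 1.0, 6.0],
    [0.0, 2.0, 9.0, 1.0],
    [7.0, 1.0, 3.0, 2.0],
]
pooled, indices = max_pool_with_indices(fmap)   # encoder side: 4x4 -> 2x2
restored = max_unpool(pooled, indices, 4, 4)    # decoder side: 2x2 -> 4x4
for row in restored:
    print(row)
```

The restored map has the original size and the maxima sit at their original positions, but all other values are gone, which illustrates both the benefit and the loss of detail described above.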
FPN is an image recognition method characterized by a pyramid structure that predicts objects at different scales. Conventionally, the input image itself was rescaled several times to predict objects of various sizes, but the heavy computational cost was a major drawback.
FPN instead reuses the feature maps that the network already produces at each resolution and combines them through skip connections in the top-down process of the pyramid. In this way, FPN achieves a pyramid structure with low computational cost.
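A single step of the top-down process can be sketched as "upsample the coarse map, then add the finer map that arrives over the skip connection." The example below uses single-channel made-up feature maps in plain Python; a real FPN also applies convolutions to match channel counts, which is omitted here, and the function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width by repeating each value (nearest neighbor)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_merge(coarse, lateral):
    """Upsample the coarse map and add the finer map from the skip connection."""
    upsampled = upsample_nearest_2x(coarse)
    return [
        [u + l for u, l in zip(u_row, l_row)]
        for u_row, l_row in zip(upsampled, lateral)
    ]


coarse = [[1.0, 2.0],
          [3.0, 4.0]]                 # low resolution, made-up values
lateral = [[0.0, 1.0, 0.0, 1.0],
           [1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [1.0, 0.0, 1.0, 0.0]]      # higher resolution, made-up values

merged = top_down_merge(coarse, lateral)
for row in merged:
    print(row)
```

Repeating this step from the coarsest level downward yields a feature map at every scale without rescaling and reprocessing the input image each time.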
R-CNN is an image recognition method that limits computation by first proposing regions where target objects are likely to be and only then applying the CNN. It surrounds parts likely to contain objects with bounding boxes and analyzes only within those boxes.
R-CNN has many derivatives, with representative ones including Faster R-CNN and Mask R-CNN.
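The "propose regions first, then analyze only inside them" flow can be sketched as below. Everything here is illustrative: the image and the candidate boxes are made up, and `classify_region` is a hypothetical stand-in for the CNN that would score each cropped region in a real system.

```python
def crop(image, box):
    """Cut out the region inside box = (x0, y0, x1, y1), with x1 and y1 exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def classify_region(region):
    """Hypothetical stand-in for a CNN: call bright regions "object"."""
    total = sum(sum(row) for row in region)
    mean = total / (len(region) * len(region[0]))
    return "object" if mean > 0.5 else "background"


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# Made-up candidate boxes; a real system would generate these automatically.
proposals = [(1, 1, 3, 3), (3, 0, 6, 2)]

results = [classify_region(crop(image, box)) for box in proposals]
print(results)
```

Only the pixels inside the two candidate boxes are examined, rather than every possible window across the image.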
CNN, which specializes in identifying objects, is a method used throughout image processing. In its basic form, however, it is generally not used for segmentation on its own.
In a CNN, image features are first extracted in "convolutional layers," which produce feature maps. Next, "pooling layers" reduce the size of those feature maps. Finally, "fully connected layers" integrate the data and produce the output.
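The three stages described above can be traced on a tiny example. This is a minimal sketch in plain Python with made-up integer values and hand-picked weights, not a trained network; the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[y + i][x + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for x in range(len(image[0]) - kw + 1)]
        for y in range(len(image) - kh + 1)
    ]


def max_pool(fmap, size=2):
    """Keep the largest value in each size x size block."""
    return [
        [max(fmap[yy][xx] for yy in range(y, y + size) for xx in range(x, x + size))
         for x in range(0, len(fmap[0]) - size + 1, size)]
        for y in range(0, len(fmap) - size + 1, size)
    ]


def dense(vector, weights, biases):
    """Fully connected layer: every output depends on every input value."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


image = [[0, 0, 1, 1, 1] for _ in range(5)]   # dark on the left, bright on the right
kernel = [[-1, 1], [-1, 1]]                    # responds to a dark-to-bright vertical edge

features = conv2d(image, kernel)               # 5x5 image -> 4x4 feature map
pooled = max_pool(features)                    # 4x4 -> 2x2
flat = [v for row in pooled for v in row]      # 2D -> 1D; positions are no longer explicit
scores = dense(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
print(scores)
```

The final output is one score per class for the whole image, which suits classification but does not by itself say which pixels belong to which class.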
| Related Article: What is a bounding box? How is it used in YOLO? A thorough explanation of the advantages and disadvantages of object detection methods |
4. Case Studies of Semantic Segmentation
Semantic segmentation is utilized in many industries, such as IT and healthcare. It is particularly used in fields where precise object recognition is important and details like "shape" and "extent" must be understood in addition to the "type" and "position" of an object.
Here, we introduce the following three case studies.
① Detection of cracks and spalling on concrete surfaces
② Object prediction in autonomous driving systems
③ Automatic extraction of organ regions in the medical field
Systems have been developed using semantic segmentation that can distinguish and detect cracks and formwork traces on concrete surfaces. Conventionally, inspection workers would visually confirm and sketch cracks. However, the number of concrete structures is enormous, and manual inspection simply cannot keep up.
In addition, tunnels and concrete buildings built during the rapid economic growth period are expected to age and come due for renewal at around the same time in the near future. Meanwhile, there are concerns about a shortage of specialized technicians in the maintenance field, as the population ages and skilled engineers retire.
Systems are therefore being developed that analyze concrete images via machine learning and quickly detect anomalies. In particular, semantic segmentation is being used to accurately distinguish cracks from P-cone marks (dents from fastening hardware) and formwork traces, which are easily confused with cracks.
Similarly, research is progressing on using semantic segmentation for detecting exposed rebar and concrete spalling.
To improve autonomous driving performance efficiently, technology is being developed that converts images taken during the day into nighttime versions of the same scenes. Using semantic segmentation in that process has made it possible to generate nighttime images with higher precision than before.
Previously, nighttime images were created using an image conversion technology called CycleGAN, but accuracy was not ideal, with issues such as tail lamps not lighting up or traffic lights appearing in mid-air.
Because semantic segmentation makes predictions at the pixel level, objects in images can now be labeled more accurately. This makes it possible to determine whether each object is something that emits light at night or something whose appearance does not change, leading to more precise nighttime image generation.
Segmentation is also used for image analysis in the medical field. It has become possible to automatically extract organ regions from CT images and, by combining it with other methods, to label blood vessels.
The CNN used here places "transposed convolutional layers" after the convolutional layers, which enlarge the feature maps so that the output is itself an image. The labels can therefore be drawn directly on the image.
Because semantic segmentation labels at the pixel level in this way, it is beginning to be used in the medical field, where accuracy is required.
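How a transposed convolutional layer turns a small feature map back into a larger image can be sketched as follows. This is a simplified single-channel version in plain Python with no padding; the kernel and input values are made up, and `conv_transpose2d` is a hypothetical helper name.

```python
def conv_transpose2d(fmap, kernel, stride=2):
    """Transposed convolution: each input value stamps a scaled copy of the kernel
    onto the output, with neighboring stamps placed `stride` pixels apart."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(fmap) - 1) * stride + kh
    out_w = (len(fmap[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for y, row in enumerate(fmap):
        for x, value in enumerate(row):
            for i in range(kh):
                for j in range(kw):
                    out[y * stride + i][x * stride + j] += value * kernel[i][j]
    return out


fmap = [[1.0, 2.0],
        [3.0, 4.0]]          # small feature map from the convolutional layers
kernel = [[1.0, 1.0],
          [1.0, 1.0]]        # made-up weights; in practice these are learned

upsampled = conv_transpose2d(fmap, kernel)   # 2x2 -> 4x4
for row in upsampled:
    print(row)
```

In a trained network the kernel weights are learned, so the enlarged output can carry per-pixel label scores rather than a simple copy of the input values.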
Q. What is the difference between object detection and segmentation?
A. Both object detection and segmentation are important parts of image analysis, but they differ in purpose and method.
Object detection detects only specific objects and identifies their position and category. Specifically, it outputs rectangular bounding boxes indicating the presence of an object and the class of that object (e.g., dog, cat, car, etc.).
On the other hand, segmentation classifies all pixels in an image into specific classes. In other words, semantic segmentation helps in understanding the precise shape and position of objects across the entire image.
Instance segmentation combines parts of these two methods to identify each individual object instance and understand its shape and position.
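The difference in output can be seen by deriving a bounding box from a segmentation mask. The mask below is made up, and `mask_to_box` is a hypothetical helper; box coordinates are inclusive.

```python
def mask_to_box(mask):
    """Return the tightest box (x_min, y_min, x_max, y_max) around non-zero pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up mask of a plus-shaped object: 1 = object pixel, 0 = background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

x_min, y_min, x_max, y_max = mask_to_box(mask)
box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
mask_area = sum(sum(row) for row in mask)
print((x_min, y_min, x_max, y_max), box_area, mask_area)
```

The box can always be computed from the mask, but not the other way around: the box covers 9 pixels here, while only 5 of them actually belong to the object.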
For more on object detection, the article "How does object detection work? A thorough explanation of the amount of data required, use cases, and construction steps!" provides a detailed explanation of its mechanisms and application examples. Reading it alongside this article will deepen your understanding.
Q. What are the disadvantages of semantic segmentation?
A. While semantic segmentation is a powerful method in image analysis, it has the disadvantage of being unable to identify each object individually when objects of the same class overlap. For example, in an image of a crowd, it cannot distinguish individual people and recognizes them only as a single "human" region.
In addition, semantic segmentation requires an accurate label for every pixel, so annotation takes a great deal of time and effort. It can predict with high accuracy, but the initial investment tends to be large.
In this article, we have explained the overview and application examples of semantic segmentation. There are various methods of semantic segmentation, and their applications vary.
It is a difficult technology to understand in depth, but if you can implement it and achieve results, it can give you a system that is a step ahead of other companies.
Since each method has its own characteristics, the best choice depends on the setting and purpose. The choice also significantly changes the man-hours and cost of annotation, which in turn affect the precision of the AI system, so it is worth selecting the method with the application in mind.
Because it is an important method widely used in the field of image analysis, understanding the advantages and disadvantages of semantic segmentation allows for more efficient use.