
Does generative AI infringe copyright? Explaining the conditions that cause problems, real examples, and countermeasures!

 

 


As the use of generative AI has spread rapidly in recent years, debate over how it relates to copyright has intensified.

Many generative AI models are trained on images and text collected from the web. Because their output is shaped by that training data, it can infringe copyright, and an unexpected infringement could damage a company's credibility.

In this article, we explain in detail, based on copyright law, the conditions under which generative AI leads to copyright infringement, real examples of infringement, and how to avoid it. It is intended to help business owners who are considering generative AI understand the risks involved.

 

Nextremer offers data annotation services to achieve highly accurate AI models. If you are considering outsourcing annotation, free consultation is available. Please feel free to contact us.

 

 

 

1. What Is the Subtle Relationship Between Generative AI and Copyright?

 

 

Copyright law seeks to balance protecting the rights and interests of copyright holders with allowing works to be used smoothly. The emergence of generative AI, however, has complicated conventional thinking about copyright.

Training, generated outputs, and the handling of data in generative AI each raise a range of copyright issues.

Below, we explain what copyright protects and the key points of contention you should understand in the relationship between generative AI and copyright.


Works Subject to Copyright Protection

Copyright protects creative works such as those listed below. It does not protect mere facts or data, or ideas themselves, such as an artistic or painting style.

 

Included in works:
  • Literary works (novels, essays, articles, etc.)
  • Musical works (lyrics, songs, and composed music)
  • Stage works (theater, dance, etc.)
  • Artistic works (paintings, sculptures, etc.)
  • Architectural works
  • Cinematographic works
  • Program code

Not included in works:
  • Mere facts or data: Objective facts such as dates, place names, or demographics
  • Commonplace expressions: Very common words or phrases, such as everyday greetings like "thank you"
  • Ideas themselves: The ideas behind a work's content or style, such as an artistic or painting style
  • Industrial products: Mass-produced machines and other products themselves


If material falls into one of the categories of works above, using it improperly may infringe copyright.

Even for material in these categories, however, not every use amounts to infringement. It is therefore important to understand what does and does not count as a work before using generative AI.

 

"Similarity" and "Reliance" Become Points of Contention

Whether output such as images or text created by generative AI infringes the copyright of an existing work turns on two key points of contention: "similarity" and "reliance."

"Similarity" means that the output is identical or similar to another person's work. More precisely, the question is whether the output allows one to directly perceive the "essential features of the expression" of the existing work.

Partial resemblance alone does not establish similarity; what matters is whether the output shares the essence of the work's creative expression. Sharing only an idea or a style, which copyright does not protect, is not enough.

"Reliance," by contrast, refers to whether the output was based on someone else's work. According to the Agency for Cultural Affairs' "Views on AI and Copyright," reliance is recognized in cases such as the following:

 

  • The AI user was aware of the existing work: for example, the existing work was input directly, or a specific work was named in the instructions
  • The AI user was not aware of the existing work, but it was included in the generative AI's training data: an exception may apply if it is technically ensured that the work is not output


If an output created by generative AI satisfies both "similarity" and "reliance," it may be judged to infringe the copyright of the existing work.

Because generative AI produces output based on its training data, outputs can easily end up resembling that data, which is why the risk of copyright infringement is considered high.

 

Can You Claim Copyright for Images Output by Generative AI?

In short, copyright generally cannot be claimed for content that generative AI produces autonomously. Under the Copyright Act, copyright arises only in "creative expressions of thoughts or emotions," as the definition below shows.

 

Definition of a "Work" in Copyright Law

A work is something in which (1) thoughts or emotions (2) are creatively (3) expressed, and which (4) falls within the literary, scientific, artistic, or musical domain. (Reference: Copyright Act, Article 2, Paragraph 1, Item 1)

Images, text, and music generated by AI are produced by algorithms rather than being direct expressions of human thoughts or emotions, so they are generally judged not to meet the conditions above.

However, if a human made a concrete creative contribution to the AI's generation process, copyright may exceptionally be recognized. A creative contribution may be found in outputs where a human was actively involved, as in the examples below.

  • Gave detailed instructions on how the generated content should be expressed
  • Repeatedly refined the instructions while checking the outputs

 

Note, however, that even if an AI output is recognized as a work, it may still infringe the copyright of an existing work if both "similarity" to and "reliance" on that work are recognized.

When claiming copyright in content created with generative AI, therefore, carefully consider "which parts reflect human creative contribution, and to what extent" and "whether the output relies on existing works," so as to avoid the risk of copyright infringement.

 

2. [By Stage] Conditions for Copyright Issues in the Development and Use of Generative AI

 

 

Below, we explain the conditions under which copyright issues arise at each of the following major stages of developing and using generative AI.

 

  • When using training data for generative AI systems
  • When using generative AI systems
  • When using generative AI output as training data for other AI systems


When using training data for generative AI systems

At the training-data stage, whether copyright infringement occurs depends on whether the purpose of using works to develop the generative AI model counts as "enjoyment." Contrary to a common misconception, whether the use is for-profit or non-profit is not the deciding factor.

"Enjoyment" refers to using a work in order to obtain intellectual or emotional satisfaction from it, such as by watching videos, viewing images, or reading text.

Collecting and reproducing works to develop an AI model is considered use "for the purpose of information analysis" under Article 30-4 of the Copyright Act, quoted below, and is treated as an act not intended for "enjoyment." Such use can therefore, in principle, be made without the copyright holder's permission.

A work may be used in any way, in principle, when it is not for the purpose of enjoying the thoughts or emotions expressed in the work or having others enjoy them, such as for the purpose of information analysis. (Reference: Copyright Act, Article 30-4)

However, if training is carried out with the aim of having the model output a training work exactly as it is, a purpose of enjoyment may be judged to coexist, in which case Article 30-4 may not apply.

If an AI output infringes copyright, responsibility generally lies with the user who generated the content. However, if infringing outputs are produced frequently, or if the developer is found to have failed to take appropriate measures to suppress them, the service provider may also be held responsible.

When using generative AI systems, therefore, it is important to understand where responsibility for copyright infringement lies and to use them in compliance with copyright law.


When using generative AI systems

At the usage stage, whether the output infringes copyright depends not only on whether the "similarity" and "reliance" described above are recognized, but also on whether the use falls outside the Copyright Act's provisions limiting rights, such as the exception for private use.

For example, posting a generated image that resembles an existing work on a corporate website or company social media account may be judged to infringe copyright.

Therefore, before publishing content produced by generative AI, confirm in advance "to what extent the output is original relative to existing works" and "whether the intended use is covered by a provision limiting rights."
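One narrow, automatable part of this check is screening text outputs for long passages copied word for word from works you already know about. The sketch below does this with word n-gram overlap. The helper names are hypothetical and this is not a legal test: similarity in the legal sense concerns the essential features of a work's expression, which no overlap score can decide, and the 5-word n-grams and 0.2 threshold are arbitrary assumptions. A flagged output should go to human or legal review.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-case the text, split it into words, and return its word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 5) -> float:
    """Share of the output's n-grams that also occur verbatim in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:  # output shorter than n words: nothing to compare
        return 0.0
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)


def flag_for_review(output: str, references: dict[str, str],
                    n: int = 5, threshold: float = 0.2) -> list[str]:
    """Return names of reference works whose overlap meets the threshold.

    Heuristic screening only (hypothetical threshold): a flag means
    "send to human/legal review", not "this infringes copyright".
    """
    return [name for name, text in references.items()
            if overlap_ratio(output, text, n) >= threshold]


if __name__ == "__main__":
    refs = {"existing_article": "The quick brown fox jumps over the lazy dog near the river bank"}
    draft = "Our mascot, the quick brown fox jumps over the lazy dog every morning"
    print(overlap_ratio(draft, refs["existing_article"]))  # 5 of 9 five-word n-grams match
    print(flag_for_review(draft, refs))                   # ['existing_article']
```

This only catches near-verbatim text reuse against references you supply; it says nothing about paraphrased expression, and images or audio would need different techniques.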


When using generative AI output as training data for other AI systems

Using content such as text, images, or audio generated by tools like ChatGPT as training data, or as annotation data (including to supplement or augment a dataset), for other AI systems also carries copyright risks.

Even for non-profit or research purposes, using generative AI to create datasets that augment annotation data is not automatically exempt from being treated as "enjoyment." Whether it is must be judged comprehensively from the purpose and method of use, the nature of the generated data, and other factors.

If the use is judged to have the purpose of enjoyment, permission from the copyright holder is required.

Apart from the question of enjoyment, some generative AI services impose license terms on how their outputs may be used or restrict commercial use, so check the terms of the service that produced the data.

Researchers have also observed that repeatedly training new AI models on AI-generated data can cause the quality and diversity of the data to degrade rapidly within a few training cycles, eventually producing nonsensical content. One possible cause is that AI-generated content carries over errors and biases from the model that produced it.
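The loss of diversity described above can be illustrated with a deliberately simplified toy model, not a real generative AI: treat a "model" as a normal distribution, fit it to samples drawn from the previous generation's model, and repeat. The function name, the sample size of 20, and the 500 generations below are arbitrary assumptions chosen for illustration. The sketch only shows the spread (diversity) of the data shrinking; it does not model errors, biases, or nonsensical text.

```python
import random
import statistics


def simulate_recursive_training(generations: int = 500, sample_size: int = 20,
                                seed: int = 0) -> list[float]:
    """Toy model of training each generation only on the previous generation's output.

    Each "model" is just a normal distribution (mean, standard deviation).
    Every generation draws `sample_size` synthetic samples from the current
    model and fits the next model to those samples alone. Returns the
    standard deviation (a stand-in for diversity) of every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original "real" data distribution
    history = [sigma]
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(synthetic)      # refit the mean
        sigma = statistics.pstdev(synthetic)  # refit the spread (maximum-likelihood estimate)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_recursive_training()
    for gen in (0, 10, 50, 100, 500):
        print(f"generation {gen:3d}: std = {history[gen]:.3g}")
```

Because each refit is estimated from a small, finite sample, the estimated spread tends to drift downward, so values in the tails disappear and the data becomes less varied generation after generation.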

 


 


 

3. Case Examples Where Copyright Infringement Actually Became an Issue in Generative AI

 

 

As generative AI comes into wider use around the world, cases in which its inappropriate use leads to copyright infringement are increasing. Below, we introduce cases in which copyright infringement involving generative AI actually became an issue.


Damages Awarded Against an AI Service That Generated Images Similar to Ultraman

A Chinese court ordered an AI service provider to pay damages, judging that an image output by an image-generating AI was similar to "Ultraman" and constituted copyright infringement.

This case shows that "to what extent an AI output resembles an existing work" and "how the output reflects expressions protected by copyright" are important points in judging copyright infringement.

It also suggests that providers of generative AI may bear responsibility if they do not take appropriate measures to manage and suppress infringing outputs.

Reference: https://www.yomiuri.co.jp/culture/subcul/20240415-OYT1T50069/


Lawsuit Over an Image Generation AI Trained on Copyrighted Photos Without Permission

Getty Images, a U.S.-based operator of a stock photo and image site, filed a copyright infringement lawsuit against Stability AI, the developer of the image generation AI "Stable Diffusion."

Getty Images alleges that Stability AI used approximately 12 million items, including copyrighted images, as training data for its AI model without permission.

The outcome of this lawsuit could affect how copyright applies when AI uses existing works as training data.

Reference: https://japan.cnet.com/article/35199679/

4. Key Points for Avoiding Copyright Issues When Using Generative AI

 


By preparing for these risks in-house, companies can reduce the likelihood of copyright issues when using generative AI. Below are key points for avoiding such issues.


Outsource to Reliable AI Vendors

A reliable vendor understands the legal risks that generative AI poses under copyright law and can take appropriate measures to avoid infringement. For example, it can rule out in advance any data or training processes that would infringe copyright and ensure that only lawfully usable data is used.

This can significantly reduce the risk that your company unintentionally infringes copyright.

Partnering with a reliable vendor is therefore an important step toward using generative AI securely and quickly while keeping legal risks in check.


Obtain Permission When Using Others' Works

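Although Article 30-4 of the Copyright Act generally allows works to be used for training without permission, permission from the copyright holder is required when the use also serves a purpose of enjoyment, for example when the aim is to have the model output a work as it is.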

Even where permission is not strictly required, it is safest to obtain it before using existing works such as another company's character designs, logos, or images to train generative AI.


Promote Data Management

Appropriate data management is essential to avoiding copyright issues when developing and operating generative AI. Because these issues depend heavily on how the data used and generated is managed, establishing sound data management makes development safer.

For example, when collecting text data, exclude any data whose terms prohibit its use for AI training.

Clarifying such prohibitions and documenting guidelines for collecting and using data in a manual improves the quality of data management and reduces the risk of copyright infringement.
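An exclusion rule like this can be made explicit at collection time by recording, for each collected document, whether its terms were reviewed and whether they prohibit AI training, and then filtering on those fields. The sketch below assumes such a record format; the class and field names are hypothetical, and deciding what a site's terms actually say is a human or legal review step that this code does not perform. It fails closed: anything not yet reviewed is excluded rather than used.

```python
from dataclasses import dataclass


@dataclass
class CollectedText:
    """A collected document plus what was recorded about its terms of use."""
    source_url: str
    text: str
    terms_reviewed: bool          # hypothetical field: were the source's terms checked?
    ai_training_prohibited: bool  # hypothetical field: do those terms forbid AI training?


def split_for_training(records: list[CollectedText]) -> tuple[list[CollectedText], list[CollectedText]]:
    """Split records into (usable, excluded).

    Fails closed: a record is usable only if its terms were reviewed AND they
    do not prohibit AI training. Excluded records are kept in a separate list
    so the decision can be audited later.
    """
    usable: list[CollectedText] = []
    excluded: list[CollectedText] = []
    for record in records:
        if record.terms_reviewed and not record.ai_training_prohibited:
            usable.append(record)
        else:
            excluded.append(record)
    return usable, excluded


if __name__ == "__main__":
    records = [
        CollectedText("https://example.com/a", "sample text A", terms_reviewed=True, ai_training_prohibited=False),
        CollectedText("https://example.com/b", "sample text B", terms_reviewed=True, ai_training_prohibited=True),
        CollectedText("https://example.com/c", "sample text C", terms_reviewed=False, ai_training_prohibited=False),
    ]
    usable, excluded = split_for_training(records)
    print([r.source_url for r in usable])    # ['https://example.com/a']
    print([r.source_url for r in excluded])  # ['https://example.com/b', 'https://example.com/c']
```

Keeping the excluded records, rather than silently dropping them, makes it easier to show later which data was left out and why, which supports the documented guidelines described above.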

 

5. Summary

Whenever you use generative AI to create content such as images or text, or to augment data, keep copyright risks in mind. If both similarity to and reliance on an existing work are recognized, the output is more likely to be judged as infringing.

To avoid infringing copyright, take preventive measures such as outsourcing annotation or system development to a reliable AI vendor and concluding license agreements with rights holders.

Generative AI is also expected to evolve rapidly, and copyright law and related guidelines may change accordingly. To keep using generative AI safely, regularly check the latest legal views and guideline updates from the Agency for Cultural Affairs and be ready to respond appropriately.


Author

Toshiyuki Kita
Nextremer VP of Engineering

After graduating from the Graduate School of Science at Tohoku University in 2013, he joined Mitsui Knowledge Industry Co., Ltd. As an engineer in the SI and R&D departments, he was involved in time series forecasting, data analysis, and machine learning. Since 2017, he has been involved in system development for a wide range of industries and scales as a machine learning engineer at a group company of a major manufacturer. Since 2019, he has been in his current position as manager of the R&D department, responsible for the development of machine learning systems such as image recognition and dialogue systems.
