German court allows non-profit LAION to scrape copyrighted images for AI training
A Hamburg court has ruled that LAION, a non-profit organization, can collect copyrighted images for training AI systems without the photographer's permission. The decision leaves the most interesting question unanswered.
In a case between a photographer and LAION, the Hamburg Regional Court sided with LAION (case number 310 O 227/23). The non-profit, which specializes in creating datasets for AI training, had taken an image from a photo agency's website, paired it with a description, and added the image URL and description to its freely available "LAION-5B" dataset of 5.85 billion image-text pairs. The photographer sued LAION for copyright infringement.
The court confirmed that downloading and processing the image constituted a copyright-relevant reproduction. However, it ruled this action was justified under Section 60d of German copyright law, which permits text and data mining for non-commercial scientific research.
The court focused on LAION's specific actions, not its organizational structure. Since LAION released the dataset freely for research, it wasn't pursuing commercial goals. The fact that commercial companies also use the dataset didn't change this assessment.
General text and data mining exception still unresolved
The court didn't need to decide whether LAION could also rely on Section 44b, a more general exception for text and data mining. This section allows copying legally accessible works for text and data mining, which it defines as the automated analysis of digital works to extract information about patterns, trends, and correlations. Copies must be deleted once they are no longer needed for mining.
Rights holders can, however, opt out of these uses, but for works published online the reservation must be made in machine-readable form. The court doubted that the photo agency's website contained such a machine-readable notice restricting use. Given the importance of the case, the photographer is likely to appeal to a higher court.
The ruling shows that research groups can collect copyrighted material as AI training data. But it remains unclear whether this also applies to for-profit companies, and the ruling only covers collecting the data, not actually using it to train AI systems. Companies like OpenAI have done both: they've taken copyrighted online data without permission and used it to train their systems.
A number of lawsuits on this issue are pending; the most high-profile is probably the one between the New York Times and OpenAI.