How to Stop AI from Depicting iPhones in Past Eras

Artificial Intelligence Struggles with Historically Accurate Image Generation

New research raises concerns about the ability of AI image generators to portray historical scenes accurately, showing that they often place modern devices such as smartphones and laptops in scenes from the past. The underlying problem, known as 'entanglement,' arises when a model learns to link an activity with the modern objects that frequently accompany it in the training data, so that those objects appear even when the prompt specifies an earlier era.

In early 2024, Google's Gemini multimodal AI model faced criticism for imposing demographically improbable characteristics on World War II German soldiers, offering a prime example of the limitations of historical accuracy in AI models. The issue was quickly addressed, but it highlighted the challenge of balancing efforts to reduce bias in AI with historical context.

The issue of anachronisms, or items that do not belong in the target period, persists in 'diffusion-based' models. This occurs because the AI learns to associate certain activities with contemporary objects, even when the prompt specifies a historical setting. Once these associations are embedded within the AI's internal representations, it becomes challenging to separate the activity from its contemporary context, leading to historically inaccurate results.

A new paper from researchers in Switzerland examines this problem in 'latent diffusion models' and finds that, despite their ability to create photorealistic human figures, these models tend to render figures from the past in the visual styles associated with the period rather than photorealistically. The researchers created a dataset, titled 'HistVis,' consisting of 30,000 images generated from 100 prompts depicting common human activities across ten distinct time periods, from the 17th century to the present day.

Using three popular open-source diffusion models - Stable Diffusion XL, Stable Diffusion 3, and FLUX.1 - the authors found that all three models impose consistent stylistic defaults when depicting historical periods. The authors measured this 'Visual Style Dominance' to determine the extent to which each model narrows its visual interpretation of the past.
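As a rough illustration of what a style dominance measurement could look like, the sketch below takes one style label per generated image and reports the most common style together with its share. This is only a minimal sketch under stated assumptions: the function name `style_dominance`, the label names, and the idea of using the top style's share as the score are all hypothetical, and the paper's actual metric may be defined differently.

```python
from collections import Counter


def style_dominance(style_labels):
    """Return the most common style and the fraction of images that use it.

    style_labels: one predicted style label per generated image
    (hypothetical output of some style classifier).
    """
    counts = Counter(style_labels)
    top_style, top_count = counts.most_common(1)[0]
    return top_style, top_count / len(style_labels)


# Hypothetical labels for ten images from one model and one period.
labels_1930s = ["monochrome"] * 8 + ["illustration", "photo"]
style, share = style_dominance(labels_1930s)
print(style, share)  # monochrome 0.8
```

A share close to 1.0 would indicate that the model collapses a period into a single look, while a lower share would indicate more varied output.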

The style analysis showed that all three models demonstrated a strong preference for monochrome imagery in the earlier decades of the 20th century, particularly the 1910s, 1930s, and 1950s. The researchers also tested the systems' tendency to produce anachronisms by observing the frequency and severity of modern objects appearing in historical settings.
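The frequency and severity of modern objects can be made concrete with a small sketch. It assumes, purely for illustration, that frequency is the share of prompts in which a modern object shows up at least once, and that severity is the average share of images affected within those prompts. Both definitions, the function name `anachronism_stats`, and the example prompts are assumptions and are not taken from the paper.

```python
def anachronism_stats(detections):
    """Compute (frequency, severity) from per-image detection flags.

    detections: dict mapping each prompt to a list of booleans, one per
    generated image, True when a modern object was detected.
    """
    affected = {p: flags for p, flags in detections.items() if any(flags)}
    # Frequency: share of prompts with at least one anachronistic image.
    frequency = len(affected) / len(detections)
    if not affected:
        return frequency, 0.0
    # Severity: mean share of affected images among the affected prompts.
    severity = sum(sum(f) / len(f) for f in affected.values()) / len(affected)
    return frequency, severity


# Hypothetical detection results for two prompts, four images each.
detections = {
    "a person listening to music, 1930s": [True, True, False, True],
    "a person cooking, 18th century": [False, False, False, False],
}
frequency, severity = anachronism_stats(detections)
print(frequency, severity)  # 0.5 0.75
```

Separating the two numbers distinguishes an object that slips into many prompts occasionally from one that dominates a few prompts almost every time.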

To test the accuracy of their detection method, the authors conducted a user study featuring 1,800 randomly sampled images from SD3, the model with the highest anachronism rate, with each image rated by three crowd-workers. After filtering for reliable responses, the two-stage detection method agreed with the majority vote in 72 percent of cases.
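The agreement figure can be understood as a comparison between the automated label and the majority of the three human ratings for each image. The sketch below shows that calculation on made-up data; the function names and the sample ratings are hypothetical, and the study's filtering of unreliable responses is not modeled here.

```python
from collections import Counter


def majority_vote(ratings):
    """Return the label chosen by most raters (no ties with three binary votes)."""
    return Counter(ratings).most_common(1)[0][0]


def agreement_rate(detector_labels, crowd_ratings):
    """Share of images where the detector matches the crowd majority."""
    matches = sum(
        detected == majority_vote(ratings)
        for detected, ratings in zip(detector_labels, crowd_ratings)
    )
    return matches / len(detector_labels)


# Hypothetical data: True means "anachronism present".
detector = [True, False, True, True]
crowd = [
    (True, True, False),
    (False, False, False),
    (False, False, True),
    (True, True, True),
]
rate = agreement_rate(detector, crowd)
print(rate)  # 0.75
```

Using an odd number of raters per image guarantees a clear majority for a yes-or-no question.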

When assessing demographics, the authors found that FLUX.1 overrepresented men, while SD3 and SDXL showed similar trends across categories such as work, education, and religion. White faces appeared more often than expected overall, but the bias declined in more recent periods.
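Statements such as "more often than expected" imply a comparison between the observed share of a group in the generated images and some baseline share. A minimal sketch of that comparison follows; the function name `representation_gap`, the counts, and the baseline values are all hypothetical, and the source does not say how the expected shares were obtained.

```python
def representation_gap(observed_counts, expected_shares):
    """Return observed share minus expected share for each group.

    Positive values mean the group appears more often than the baseline.
    """
    total = sum(observed_counts.values())
    return {
        group: observed_counts.get(group, 0) / total - expected
        for group, expected in expected_shares.items()
    }


# Hypothetical counts for one activity and period, with an assumed baseline.
observed = {"men": 70, "women": 30}
expected = {"men": 0.5, "women": 0.5}
gaps = representation_gap(observed, expected)
print({group: round(gap, 2) for group, gap in gaps.items()})
```

Tracking the gap per period would show whether a bias shrinks in more recent periods, as the article reports for white faces.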

The researchers concluded that text-to-image models rely on limited stylistic encodings rather than a nuanced understanding of historical periods, leading to one-dimensional portrayals of history. To bridge this gap, future improvements in disentangling overlapping concepts will likely be necessary.

In the pursuit of historical accuracy, high-quality and diverse training data, contextual awareness, and collaboration with experts remain crucial factors. Continual algorithmic improvements will also help the AI better understand historical contexts, reducing the likelihood of anachronisms. However, the potential for malicious use to intentionally distort history remains a significant concern.

Taken together, the findings show how biases learned from training data surface in historical imagery, most visibly when modern items such as laptops and smartphones appear in scenes where they do not belong. They also underline how difficult it is to reduce bias in AI while preserving historical context, particularly in diffusion-based models that are prone to anachronisms.

Work on historically accurate image generation therefore continues to emphasize high-quality and diverse training data, contextual awareness, and collaboration with experts as ways to improve models' understanding of historical periods.