Synthetic Data:
- How will synthetic data impact AI development and deployment in 2024 and beyond?
The increasing demand for synthetic images and training data in 2024 is driven by restrictions on real-world images. With the monetisation opportunity and decreasing availability of real-world images, there’s a significant shift toward considering the lifespan of existing images and acquiring additional images.
Synthetic images and scenario videos are a crucial solution to this problem, providing the ability to create images free of rights restrictions while avoiding GDPR/CCPA privacy issues. We expect a steep increase in interest in synthetic images and training data in the coming year.
- What are the key challenges and opportunities you see with synthetic data and its impact on AI applications in 2024?
A key challenge is the acceptance of synthetic data, but there has been a shift in the right direction as users witness real results. Synthetic data has improved massively; even in the last 18 months, there have been huge improvements in its realism. Convincing organisations and governments of its validity remains somewhat difficult, as it requires explaining exactly what synthetic data can do and do well. This is harder still because some businesses may have used older versions of synthetic data and are unaware of its new capabilities.
Opportunities are widespread, as synthetic data replaces the need for massive amounts of real data while maintaining privacy. The challenge lies in persuading stakeholders to embrace new approaches instead of sticking to the status quo.
- Can you provide your thoughts on the potential breakthroughs in synthetic data utilisation that we can expect to see in 2024?
Potential breakthroughs in the usage of synthetic data are expected to be widespread, driven not only by increased demand but also by government legislation such as the EU AI Act.
Government regulations may force a change toward alternative data sources. The continuous improvement in the quality of synthetic images is also a significant factor, with synthetic data becoming increasingly realistic over the years. I believe the current pace of development suggests that by 2024, some synthetic data may be indistinguishable from real-world images.
- How will the adoption of synthetic data impact data privacy and security concerns in the new world of AI, and what solutions might emerge?
The adoption of synthetic data addresses concerns related to the rights and privacy of real-world data. Collecting real-world data brings challenges, especially when filming in public spaces, which requires model releases and approvals.
Legislative processes, like the EU AI Act or President Biden’s Executive Order, further complicate real-world data collection. Synthetic data offers a solution by being inherently privacy-compliant, enabling rapid and cost-effective data generation. Additionally, it plays a crucial role in testing models, especially for tasks like ID verification, where synthetic data allows testing against false information.
- What industries or sectors do you think will benefit the most from the use of synthetic data in their AI initiatives in the coming year?
Industries that rely on foundation models, such as those behind ChatGPT, will benefit significantly from synthetic data. With legal battles affecting the availability of real-world data, synthetic data becomes a powerful tool for tuning models for specific marketplaces.
Sectors such as Smart City initiatives face challenges in obtaining diverse and specific data, making synthetic data invaluable. There’s a significant demand for smart spaces and an emerging interest in dangerous use cases, such as identifying people floating in the water.
The automotive industry will benefit significantly, particularly when it comes to safe testing. This is where AI can help massively with scenarios that previously could not be tested in a controlled environment.
- When will real and synthetic data be indistinguishable?
In specific use cases, like 2D faces for ID verification, real and synthetic data are already indistinguishable. I believe that general photo-realism will be achieved by 2025, with certain use cases achieving indistinguishability in 2024.
Animation, such as predicting human actions, might take until 2025 due to the need for a more photorealistic environment. While individual items can already appear indistinguishable, achieving overall scene complexity, where multiple synthetic visualisations are at play, may take another two to three years.
- Will synthetic data and generative data get closer or further diverge?
Synthetic data and generative data are expected to develop different use cases. While generative data might not stand alone for training AI networks, it can still play a role in specific scenarios. The two may intertwine and cross over in certain situations, such as using synthetic data to train generative models or incorporating generative data as part of synthetic data. However, they are likely to remain distinct, each with unique benefits for specific use cases.
AI Questions:
- What key AI developments do you expect in 2024?
In 2024, the focus will be on transforming models, with increased specialisation for specific market requirements. Large language models like ChatGPT will evolve into new generations, becoming more specialised for particular use cases. There will also be a big uptick in the use of AI-generated content, produced by improved generative AI models, for visual applications such as advertising and news articles.
Also, the current racial bias in AI will likely reduce. Right now, if you ask an AI model for a picture of a man, 90% of pictures shown will be of white men. AI models have to become more reflective of the world to keep up.
- How will AI continue to impact our daily lives, from smart devices to healthcare, in the coming year?
In healthcare, we will see the biggest changes. There will be a significant movement toward people using wearable devices that generate data for analysis. Patients will present doctors with gigabytes of data from these devices, changing the way medical services react and diagnose.
On top of this, the acceptance and usage of wearable devices will reshape healthcare practices. People will go to their doctors already armed with information about their condition. Additionally, the increasing power and performance of edge devices, like smartphones and smartwatches, will improve accuracy and interactions with AI, particularly in applications such as smart doorbells.
- In the continuing wave of AI, what new problems and ethical issues might come up, and should we establish regulations to address them?
As AI continues to advance, new problems and ethical concerns will come up. The need for regulation is becoming clear, as we saw towards the end of 2023 with measures such as President Biden’s Executive Order on AI and Rishi Sunak’s AI plan.
Ethical considerations in AI pose challenges, as navigating the ethical landscape can be subjective and there is no clear rule book. Ultimately, the decision to regulate AI will rest with governments and lawmakers.