I met Fatma in June 2019 in Sofia, Bulgaria. Four years prior, she had been forced to leave her home in Aleppo with her whole family: her mother, father, older brother, and two younger siblings. Fatma was 17 when her parents paid the equivalent of nine thousand euros to men who smuggled the six family members in the back of a van across landscapes and borders until they reached Finland via Sofia. The smugglers had promised a house and a car in Finland for the sum paid, but this promise went unfulfilled. Instead, after six months, Fatma’s family was deported to Bulgaria because their “fingerprints were registered in Sofia first.” “We lost everything to have a good life because our lives were in danger,” she lamented. “Were they in danger because of the war?” I asked. “It was personal,” she replied cryptically.
Fast forward to 2019: Fatma, now 21, was living with her family in a refugee camp in the Bulgarian capital. While assisting her father at the camp’s hairdressing salon, she also worked part-time for the data-labeling company where I was conducting fieldwork; notably, the company had recruited her at the refugee camp itself. After initial training in “digital skills” and English, Fatma was ready to assume her role as a data worker. During our first conversation, she was at the company’s office, seated alongside Diana, another Syrian asylum seeker, who was labeling images of people by race, age, and gender. Fatma, in contrast, was immersed in a project involving satellite images and semantic segmentation, a core computer vision task in which every pixel of an image is painstakingly separated out and labeled. This form of data work is particularly important for generating AI training data, especially for computer vision systems embedded in devices such as cameras, drones, or even weapons. Fatma explained that the task basically consisted of separating “the trees from the bushes and cars from people, roads, and buildings.” After segmenting the objects, she would attach the corresponding labels to identify each one.
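To make the task concrete, the sketch below shows what a semantic segmentation label looks like in code. It is a minimal illustration assuming a hypothetical class palette and tile size; none of these details come from Fatma’s actual project.

```python
import numpy as np

# Hypothetical class palette for aerial imagery; the real project's
# label set is unknown, and these names are assumptions.
CLASSES = {0: "background", 1: "tree", 2: "bush", 3: "car",
           4: "person", 5: "road", 6: "building"}

# A segmentation mask has the same height and width as the image:
# every single pixel holds exactly one class ID.
height, width = 512, 512
mask = np.zeros((height, width), dtype=np.uint8)  # all background

# The annotator's outlines are rasterized into the mask, e.g. a road
# strip across the bottom of the tile and one building footprint.
mask[400:, :] = 5            # road
mask[100:220, 150:300] = 6   # building

# Count how many pixels carry each label; this is the granularity at
# which the work is produced and judged, pixel by pixel.
print({CLASSES[int(c)]: int((mask == c).sum()) for c in np.unique(mask)})
```

A model trained on such masks predicts a class for every pixel, so any misplaced boundary in the label propagates directly into the system’s behavior.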
Data Work Requires Skill
Explained in this manner, the work might seem trivial and straightforward. Such tasks fall under what is known as microwork, clickwork, or, as I refer to it, data work: the labor involved in generating data to train and validate AI systems. According to the World Bank, there are between 154 million and 435 million data workers globally, many of them situated in or displaced from the World Majority. They often work for outsourcing platforms or companies, primarily as freelancers, earning a few cents per piece or task without the labor protections, such as paid sick leave, commonly found in more traditional employment relationships. Data workers generate data through various means, ranging from scraping information from the internet to recording their voices or uploading selfies. Like Fatma, they frequently take on labeling tasks. Data workers may also contribute to algorithm supervision, for instance by rating the outputs of recommender systems on platforms like Netflix or Spotify and assessing their usefulness, appropriateness, and toxicity. In other cases, data workers might be tasked with outright impersonating non-existent AI systems, instructed to “think like a robot” while pretending to be a chatbot.
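As a rough illustration of what reaches a worker’s screen, here is a hypothetical rating task. The field names, the piece rate, and the structure are all invented for this sketch and reproduce no real platform’s schema.

```python
# A hypothetical task record, invented for illustration; it does not
# reproduce any real platform's schema.
task = {
    "task_id": "t-00321",
    "type": "output_rating",               # judge a recommender's output
    "payload": "Suggested playlist: ...",  # the system output to assess
    "questions": {
        "usefulness": None,                # e.g., rated on a 1-5 scale
        "appropriateness": None,
        "toxicity": None,
    },
    "pay_usd": 0.04,                       # illustrative piece rate: a few cents
    "requester": None,                     # typically hidden from the worker
    "intended_use": None,                  # almost never disclosed
}
```

The two empty fields at the bottom anticipate a point developed later: workers rarely learn who requested the data or what it will be used for.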
Despite its crucial role in the development and maintenance of AI technologies, data work is often belittled as micro or small, involving only a few clicks, and dismissed as low-skill or blue-collar. In fact, the platform Clickworker, a prominent provider of on-demand data work, claims on its website that “the tasks are generally simple and do not require a lot of time or skill to complete.” This assertion is inaccurate. During my fieldwork in Bulgaria, for instance, I attempted to segment and label satellite imagery and found it extremely challenging. The work demands precision in drawing polygons around the different objects in each picture, and it is strenuous on the eyes and hands. Moreover, it requires contextual knowledge, including an understanding of what vegetation and vehicles look like in specific regions. After Fatma and her team segment and label the images, a rigorous quality check is conducted by a woman at the client’s company. Fatma’s manager in Bulgaria mentioned that this quality controller was “remarkably fast with the quality check and feedback,” adding, “She’s able to do this quickly because she knows the images and the ground.” Taking note of this, I wondered how well the quality controller knew the ground. Does she come from the area where these images were taken? Is she, like Fatma, a refugee? Has her displacement been leveraged as expertise?
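The precision at stake is easiest to see in the annotation format itself. In one widely used convention, COCO-style polygon segmentation, each object is stored as a flat list of x, y vertex coordinates in pixels; the coordinates below are invented for illustration.

```python
# One segmented object in COCO-style polygon form: a flat list of
# x, y vertex coordinates outlining an invented building footprint.
annotation = {
    "image_id": 17,                    # illustrative IDs only
    "category": "building",
    "segmentation": [[152.0, 101.5,    # x1, y1
                      298.5, 101.5,    # x2, y2
                      298.5, 219.0,    # x3, y3
                      152.0, 219.0]],  # x4, y4
}

# The shoelace formula computes the polygon's area from its vertices;
# nudging a single vertex by a few pixels measurably changes the
# outline, which is the level of precision quality checks scrutinize.
def shoelace_area(coords: list[float]) -> float:
    xs, ys = coords[0::2], coords[1::2]
    n = len(xs)
    return abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                   for i in range(n))) / 2

print(shoelace_area(annotation["segmentation"][0]))  # 17213.75 square pixels
```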
I asked Fatma if the satellite images she was working on could be of Syria. She said she thought the architecture and vehicles looked familiar. Staring at the screen, she whispered, “I hope this isn’t for weapons.” Neither she nor I could be certain.
The Known and the Unknown
Fatma’s fear that the satellite images might be used for AI weapons is not unfounded. Autonomous drones and swarm technologies have proliferated rapidly in recent years, facilitated by the integration of AI into reconnaissance, target identification, and decision-making processes. To cite one poignant example, facial recognition technologies have been used to uphold the segregation and surveillance of the Palestinian people, while automated weapons have played a crucial role in the ongoing genocide in Gaza. Companies like the Israeli firm SmartShooter boast about their lethal capabilities with the slogan “One Shot, One Hit.”
Surveillance drones, predictive analytics, and decision-support systems are used for strategic planning in “threat anticipation” and for real-time monitoring along border regions. For instance, the German Federal Office for Migration and Refugees (BAMF) employs image biometrics for identity verification and voice biometrics for dialect analysis to ascertain asylum seekers’ country of origin and evaluate their eligibility for asylum. The system purportedly recognizes dialects of Arabic, Dari, Persian/Farsi, Pashto, and Kurdish. As BAMF revealed in response to a query initiated by German MPs, data workers subcontracted through the platform Clickworker (the same platform that claims tasks are simple and low-skill) participated in producing the voice samples required to develop the system.
Fortunately, the data company in Bulgaria has a strong policy in place to reject requests related to warfare technologies. Fatma’s manager explained that “we have rejected projects related to (…) training artificial intelligence for different types of weapon applications. So, I felt that this really did not fit with our social mission, and when I responded to the client, I said that we’re working with conflict-affected people, and that’s why (…) But it was also a kind of boycott of such projects to be developed at all.” She added that the satellite imagery labeled by the team had been commissioned by a central European firm developing autonomous piloting systems for air transportation, not weapons. This information is consistent with the client’s website. However, the website also states that their technology is used for unmanned aerial vehicles (UAVs), commonly known as drones, with applications that include surveillance.
Workers’ Ethical Concerns
Privacy infringements and the potential for discriminatory profiling are among the most obvious concerns related to AI systems applied to border surveillance and warfare. Despite these risks disproportionately affecting their own communities, sometimes with lethal consequences, most data workers are kept in the dark concerning the ultimate purpose of the data they contribute to producing. The outsourcing of data work to external organizations, often situated far away from the requesters’ geographical location, complicates workers’ efforts to navigate the intricate supply chains that support the AI industry. Instructions given to data workers seldom provide details about the requester or the intended use of the data. Consequently, most data workers do not know the name and nature of the companies seeking their services, the products that will be trained on the datasets they generate, or the potential impacts of these technologies on individuals and communities. AI companies frequently rationalize the veil of secrecy as a means of safeguarding their competitive edge.
The fact that data workers are integrated into industrial structures designed to keep them uninformed and subject to surveillance, retaliation, and wage theft does not mean that they do not have ethical concerns about their work and the AI applications it supports. In fact, there have been instances where data workers have explicitly alerted consumers to privacy-related and other ethical issues associated with the data they generate. For example, in 2022, Venezuelan data workers reported anonymously that Roomba robot vacuum cleaners capture pictures of users at home, which are then viewed by human workers.
Amid the COVID-19 pandemic in 2021, I piloted a workshop series with fifteen data workers, this time located in Syria. The three-day event was designed to understand work practices and relationships in geographically distributed data-production contexts, creating a space for workers to discuss concerns. The workshop activities revealed that receiving information and having spaces to voice and discuss the ethical implications of the data they handle were of the utmost importance to the workers. They worried about the protection of data subjects’ privacy and advocated for a mandatory clause that would compel requesters to disclose the intended uses of the data. Additionally, the workers expressed concerns about the mental health implications of working with violent, offensive, or triggering data.
Data workers possess a unique vantage point that can play a crucial role in the early identification of ethical issues related to data and AI. It is essential that consumers and society at large align with them in advocating for greater transparency in the AI data-production pipeline. Workers like Fatma and her colleagues could offer valuable insights into the use of satellite images for surveillance technologies, for instance. Similarly, the native speakers who contributed their voices to the audio snippets for dialect recognition could shed light on how such systems are applied against asylum seekers in Germany.
Unfortunately, the challenge lies in the fact that the AI industry, for obvious reasons, has structured its production processes so that data workers function more as silent tools than as potential whistleblowers.