The case of Lee Luda has drawn public attention to personal data management and AI in South Korea.
Lee Luda, an AI Chatbot with a Natural Tone
In December 2020, ScatterLab, an AI start-up in South Korea, launched an AI chatbot named ‘Lee Luda’. Lee Luda was given the persona of a 20-year-old female college student. Because Luda could hold quite natural conversations, the service became hugely popular, especially among Generation Z. In fact, it attracted more than 750,000 users within 20 days of its launch (McCurry 2021). By demonstrating natural interaction with humans, Lee Luda seemed to be a success.
However, it soon became the subject of public controversy over several problems. Before turning to those problems, we need to understand how Luda was able to communicate with humans so naturally.
Lee Luda's natural tone was possible because ScatterLab collected “10 billion real-life conversations between young couples taken from KakaoTalk”, the most popular messaging application in South Korea (McCurry 2021). ScatterLab did not collect the conversations directly from KakaoTalk but took a roundabout, arguably underhanded, route. There are a few counselling applications that analyse messenger conversations and give advice about users' love lives once the users agree to submit their KakaoTalk conversations. ScatterLab obtained its data from those applications with very little effort.
Internal and External Problems of Luda
Several problems arose in the course of this data collection. First, the users of the counselling apps agreed to share their conversations with those applications, but not with ScatterLab. They would not have known that their conversations would be used to develop an AI chatbot. Second, the applications obtained agreement from their users, but not from the other participants in those conversations. Before messenger conversations are collected, however, every participant in them must give their agreement.
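The two consent gaps described above can be expressed as a simple rule: a conversation may be used for a given purpose only if every participant has agreed to that purpose. The following Python sketch illustrates the rule under stated assumptions. The `Conversation` class, the `may_use` function, and the purpose labels are hypothetical names invented for this example; nothing here describes how ScatterLab or the counselling apps actually stored consent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Conversation:
    """A chat log together with a record of who consented to which use."""

    participants: frozenset[str]
    # Maps a participant to the purposes they agreed to (hypothetical labels).
    consents: dict[str, set[str]] = field(default_factory=dict)


def may_use(conversation: Conversation, purpose: str) -> bool:
    """Return True only if every participant consented to this purpose."""
    if not conversation.participants:
        return False  # nothing to consent to; treat as unusable
    return all(
        purpose in conversation.consents.get(person, set())
        for person in conversation.participants
    )


# Only the submitting user agreed, and only to relationship advice.
chat = Conversation(
    participants=frozenset({"user_a", "user_b"}),
    consents={"user_a": {"relationship_advice"}},
)
print(may_use(chat, "relationship_advice"))  # user_b never agreed
print(may_use(chat, "chatbot_training"))     # nobody agreed to this purpose
```

Under this rule, the conversation in the example fails on both counts: the partner never agreed to any use, and neither participant agreed to chatbot training.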
What was worse, ScatterLab was very poor at data cleaning. It was revealed that Luda sometimes responded with random names, addresses, and even bank account numbers (D. Kim 2021). This personal information was probably extracted from the conversations submitted to the counselling apps. In addition, ScatterLab shared its training model on GitHub without fully filtering or anonymising the data (D. Kim 2021). As a result, personal information was made public because ScatterLab did not clean the data properly. It seems that ScatterLab gave little thought to data ethics.
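To show what even basic data cleaning involves, here is a minimal Python sketch that replaces a few kinds of personal information with placeholders before messages are used for training. It is an illustration only, not ScatterLab's pipeline: the `PII_PATTERNS` table, the `redact` function, and the number formats are assumptions made for this example, and real cleaning would also need to catch names, street addresses, and many other formats that simple patterns miss.

```python
import re

# Hypothetical patterns, for illustration only. They cover e-mail addresses,
# hyphenated phone numbers starting with 0, and hyphenated account-like numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b0\d{1,2}-\d{3,4}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{3,6}-\d{2,6}-\d{3,8}\b"),
}


def redact(message: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    # Patterns are applied in insertion order, so phone numbers are
    # labelled before the broader account pattern can claim them.
    for label, pattern in PII_PATTERNS.items():
        message = pattern.sub(f"[{label}]", message)
    return message


print(redact("Send it to 110-234-567890 or call 010-1234-5678"))
```

Pattern matching of this kind is only a first pass. Names and addresses, which Luda reportedly also revealed, do not follow a fixed format and usually call for additional review or dedicated tooling.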
There was another problem, the one that first stirred controversy over Lee Luda and over AI as a whole. When Luda was asked for its opinions about social minorities, it expressed disgust towards them. For example, when a user asked Luda about LGBTQ, Luda answered, “I’m sorry to be sensitive, but I hate it [LGBTQ], it’s disgusting” (E. Kim 2021). The user asked why, and Luda added, “It’s creepy, and I would rather die than to date a lesbian” (E. Kim 2021). Luda is also known to have made discriminatory remarks about people with disabilities and about a particular racial group. The creators of Lee Luda would not have intended to target or discriminate against any group of people, but Luda did.
Frankly speaking, Lee Luda was flawed from the beginning. First, the data needed for deep learning was obtained inappropriately: ScatterLab did not inform the data providers (counselling app users) that their data would be used to create an AI chatbot. Second, the data was not cleaned properly: the chatbot revealed personal information while chatting, and the company even shared the training model on GitHub without thoroughly filtering or anonymising personal data. Third, the company failed to monitor or moderate the chatbot after launching it: Luda did not hesitate to express hatred towards certain groups of people, and ScatterLab was not aware of it.
Always Beware and Be Responsible!
Lee Luda appeared flawless at first, or at least less flawed than other AI chatbots. It turned out, however, to be highly flawed. As a consequence, ScatterLab had to take Lee Luda down and, further, came under investigation for violating privacy laws and handling data poorly. Because of the Lee Luda case, the public began to fear AI as a whole, having witnessed that an AI system can go wrong at any time, irrespective of its builder's intentions, even when it seems to be well built.
There is no question that ScatterLab obtained data improperly and misused it, causing the leakage of personal information and prejudicing the public against AI. Nevertheless, I would like to emphasise that both data providers and data collectors need to be responsible for the data they create, provide, collect, and use. In an era closely connected by the Internet of Things (IoT), AI is inseparable from our daily lives. What, then, should we do to make good use of AI, keeping in mind that it is built upon big data?
Users of internet services are very often indifferent to how their personal data is used, although they hold the rights to that data. They must agree to the terms of service, which state that their personal data will be collected and shared, or they will not be able to use the service. Yet they are often unaware of what the terms say, because they simply do not read the lengthy text or do not understand the legal language. They may know implicitly that their personal information will be revealed or used somewhere at some point, but not the exact usage or the extent of disclosure. The best way to prevent data leakage or misuse would be for individuals to understand what kind of data they are sharing, whom they are sharing it with, and where it will be used.
In addition, data collectors often overlook data ethics, namely the need to collect and handle data with caution. Obviously, a lack of control over how data is used can produce negative outcomes. Data collectors must therefore specify what kind of data they will collect from data providers and how it will be used. They should also recognise that the data providers have only granted them the right to use the data; it cannot be transferred to others without agreement, and it should be treated carefully. Furthermore, there must be legal and technical mechanisms that protect data providers’ privacy and prevent data collectors from breaching the law.
In sum, keeping data safe is not a matter for one particular group of people; it is a matter for everyone. By understanding how personal data should be shared, how the data we share can be used, and what steps are needed to protect it, we can protect our personal information and make good use of advanced technology without it backfiring on us.
Kim, D. 2021, ‘Chatbot gone awry starts conversations about AI ethics in South Korea’, The Diplomat, viewed 03 February 2021, <https://thediplomat.com/2021/01/chatbot-gone-awry-starts-conversations-about-ai-ethics-in-south-korea/>.
Kim, E. 2021, ‘Chatbot Luda controversy leave questions over AI ethics, data collection’, Yonhap News, viewed 02 February 2021, <https://en.yna.co.kr/view/AEN20210113004100320>.
McCurry, J. 2021, ‘South Korean AI chatbot pulled from Facebook after hate speech towards minorities’, The Guardian, viewed 21 January 2021, <https://www.theguardian.com/world/2021/jan/14/time-to-properly-socialise-hate-speech-ai-chatbot-pulled-from-facebook>.