The development of artificial intelligence has gradually made people realize that the big data on which algorithms rely is not neutral. It is drawn from real society, and it inevitably carries the traces of inequality, exclusion, and discrimination embedded in that society.
As early as the 1980s, St. George's Medical College in London used a computer program to screen applicants' admission documents. After four years of operation, however, it was discovered that the program ignored applicants' academic performance and simply rejected female applicants and applicants without European-sounding names. This is one of the earliest known cases of gender and racial bias in an algorithm.
Today, similar cases continue to appear. ImageNet, a well-known dataset used to train image-classification AI models, has been shown by many researchers to embed similar biases.
To investigate this problem, two researchers from Carnegie Mellon University and George Washington University analyzed two well-known unsupervised computer vision models, iGPT and SimCLRv2, both pre-trained on the ImageNet 2012 dataset, to determine whether they contain inherent biases and to quantify them.
The answer turned out to be yes, and these biases closely mirror the prejudices found in human society: for example, men are associated with careers and women with family. In one test, 52.5% of the image completions generated for pictures of women depicted bikinis or low-cut tops.
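Quantification of this kind is typically done with an embedding association test, which compares cosine similarities between "target" stimuli (e.g., images of men vs. women) and "attribute" stimuli (e.g., career vs. family). The sketch below is a minimal, hypothetical illustration of that idea in Python; the function names and the `embed()` step are assumptions for illustration, not the researchers' actual code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """Mean similarity of one target embedding w to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """WEAT-style effect size: how much more strongly targets in X
    (e.g., male images) associate with A (career) vs. B (family),
    compared with targets in Y (e.g., female images)."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled_std = np.std(x_assoc + y_assoc, ddof=1)
    return (np.mean(x_assoc) - np.mean(y_assoc)) / pooled_std

# Hypothetical usage: `embed` would be the frozen feature extractor of a
# pre-trained model such as iGPT or SimCLRv2 applied to stimulus images.
# X = [embed(img) for img in male_images]; Y, A, B built the same way.
```

A large positive effect size would indicate the "men-career, women-family" association described above; a value near zero would indicate no measurable bias along that axis.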
Since the images in the ImageNet dataset all come from the Internet, minority groups are naturally under-represented in the collection process. For example, a "wedding" is a pleasant scene, but white people appear far more often than Black people in wedding photos, so the model automatically learns a stronger affinity between the two, even though the unsupervised pre-training process never attached any white-related label to weddings.
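One way to make this under-representation concrete is to audit how often each demographic group appears in each scene category of a dataset. The sketch below assumes a hypothetical list of records with `scene` and `group` fields; ImageNet itself ships no demographic labels, so such annotations would have to be added separately, and this is purely illustrative.

```python
from collections import Counter, defaultdict

def representation_by_scene(records):
    """Count how often each demographic group appears in each scene category.

    `records` is assumed to be an iterable of dicts such as
    {"scene": "wedding", "group": "white"} produced by manual annotation.
    """
    counts = defaultdict(Counter)
    for r in records:
        counts[r["scene"]][r["group"]] += 1
    return counts

# Toy example of a heavily skewed sample.
sample = ([{"scene": "wedding", "group": "white"}] * 90 +
          [{"scene": "wedding", "group": "black"}] * 10)
print(representation_by_scene(sample)["wedding"])
# Counter({'white': 90, 'black': 10}) -- the imbalance the model will absorb.
```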
It can be said that when an algorithm precisely evaluates the costs and rewards of the actions associated with each person, some people will lose the opportunity to obtain new resources. This may appear to reduce risk for the decision-maker, but it can mean injustice for those being evaluated.
To build a harmonious era of big data, minimizing prejudice and discrimination is an unavoidable problem. Wrapping social injustice in the packaging of technological neutrality is the greatest malice of the machine age.