NTENT Chief Technology Officer Dr. Ricardo Baeza-Yates Describes the Internet as a Living Organ

In an interview titled “The Internet is a Living Organ” with Alois Pumhösel of derStandard.at, NTENT CTO Dr. Ricardo Baeza-Yates discusses the distorted reality plaguing the Web: social media, news portals and search engines tailored to our individual preferences. Here he shows us a way out of a world of fake news and filter bubbles.

STANDARD: No one is objective. Everyone has prejudices and biases. You say that the Web reinforces this kind of “bias.” In what way?

Baeza-Yates: There are different types of biases. Many people worry about fake news, but that is the simplest case of bias on the Web, and comparatively easy to identify. False content is only the tip of the iceberg. Many types of bias are unknown to most people; this is what I worry about. A bias can be linked to the type of presentation and the way we interact with content. It can also be encoded in the algorithms themselves.

STANDARD: What are examples of these types of bias?

Baeza-Yates: One of the biggest problems is bias due to presentation. For example, the bread you choose to purchase at the supermarket is limited to the selections of bread your supermarket chooses to have available. Consumers never have every existing option of any product available. This is also the case with the Web. An online streaming service can only show me a very limited selection. Many videos are not presented – not because they are bad, but because not enough information about them is available. If a video is seen by only a few people, it is difficult to determine whether it is good. Similar to life, where the rich seem to get richer, popular videos on the Web become more and more popular.

STANDARD: Are you talking about filter bubbles?

Baeza-Yates: Bias through presentation means that a system cannot learn everything and cannot show everything. Filter bubbles are a consequence of personalization. They show users what they like best. But they cannot show anything new: something you might like but have never seen. The question is: how can you find something new based only on your existing knowledge of the world?

STANDARD: What are the strategies to break the filter bubbles on the Web?

Baeza-Yates: There are at least three approaches. The most obvious is to increase diversity: the system knows a certain user likes A, but it also shows B, C and D. The second approach, known as “serendipity,” allows for random discoveries, where the recommendation algorithm looks for something that may be related to the user’s preferences. The third and most extreme approach is to show something that is the opposite of your own preferences. Even if the user does not like it, he or she might want to know that it is there.
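The three strategies can be sketched in code. The following is a minimal, illustrative re-ranker (the function name, item format and slot policy are invented for this example and are not NTENT's algorithm): it reserves one slot for the lowest-scored item (the “opposite” strategy), one for a random mid-ranked pick (serendipity), and fills the rest by predicted score while avoiding categories already shown (diversity).

```python
import random

def diversify(recommendations, k=5, seed=0):
    """Pick k items using three bubble-breaking strategies:
    one "opposite" pick (lowest predicted score), one "serendipity"
    pick (random item from the middle of the ranking), and the rest
    by score while avoiding categories already shown (diversity).

    `recommendations` is a list of (item, category, score) tuples,
    where score is the predicted preference in [0, 1].
    """
    rng = random.Random(seed)
    pool = sorted(recommendations, key=lambda r: -r[2])  # best first

    chosen = [pool[-1]]                  # "opposite": least-liked item
    seen_cats = {pool[-1][1]}

    # "Serendipity": a random pick from the middle of the ranking.
    pick = rng.choice(pool[len(pool) // 4 : 3 * len(pool) // 4])
    chosen.append(pick)
    seen_cats.add(pick[1])

    # "Diversity": fill by score, preferring unseen categories first.
    for item in pool:
        if len(chosen) >= k:
            break
        if item not in chosen and item[1] not in seen_cats:
            chosen.append(item)
            seen_cats.add(item[1])
    for item in pool:                    # top up by score if needed
        if len(chosen) >= k:
            break
        if item not in chosen:
            chosen.append(item)
    return chosen
```

The fixed slots guarantee that some exposure is spent outside the user's comfort zone even when the score model is very confident.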

STANDARD: Social media also want to be more personalized in the future and provide weightings according to individual preferences. How do you prevent filter bubbles from occurring?

Baeza-Yates: One way is to give the user appropriate choices. You could provide buttons that change your experience, such as “Show me more diversity,” “Surprise me,” or “Show me the dark side!” In this way, people can expand their horizons.

STANDARD: You said the biases are not only in content, but also in the algorithms?

Baeza-Yates: Here the problem is more subtle. Algorithms use data to reach a solution. If the data contains bias, so does the solution. But algorithms can further amplify the bias. For example, in photo databases, a machine learning system can help the user tag images: it suggests that a dog, a pet or a puppy can be seen. Users will enjoy this feature, but that is also the problem. After a short time, all the keywords will come from the algorithm and not from people. Yet the only way the system can learn is through new content that comes from people. It is much better to leave tagging to humans and to use such a system as the basis for a search algorithm.
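The feedback loop described here, in which the system ends up retraining on its own suggestions, can be illustrated with a toy simulation. The tag list, acceptance rate and round count below are all invented for the example:

```python
import random
from collections import Counter

def simulate_tagging(rounds=200, accept_rate=0.9, seed=0):
    """Toy model of the feedback loop: the system always suggests the
    currently most common tag, most users simply accept the suggestion,
    and the system retrains on its own output. Tag diversity collapses
    onto whichever tag happened to lead early on."""
    rng = random.Random(seed)
    human_tags = ["dog", "pet", "puppy", "beach", "park"]
    counts = Counter(human_tags)   # seed counts from early human labels
    for _ in range(rounds):
        suggestion = counts.most_common(1)[0][0]
        if rng.random() < accept_rate:
            tag = suggestion                # user accepts the suggestion
        else:
            tag = rng.choice(human_tags)    # user types their own tag
        counts[tag] += 1                    # system "retrains" on the result
    return counts
```

After a few hundred rounds, one tag dominates the vocabulary, even though human taste never changed, which is the amplification effect described above.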

STANDARD: Do you have other examples of bias in algorithms?

Baeza-Yates: A large proportion of people using a search engine click on the first result, even if result number three might be better. An algorithm determines the position, and with it the associated ranking bias. It is similar when one product appears at the top of an online store while you have to scroll down for another. There is also the potential for social bias. Let’s say one product has more positive reviews, but another product is cheaper. I choose based on these reviews, although they may be based on fake recommendations.

STANDARD: How can search engines rate the quality of the found content?

Baeza-Yates: This is a difficult problem. Many attributes are weighed to filter out the best results. But the most important input is the human. The system believes that when many people click on something, it must be the best result. This problem must be taken into account, so search engines have built-in mechanisms for something called debiasing. If a result is clicked even though it is placed further down the list, that click is weighted differently than a click on a previously popular result.
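One common way to implement this kind of click debiasing is inverse-propensity weighting, sketched below. The examination probabilities are illustrative assumptions for the example, not measured values, and real systems estimate them from logs:

```python
# Examination probabilities by rank position: users rarely look past
# the first few results. These numbers are illustrative assumptions.
EXAMINE_PROB = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.15}

def debiased_click_score(clicks):
    """Inverse-propensity weighting: a click at a rarely examined
    position counts for more than a click at the top, since being
    clicked despite low visibility is stronger evidence of quality.

    `clicks` is a list of (doc_id, position) pairs from click logs.
    Returns a dict mapping doc_id to its debiased click credit.
    """
    scores = {}
    for doc, pos in clicks:
        weight = 1.0 / EXAMINE_PROB.get(pos, 0.1)  # default: deep positions
        scores[doc] = scores.get(doc, 0.0) + weight
    return scores
```

Under this weighting, a single click at position 4 (credit 1/0.25 = 4.0) outweighs two clicks at position 1 (credit 2/0.9 ≈ 2.2), matching the intuition that a click on a low-ranked result says more about its quality.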

STANDARD: How can you prevent the rapid spread of false messages?

Baeza-Yates: False information spreads in many ways. Often, false content from the Web is reproduced, for example in blogs that others then use for their own research. Ten years ago, we showed that a third of Web content was copied from the other two thirds. Today it is perhaps up to a half. The Internet is a living organ that reproduces itself and thus reproduces false or bias-infected content. The only ones who can control that are people. In principle, people should know better than to trust false content. However, this is contradicted by many democratic elections, along with many other examples where the majority was not right.

STANDARD: Many blame the Internet forums and social media channels that make the distribution possible. Do you agree?

Baeza-Yates: Yes, there are complaints and even legal cases. But I should not sue the telephone company if someone insults me on the phone. People understand the phone as a communication medium, but not the Web. The Web is the best communication medium we have ever created. Do not blame the messenger!

About Dr. Ricardo Baeza-Yates: As CTO, Dr. Ricardo Baeza-Yates oversees the technical vision of the company. Prior to NTENT, he spent 10 years at Yahoo!, ultimately rising to Vice President and Chief Research Scientist. He has also served as a professor at Universidad de Chile since 1985, where he founded and directed the Center for Web Research and twice served as Computer Science Department Chair, and as a professor at Universitat Pompeu Fabra in Barcelona since 2005, where he founded and directed the Web Research Group. Ricardo is an ACM and IEEE Fellow with over 500 publications, tens of thousands of citations, multiple awards and several patents. He has co-authored several books including “Modern Information Retrieval”, the most widely used textbook on search.

He earned Bachelor’s and Master’s Degrees in Computer Science and Electrical Engineering from the University of Chile and a Ph.D. in Computer Science from the University of Waterloo.