• How to fight misinformation with the support of machine learning tools?

    How good are we at telling fact from fiction on the Internet? Admittedly, it can be difficult at times, because there is a lot of misinformation out there. Some sites and blogs routinely present opinions as facts to score quick political points; others use misleading headlines to trick us into clicking and sharing content; and still others will flat-out lie to us, suggesting that goji berries, green coffee beans or some other “weird trick” will magically burn off 50 pounds of belly fat without us needing to exercise.
    The rise of social media has created a seemingly unstoppable force of misinformation. The distinction from disinformation is one of intent: propaganda pushed through state-sponsored channels is disinformation, spread deliberately, whereas false content that friends share in our social media feeds is usually misinformation. Misinformation can be described as false information that is spread unintentionally, i.e. the person who is disseminating it believes that it is true. While new technologies accelerate our ability to communicate with each other, they also accelerate the spread of misinformation.
    Contemporary social media platforms offer fertile ground for the spread of misinformation. Combatting its spread is difficult for two reasons: the profusion of information sources, and the generation of "echo chambers." The profusion of information sources makes the reader's task of weighing the reliability of information more challenging, a difficulty heightened by the untrustworthy social signals that accompany such information. The inclination of people to follow or support like-minded individuals leads to the formation of echo chambers and filter bubbles. With no differing information to counter the untruths, and with general agreement inside isolated social clusters, some writers argue that the outcome is a dearth or, worse, the absence of a collective reality.
    As the world gets ready to tackle fake news, technology is already showing us how to identify and counter it. Here are four ways to mitigate the spread of misinformation with machine learning.
    1. News Quality Scoring
    Machine learning could be used to combat misinformation by building a quality tag system capable of determining the trustworthiness of websites. To achieve this, a publisher submits its stories to the news quality scoring platform, which assesses the content and produces a global quality score. This process would run automatically and at scale, using machine learning algorithms. A crucial part of building the quality tag system is labeling the dataset, i.e., thousands of news articles. The labeling process would be automated and would also rely on collaborative filtering.
    The news quality scoring platform would rely on a combination of two models to carry out its task. The first model uses two sets of “signals” to assess the quality of journalistic work: Quantifiable Signals and Subjective Signals. Quantifiable Signals are collected automatically. They include the structure and patterns of the HTML page, advertising density, use of visual elements, bylines, word count, readability of the text, and information density (number of quotes and named entities). Subjective Signals are based on criteria used by editors (and intuitively by readers) to assess the quality of a story: writing style, thoroughness, balance and fairness, timeliness, etc. (This set would be used only in the building phase of the model.)
    The second model is based on deep learning techniques such as text embedding, in which texts from large volumes of data (millions of articles) are converted into numerical vectors to be fed into a neural network. This neural network returns a probability-based quality score, which could then be used to estimate a site’s factual accuracy.
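    As a rough illustration of how a few of the Quantifiable Signals might be collected automatically, the sketch below extracts word count, quote count, average sentence length (a crude readability proxy) and byline presence from plain article text. The class name, function name and regular expressions are assumptions made for this example, not part of any existing scoring platform, and a real system would feed such features into a trained model rather than read them directly.

```python
import re
from dataclasses import dataclass


@dataclass
class QuantifiableSignals:
    """A small, hypothetical subset of automatically collected signals."""
    word_count: int
    quote_count: int
    avg_sentence_length: float  # words per sentence, a crude readability proxy
    has_byline: bool


def extract_signals(text: str) -> QuantifiableSignals:
    # Words: runs of letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: split on end punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Quoted passages delimited by straight or curly double quotes.
    quotes = re.findall(r'"[^"]+"|“[^”]+”', text)
    # Assumption: a byline is a line starting with "By" and a capitalised name.
    has_byline = re.search(r"^\s*By\s+[A-Z]", text, flags=re.MULTILINE) is not None
    avg_len = len(words) / len(sentences) if sentences else 0.0
    return QuantifiableSignals(len(words), len(quotes), avg_len, has_byline)


article = (
    "By Jane Doe\n"
    'The mayor said "we will rebuild the bridge" on Monday. '
    "Work starts in May."
)
print(extract_signals(article))
```

    Signals such as advertising density or named-entity counts would need an HTML parser and an entity recogniser, which are left out here to keep the sketch self-contained.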
    2. Automated Fact-Checking
    To fight misinformation, it is imperative to weigh the facts that a news item purports to share. Automated fact-checking initiatives generally focus on one or more of three overlapping objectives: to spot false or questionable claims circulating online and in other media; to authoritatively verify claims or stories that are in doubt, or to facilitate their verification by journalists and members of the public; and to deliver corrections instantaneously, across different media, to audiences exposed to misinformation. Artificial intelligence and machine learning can address all three elements: identification, verification, and correction.
    Real-world automated fact-checking efforts begin with systems that monitor various forms of public discourse (speeches, debates, commentary, news reports, and so on) online and in traditional media. Once monitoring is in place, the central research and design challenge revolves around the closely linked problems of identifying and verifying factual claims. The best approach to this would be to rely on a combination of natural language processing and machine learning to identify and prioritize claims to be checked. The natural language processing algorithm would go through the subject of a story, its headline, main body text and geo-location. Further, the system would find out whether other sites are reporting the same facts. In this way, facts are weighed against reputable media sources. Using machine learning, the system would then compare a news story against a database of information, facts or past events and give probabilistic indicator signals of whether the published content needs to be double-checked.
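    The idea of comparing a claim against a database of known statements can be sketched with a very simple similarity measure. The example below uses Jaccard overlap between word sets as a stand-in for the natural language processing models a real system would use. The function names, the reference statements and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(claim: str, references: List[str]) -> Tuple[Optional[str], float]:
    """Return the reference statement most similar to the claim, with its score."""
    claim_tokens = tokenize(claim)
    best, best_score = None, 0.0
    for ref in references:
        score = jaccard(claim_tokens, tokenize(ref))
        if score > best_score:
            best, best_score = ref, score
    return best, best_score


def needs_double_check(claim: str, references: List[str],
                       threshold: float = 0.5) -> bool:
    """Flag a claim when no reference statement corroborates it closely enough."""
    _, score = best_match(claim, references)
    return score < threshold


# Hypothetical reference database of previously verified statements.
references = [
    "The city council approved the new budget on Tuesday",
    "Unemployment fell to four percent last quarter",
]
for claim in ["City council approved the new budget Tuesday",
              "Aliens landed in the city park"]:
    print(claim, "->", needs_double_check(claim, references))
```

    A low score only means that no corroborating statement was found, not that the claim is false. That is why the output is an indicator for human review rather than a verdict.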
    3. Predict Reputation
    Even before readers see a news item, knowing the reputation of the source sharing it would go a long way toward nipping the fake news problem in the bud. A reference to the Wall Street Journal, for example, raises little doubt about the reputation of a news source, and the contrast is sharper still when it is compared with an unknown source. By training a machine learning model, it is possible to estimate the authenticity of a website and predict its reputation, using features such as domain name and Google/Alexa web rank.
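    To make the feature idea concrete, the sketch below turns a URL and a web rank into a small feature vector and passes it through a logistic function. The weights and bias are hand-picked placeholders, not learned values. In practice they would be fitted on labeled examples of reputable and disreputable sites, and the rank would come from whatever ranking service is available. All names here are hypothetical.

```python
import math
from typing import List
from urllib.parse import urlparse


def domain_features(url: str, web_rank: int) -> List[float]:
    """Build illustrative features from the hostname and a supplied web rank."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    return [
        float(len(host)),                          # hostname length
        float(sum(ch.isdigit() for ch in host)),   # digits in the hostname
        float(host.count("-")),                    # hyphens in the hostname
        float(max(len(labels) - 2, 0)),            # subdomain depth
        math.log10(max(web_rank, 1)),              # log of rank (1 = most popular)
    ]


# Placeholder weights: assumptions for illustration, not fitted parameters.
WEIGHTS = [-0.05, -0.4, -0.5, -0.3, -0.8]
BIAS = 5.0


def reputation_score(url: str, web_rank: int) -> float:
    """Logistic score in (0, 1); higher suggests a more reputable source."""
    features = domain_features(url, web_rank)
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


print(reputation_score("https://www.wsj.com/", 500))
print(reputation_score("http://breaking-news-24-7.example-site.info/", 2_000_000))
```

    With these placeholder weights, the short, well-ranked domain scores far higher than the long, hyphenated, poorly ranked one. A trained model would decide for itself how much each feature matters.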
    4. Discover Sensational Words
    When it comes to news items, the headline is the key to capturing the audience's attention. For this reason, sensational headlines are a handy tool for capturing readers’ interest. When sensational words are used to spread fake news, they act as a lure that attracts more eyeballs and spreads the news faster and wider. By using keyword analytics, machine learning can be instrumental in discovering and flagging fake news headlines.
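    A minimal form of keyword analytics is sketched below: it counts sensational terms, all-caps words and exclamation marks in a headline and flags the headline when enough of these signals are present. The word list and the threshold of two signals are assumptions for illustration. A production system would more likely learn such cues from labeled headlines.

```python
import re
from typing import Dict, List, Union

# Hypothetical lexicon of sensational terms; a real list would be much larger.
SENSATIONAL_TERMS = {
    "shocking", "miracle", "unbelievable", "secret", "exposed",
    "weird", "trick", "destroyed", "bombshell",
}


def sensationalism_signals(headline: str) -> Dict[str, Union[List[str], int]]:
    """Collect simple surface cues that often accompany sensational headlines."""
    words = re.findall(r"[A-Za-z']+", headline)
    term_hits = [w.lower() for w in words if w.lower() in SENSATIONAL_TERMS]
    # Words of two or more letters written entirely in capitals.
    all_caps = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        "term_hits": term_hits,
        "all_caps_words": len(all_caps),
        "exclamations": headline.count("!"),
    }


def is_flagged(headline: str, min_signals: int = 2) -> bool:
    """Flag a headline for review when it shows at least min_signals cues."""
    s = sensationalism_signals(headline)
    total = len(s["term_hits"]) + s["all_caps_words"] + s["exclamations"]
    return total >= min_signals


for h in ["SHOCKING: This weird trick melts belly fat!",
          "City council approves new budget"]:
    print(h, "->", is_flagged(h))
```

    A flag of this kind indicates a sensational style, not falsity. Legitimate acronyms in capitals, for instance, also count toward the total, so the result is best treated as one input among several.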
    Misinformation can have devastating outcomes, and the most unfortunate fact is that it spreads more quickly and widely than accurate information and is more engaging or appealing to viewers. This is because, in the online world, content choices are saturated and users have a limited attention span. Misinformation has thus become so prolific that it is now nearly impossible for humanity to dig itself out of the quagmire unaided. The last resort is to devise machines to pull us out. Machine learning and other artificial intelligence techniques can separate the good from the bad through pattern recognition, learning from past occurrences. Algorithms can be built around these patterns to help separate falsehood from truth. Thus, machine learning tools such as those listed above can be devised to fight the spread of misinformation.

    References
    https://aboutbadnews.com/about-fake-news
    Wikipedia/misinformation
    http://aclweb.org/anthology/W18-5502
    https://www.forbes.com/sites/charlestowersclark/2018/10/04/can-ai-put-an-end-to-fake-news-dont-be-so-sure/#18d9bdf72f84