EOS

In 2006, Georgetown University created EOS, a vast unstructured archive of publicly available open-source media articles that now numbers over 600 million items. New articles can be added at a rate of more than 300,000 per day by automated scraping of over 22,000 internet-based sources in 46 languages from across the globe. Open-source data such as that held in the EOS archive contains latent indicators that can be used to provide early warning and situational awareness of important events, including forced displacement, bio-events, disease outbreaks, instances of financial fraud, and cyber security attacks. These indicators are hidden within large volumes of data from various complementary sources, including traditional online news agencies, blogs, government documents, and other online sources. To find articles relevant to a surveillance topic, analysts use a customized information retrieval system with Boolean query strings that have been developed and refined over time for different topics. Analysts code information and write event reports based on media articles, according to specific sets of concepts and keywords (i.e., a taxonomy of media reporting) relevant to the topics under review. Through several funded projects, Georgetown is actively developing automated methods to help analysts retrieve, analyze, and report on information from this rich but unstructured archive. For an example, see the topic of forced migration.
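
To make the idea of Boolean query retrieval more concrete, the following is a minimal sketch of how a Boolean keyword query might be evaluated against article text. It does not describe the actual EOS retrieval system, whose query syntax and implementation are not given here. The function names, the nested-tuple query format, the example query, and the sample sentences are all hypothetical and invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, tokens):
    """Evaluate a nested Boolean query against a set of tokens.

    A query is either a keyword string or a tuple of the form
    ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q).
    This tuple format is an assumption made for the sketch.
    """
    if isinstance(query, str):
        return query.lower() in tokens
    op, *args = query
    if op == "AND":
        return all(matches(arg, tokens) for arg in args)
    if op == "OR":
        return any(matches(arg, tokens) for arg in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical query for a forced-displacement topic:
# (refugees OR displaced OR fled) AND NOT football
QUERY = ("AND", ("OR", "refugees", "displaced", "fled"), ("NOT", "football"))

# Invented sample sentences standing in for scraped articles.
ARTICLES = [
    "Thousands fled the flooded district overnight.",
    "Football club signs displaced striker.",
    "Central bank holds interest rates steady.",
]

# Keep only the articles whose tokens satisfy the query.
hits = [article for article in ARTICLES if matches(QUERY, tokenize(article))]
print(hits)
```

In this sketch only the first sentence is retained: the second contains a topic keyword but is excluded by the NOT clause, and the third contains no topic keyword at all. Exact-token matching of this kind ignores word variants and multi-word phrases, which is one reason such query strings tend to need refinement over time.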