Homepage of Ansgar Scherp: About Me (http://ansgarscherp.net/#me)


Education and Professional Experience
Research Interests
Honors and Awards
Community Service
Entrepreneurship
Keynote Talks
Invited Talks
Most Important Publications
Supervised PhD Theses
Supervised Master Theses

Education and Professional Experience

Ansgar is Full Professor for Data Science and Big Data Analytics at Ulm University, Germany. Previously, he worked as Professor of Natural Language Processing and Data Analytics and was a member of the interdisciplinary Language and Computation group at the University of Essex, England, UK. Ansgar was an Associate Professor for Data Science and Predictive Analytics at the University of Stirling, Scotland, UK, from August to December 2018. He held a fixed-term Professorship of Knowledge Discovery at Kiel University (W2 level, roughly equivalent to associate professor) and ZBW – Leibniz Information Centre for Economics in Kiel, Germany, from January 2014 to July 2018. In Kiel, he was scientific leader of the EU Horizon 2020 project MOVING, which enables young researchers, decision makers, and public administrators to use machine learning and data mining tools to search, organize, and manage large-scale information sources on the web such as scientific publications, videos of research talks, and social media. Before joining Kiel University, Ansgar was Juniorprofessor (W1 level, roughly equivalent to assistant professor) at the University of Mannheim, and postdoctoral research associate as well as Juniorprofessor at the University of Koblenz-Landau in Koblenz, Germany. In Koblenz, he was work package leader for the EU FP7 projects WeKnowIt and Social Sensor. Earlier, Ansgar was awarded a prestigious Marie Skłodowska-Curie Fellowship of the EU for a one-year research stay at the University of California, Irvine.

Ansgar has an excellent research reputation in Text and Graph Mining, specifically in the combination of symbolic and subsymbolic (statistical) methods for data analysis. He won the Billion Triple Challenge at the International Semantic Web Conference in 2008 and 2011. The goal of the Billion Triple Challenge is to demonstrate scalability of semantic technologies. Ansgar was an elected speaker at the Rising Stars Symposium of the Special Interest Group on Multimedia (SIGMM) of the Association for Computing Machinery (ACM), held in October 2016 in Amsterdam, honoring his 10 years of research in metadata mining and semantics. He has published over 150 peer-reviewed conference papers and journal articles.

Research Interests

My research interests are in novel approaches for data analysis that combine symbolic and statistical methods. I bring together methods from Information Retrieval, Data Mining and Machine Learning, and Semantic Web. I apply my novel data analysis approaches to, e.g., very large, distributed Knowledge Graphs on the web with billions of edges or large-scale document corpora in domains like life sciences/medicine, social sciences, economics, and the web.

I contribute to multiple research areas. As a contribution to combining Machine Learning and Semantic Web, I have used and compared methods like Association Rules and Learning to Rank to provide a tool for modeling semantic data on the web [see CV: C54, C53]. I have used regression models to keep data caches up to date based on predicted data changes [see CV: C58] and have analyzed the evolution of Knowledge Graphs with logistic regression models and random forests for the purpose of change verification [see CV: C61]. I have also investigated classical and modern machine learning methods to compare how well documents can be classified into a semantic thesaurus using only their titles versus their full text [see CV: C60]. In a work from January 2018, I showed that modern Deep Learning methods applied to a very large number of titles of scientific documents can yield competitive or even better classification results compared to using the full text [see CV: C63].

Regarding Information Retrieval and Semantic Web, I have developed a novel profiling method called HCF-IDF that combines the statistical strength of the popular TF-IDF model with the semantics of domain-specific thesauri [see CV: C55]. With HCF-IDF, I have demonstrated in an online study with n=123 economists that scientific paper recommendations based on only the titles of the publications are competitive with recommendations based on the full text. In addition, I have developed, with SchemEX, an approach for a stream-based computation of a schema-level index for very large distributed graph data [see CV: J12]. The index can be used to search the web for specific data sources, just as Google is used to search for web documents [see CV: C27]. The idea of a stream-based computation of an index over graph data won the Billion Triple Challenge of the International Semantic Web Conference in 2011.

Honors and Awards

F. Singhofer, A. Garifullina, M. Kern, and A. Scherp: A Novel Approach on the Joint De-Identification of Textual and Relational Data with a Modified Mondrian Algorithm, Symposium on Document Engineering (DocEng), ACM, 2021. (Best Paper Award)
M. Jessen, F. Böschen, and A. Scherp: Text Localization in Scientific Figures using Fully Convolutional Neural Networks on Limited Training Data, Symposium on Document Engineering (DocEng); Berlin, ACM, 2019. (Best Student Paper Award)
ACM SIGMM Rising Stars Symposium speaker on "About Multimedia Presentation Generation and Multimedia Metadata: From Synthesis to Analysis, and Back?" at ACM Multimedia, Amsterdam, 2016.
Best Paper Nomination for "A Comparison of Different Strategies for Automated Semantic Document Annotation" with G. Große-Bölting and C. Nishioka at Int. Conference on Knowledge Capture (K-CAP); Palisades, NY, USA, ACM, October 2015.
Best Paper Nomination for "Providing Alternative Declarative Descriptions for Entity Sets Using Parallel Concept Lattices" with T. Gottron and S. Scheglmann at European Semantic Web Conf. (ESWC); Anissaras, Crete, Greece, Springer, May, 2014.
Winner of the klickTel Award with mobEX, a method for incremental entity resolution over distributed social media data like locations, events, and persons and making it available to the users through a mobile app (together with C. Bikar, M. Jess, F. Knip, B. Opitz, B. Pfister, and T. Sztyler), 2013.
Winner of the Billion Triple Challenge of the Semantic Web Conference together with M. Konrath and T. Gottron on the SchemEX tool providing an efficient extraction and aggregation of implicit and explicit schema information from the Linked Open Data cloud at Semantic Web Conference in Bonn, Germany in 2011. The goal of the Billion Triple Challenge is to demonstrate "doing something useful with more than a billion triples".
Best Paper Nomination for "Are Semantic Desktops Better?: Summative Evaluation Comparing a Semantic against a Conventional Desktop" with T. Franz and S. Staab at Int. Conference on Knowledge Capture (K-CAP); Redondo Beach, CA, USA, September 2009.
Winner of the Billion Triple Challenge of the Semantic Web Conference together with S. Schenk, C. Saathoff, and S. Staab on the interactive application SemaPlorer for exploring a very large amount of distributed semantic social media data of different origin and quality in real-time at the Semantic Web Conference in Karlsruhe, Germany in 2008.
Best Paper Award for "Paving the Last Mile for Multi-Channel Multimedia Presentation Generation" with S. Boll, Multimedia Modeling (MMM); Melbourne, Australia, 2005.
Winner of the Audience Award for the project Virtual Laboratories together with Marco Schlattmann, A. Hasler, W. Heuten, and R. Kuczewski at the finals of the Medida-Prix for Media-Didactics in Higher Education, Basel, Switzerland, 2002.

Community Service

Ansgar has been an editor of the Journal of Web Semantics (JWS) since 2010. He is a program committee member for conferences including World Wide Web (WWW), ACM Multimedia (MM), Multimedia Modeling (MMM), Extended Semantic Web Conference (ESWC), and International Semantic Web Conference (ISWC). He also reviews for journals including Proceedings of the VLDB Endowment (PVLDB), IEEE Multimedia, Springer's Multimedia Systems and Multimedia Tools and Applications (MTAP), ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP), Journal of Web Semantics (JWS), and International Journal of Human-Computer Studies (IJHCS). Ansgar has co-organized several scientific events such as the ACM Workshop series on Events in Multimedia, held in conjunction with ACM Multimedia in Beijing, China, in 2009, Florence, Italy, in 2010, and Scottsdale, AZ, USA, in 2011. The workshop aimed at bringing together different disciplines interested in detecting, processing, representing, and using events in multimedia and social media. Due to the workshop's success, the topic became its own area at the ACM Multimedia conference in 2012. Furthermore, Ansgar led the doctoral programme of INFORMATIK, the annual German computer science society meeting, in 2013, 2014, and 2015.

Entrepreneurship

Ansgar was co-founder and co-owner of the start-up company Kreuzverweis Solutions GmbH (2010-2014). The company developed a media management solution based on semantic technologies, such as a web service for the semantic annotation of media assets.

He also ran the vIRTUAL tECHNOLOGIES GbR (1994-1998) together with Joachim Gelhaus (now with SPIELO International, Graz, Austria). We developed highly interactive multimedia applications and programming libraries. The products of our company are still online and can be visited from an archive of the original vIRTUAL tECHNOLOGIES GbR homepage (please note that the content has been frozen since 1999). We also developed a lot of software for the Amstrad CPC, a competitor of the Commodore C64. A small Amstrad CPC tribute page with information about my software developed in 1992/1993 also still exists.

He worked as a freelancer with the start-up company Deutscher Online Verlag GmbH (D.O.V. GmbH), which today one would call a Web 1.0 company, in Oldenburg, Germany, from 1995 to 1997. The company developed web-based platforms like the yellow pages http://gelbe.seiten.de/ and ran web sites like http://www.de/. We also developed one of the very first social networks on the web, namely the first European trading platform for industrial garbage.

Keynote Talks

About Extreme Analyses of Texts and Graphs, Computer Science and Electronic Engineering Conference; Colchester, UK, September 2018.

About Multimedia Presentation Generation and Multimedia Metadata: From Synthesis to Analysis, and Back?, Second SIGMM Rising Stars Symposium; Amsterdam, Netherlands, October 2016.
Mining and Managing Large-Scale Linked Open Data, 28th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken); Nörten-Hardenberg, Germany.
About Knowledge Discovery, Ontologies and Life Sciences, 3rd Internal Meeting of the DFG Research Training Group 1743 "Genes, environment, inflammation"; Kiel, Germany, May, 2016.
Linked Open Data, Ministry of Economic Affairs, Employment, Transport and Technology of the state Schleswig-Holstein, Kiel, April 2016.
Publishing and Consuming Structured Data on the Web: Foundations and Selected Research Questions on Linked Open Data, 3rd International Colloquium of Information Architecture and Multimodality; Brasilia, Brazil, November 2014.
Events in Multimedia - Theory, Model, Application, Workshop on Event-based Media Integration and Processing; Barcelona, Spain, October 2013.
Semantic Modeling of Multimedia, Summer School on Multimedia Semantics; Koblenz, Germany, August 2009.
Digitally Authoring of Photo Books — A Success Story for Multimedia Annotation, Web of Data Practitioners Days; Vienna, Austria, October, 2008.

Invited Talks

Text mining for the sciences, Clinician Scientist Programme Retreat 2021, Wissenschaftszentrum Schloss Reisensburg, Günzburg, Germany, July, 2021.
Extreme Analyses of Texts and Graphs; Computer Science Atlas Talk of the University of Manchester, UK, Virtual Event, November, 2020.
Text for your Eyes: Text Analytics & Eyetracking, Café Scientifique; Colchester, UK, December, 2019.
About Extreme Analyses of Texts and Distributed Graphs, Signal AI Limited; London, UK, September, 2019.
Analyzing and Using Large-scale Web Graphs, School of Computing Science, University of Glasgow, UK, March, 2018.
Artificial Intelligence for Science (in German: Künstliche Intelligenz für die Wissenschaft), Künstliche Intelligenz (KI) - Perspektiven für Schleswig-Holstein, Fachhochschule Kiel, May 2018.
About Knowledge Discovery, Text and Data Mining, and Ontologies, research colloquium of the Kiel Institute for the World Economy; Kiel, Germany, May, 2016.
Linked Data Mining, NII Shonan Meeting on Dimensionality and Scalability; Shonan, Japan, June/July, 2015.
Tagging-by-search: automatic image region labeling using gaze information obtained from image search, research colloquium at FXPAL; Palo Alto, CA, USA, February, 2015.
mobEx: An Approach for Incremental Entity Resolution of Distributed, Heterogeneous Social Media Sources, research colloquium at Yahoo! Research Labs; Sunnyvale, CA, USA, February, 2015.
Extraction and Analyses of Schema Information on the Linked Open Data Cloud, research colloquium at the University of California at Irvine, CA, USA, February, 2015.
Analyses of Schema Structures on the Linked Open Data Cloud, Spring meeting of the German Computer Science Group on Databases; Brunswick, Germany, March, 2014.
LOD in Digital Libraries - Current Issues, Spring meeting of the German Computer Science Group on Databases; Brunswick, Germany, March, 2014.
Scalable Extraction and Indexing of Schema Information in Linked Open Data, GESIS; Cologne, Germany, January, 2013.
Can you see it? Annotating Image Regions based on Users' Gaze Information, Technical University of Vienna; Vienna, Austria, October, 2012.
SchemEX – Building an Index for Linked Open Data, University of Oslo; Oslo, Norway, August 2012.
STEVIE: Collaborative Mobile Points of Interests and Events and an Overview of the Institute for Web Science and Technologies, Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig; Leipzig, Germany, July, 2010.
Semantic Modeling of Multimedia, Summer School on Multimedia Semantics; Koblenz, Germany, August, 2009.
Linking the Semantics Ecosystem with Semantics Derivation Rules for Multimedia Content, Dagstuhl Seminar on Contextual and Social Media Understanding and Usage; Schloss Dagstuhl, Dagstuhl, Germany, June, 2008.
Authoring Semantically-rich Personalized Multimedia Presentations, TEWI Kolloquium, University of Klagenfurt; Klagenfurt, Austria, June, 2008.
Conducting Research in California with the Marie Curie Actions, TEWI Kolloquium, University of Klagenfurt; Klagenfurt, Austria, June, 2008.
The Creation of Personal Media Albums by Leveraging the Web 2.0 Community Approach, Institute for Business Application Systems of the Brandenburg University of Applied Sciences; Brandenburg an der Havel, Germany, April, 2008.
Towards an Ecosystem for Semantics, Institute for Informatics, University of Amsterdam; Amsterdam, Netherlands, January, 2008.
One Year in California with a Marie Curie Fellowship, Institute for Informatics, University of Amsterdam; Amsterdam, Netherlands, January, 2008.
MM4U – A framework for developing personalized multimedia applications, Swinburne University of Technology, Faculty of Information & Communication Technologies; Melbourne, Australia, January, 2005.

Most Important Publications

T. Blume, D. Richerby, and A. Scherp: FLUID: A Common Model for Semantic Structural Graph Summaries Based on Equivalence Relations, Theoretical Computer Science, 2020.
T. Blume, D. Richerby, and A. Scherp: Incremental and Parallel Computation of Structural Graph Summaries for Evolving Graphs, International Conference on Information and Knowledge Management (CIKM), 2020.
F. Mai, L. Galke, and A. Scherp: CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model, International Conference on Learning Representations (ICLR); New Orleans, Louisiana, USA.
A. Scherp, T. Franz, C. Saathoff, and S. Staab: F – a model of events based on the foundational ontology DOLCE+DnS Ultralight, Int. Conf. on Knowledge Capture, pp. 137-144, ACM, 2009.
S. Schenk, C. Saathoff, S. Staab, and A. Scherp: SemaPlorer - Interactive semantic exploration of data and media based on a federated cloud infrastructure, J. Web Sem., 7(4):298-304, Elsevier, 200 (Winner of the Billion Triple Challenge 2008)
M. Konrath, T. Gottron, S. Staab, and A. Scherp: SchemEX - Efficient construction of a data catalogue by stream-based indexing of Linked Data, J. Web Sem., 16:52-58, Elsevier, 2012. (Winner of the Billion Triple Challenge 2011)
C. Saathoff and A. Scherp: Unlocking the semantics of multimedia presentations in the web with the multimedia metadata ontology, Int. World Wide Web Conf., pp. 831-840, ACM, 2010.
A. Scherp, C. Saathoff, T. Franz, and S. Staab: Designing core ontologies, Applied Ontology, 6(3):177-221, IOS, 2011.
A. Scherp: Canonical processes for creating personalized semantically rich multimedia presentations, Multimedia Systems, Springer, 14(6):415-425, 2008.
A. Scherp, T. Franz, C. Saathoff, and S. Staab: A core ontology on events for representing occurrences in the real world, Multimedia Tools and Applications, 58(2):293-331, Springer, 2012.
S. Boll, P. Sandhaus, A. Scherp, and U. Westermann: Semantics, content, and structure of many for the creation of personal photo albums, Int. Conf. on Multimedia, pp. 641-650, ACM, 2007.
A. Scherp and S. Boll: MM4U - A Framework for Creating Personalized Multimedia Content, Managing Multimedia Semantics, pp. 246-287, IRM, 2005.
A. Scherp and S. Boll: Paving the Last Mile for Multi-Channel Multimedia Presentation Generation, Multimedia Modeling, pp. 190-197, Springer, 2005. (Best Paper Award)

For a complete list, please refer to the list here or to my DBLP page.

Supervised PhD Theses

Falk Böschen: Analysis and Modular Approach for Text Extraction from Scientific Figures on Limited Data, Kiel University, Germany, December 2021.
Mohammad Abdel-Qader: Ontology Versioning Management in the Semantic Web, Kiel University, Germany, September 2020.
Falko Schönteich: Secure Distributed Information Management Based on Semantic Web Technologies, Kiel University, Germany, September 2021.
Chifumi Nishioka: Profiling Users and Knowledge Graphs on the Web, Kiel University, Germany, September 2017.
Johann Schaible: Reuse of Vocabularies for Modeling and Publishing Data as Linked Open Data, Kiel University, Germany, 2017.
Andreas Kasten: Secure Semantic Web Data Management in Open and Distributed Networks: Confidentiality, Integrity, and Compliant Availability in the Semantic Web, University of Koblenz-Landau, Koblenz, Germany, 2016. (together with Prof. Rüdiger Grimm)
Tina Walber: Exploiting Human Visual Attention for Automatic Image Selection and Annotation, University of Koblenz-Landau, Koblenz, Germany, 2014. (together with Prof. Steffen Staab)

Supervised Master Theses

(incomplete list)

Gregor Große-Bölting - Comparison of different Methods for the automated annotation of documents (in German), 2015. Awarded the Prof. Dr. Werner Petersen-Preis der Technik.

Barbara Göller - Strategies for a focused crawling of multimedia documents (in German), 2014.

Chantal Neuhaus - EyeSelect - An Approach for Gaze-Based Image Selection from Large Photo Collections, 2013.

Rolf Koch - Detecting spam activities on online review sites: State-of-the-art and implications for theory and practice, 2013.

Mark Schneider - Summative evaluation of faceted search and exploration of social media data on mobile devices (in German), 2011. Results of this thesis have been accepted for publication as a full paper at MUM 2013. Mark also received the Association of German Engineers (VDI) Middle Rhine advancement award in 2012. An extension of the thesis' results for the area of San Diego has also been nominated for the Data Journalism Award. (works with Capgemini)
Mathias Konrath - SchemEX: Schema Extraction from Linked Open Data (in German), 2011. Winner of the Billion Triple Challenge 2011 with prize money of 1000 € and published in the Journal of Web Semantics. (works with Debeka, an insurance company)
Leon Kastler - EyeVisionSearch: Using an Eye-tracker to Improve the Use of Image Search Engines (in German), 2011. Published as full paper at HCII in Las Vegas, 2013. (now PhD student of the WeST institute)
Heiko Winnebeck - Design, implementation, and evaluation of a front-end for ImageAtlas (in German), 2011. Interdisciplinary collaboration with Dr. Markus Lohoff.
Carsten Schneider - Integration of ontologies and model-driven software development: design and implementation of an Eclipse Ontology Framework (in German), 2010.
Sven Tschirner - Semantic Access to INSPIRE: Distributed Search and Publication of GML Data in the Semantic Web at the Example of INSPIRE (in German), 2011. Published at Terra Cognita Workshop in 2011. (works with the German Federal Institute of Hydrology)
Daniel Eißing - Semantic integration of individual knowledge work and organizational knowledge work, 2010. Daniel received the Young Talent Award 2011 of the Knowledge Management Working Group Karlsruhe in Germany with prize money of 500 €. Published as a research paper at the International Semantic Web Conference in 2011 and the Multi-Conference on Information Systems Research (Multi-Konferenz Wirtschaftsinformatik) in 2012. (now with TomTec Imaging Systems, Munich)
Alexander Kleinen - Faceted exploration of Linked Data on mobile devices (in German), 2010. Article about this thesis is published in the Multimedia Tools and Applications (MTAP) journal of Springer. (works as Web Developer with Fourty-Four)
Daniel Schmeiß - Indexing and search of structured multimedia presentations (in German), 2010. (works as Software Developer with Capgemini)
Stefan Scheglmann - Model-driven engineering of ontology APIs, 2010. Published as in-use paper at Extended Semantic Web Conference in 2012. (now PhD student of the WeST institute)
Max Braun - Collaborative Creation and Exchange of Context-Sensitive, Semantic Points-of-Interests, 2009. Received the AFCEA 2010 award for best diploma thesis with prize money of 3,500 €. Published as a demo paper at Extended Semantic Web Conference in 2010. (works with Google)
Holger Cremer - Emergent Semantics in Personalized Multimedia Content (in German), 2006. Published in Journal of Digital Information Management in 2007. (works with the photo finisher CeWe Color)
Daniel Thobe - Smart Photo Annotation: Design and implementation of a context-driven metadata annotation system for digital images (in German), 2005. Published as research paper at Multimedia Modeling in 2007.
Matthias Nitsche - Design and implementation of an abstract user model for personalized systems (in German), 2004.

Last update: 10/13/2020.