Heterogeneous Graphs

Lecture 10.1 - Heterogeneous & Knowledge Graph Embedding

This lecture from Stanford's CS224W course covers heterogeneous graphs and knowledge graph embeddings. Heterogeneous graphs have multiple node and edge types, and Relational Graph Convolutional Networks (RGCNs) are designed for these structures. The lecture highlights applications in fields like biomedicine and discusses the challenges of knowledge graph completion. It covers key aspects of RGCNs, such as handling multiple relation types and prediction tasks like node classification and link prediction. Evaluation metrics for these models, including mean reciprocal rank (MRR) and Hits@k, are also discussed.
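The two evaluation metrics mentioned above can be illustrated with a minimal sketch. It assumes that each test triple has already been assigned the 1-based rank of its true answer among all candidates; the function names and the rank values are made up for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose true answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the true tail entity for four test triples.
ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4 + 1/10) / 4
hits3 = hits_at_k(ranks, 3)        # two of the four ranks are <= 3
```

Both metrics are higher-is-better. MRR rewards placing the true answer near the top, while Hits@k only checks whether it appears within the first k positions.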

Lecture 10.2 - Knowledge Graph Completion

This lecture covers the structure, applications, and challenges of knowledge graphs. Knowledge graphs store and represent domain-specific information in graph form, consisting of entities, their types, and the relationships between them. They are widely used in industry: companies like Google, Amazon, Facebook, IBM, and Microsoft use them to enhance search results, product recommendations, and question-answering systems. The lecture illustrates the structure of these graphs with examples such as bibliographic networks and biomedical graphs, and highlights their role in question answering and conversational agents. A significant challenge is that these graphs are incomplete and often miss critical relationships. The lecture emphasizes the importance of knowledge graph completion, the task of predicting and filling in these missing links. Freebase is the example used: despite its vast size, it lacks complete information on entity attributes such as birthplaces and nationalities.

Lecture 10.3 - Knowledge Graph Completion Algorithms

This lecture covers the task of knowledge graph completion, which involves predicting missing information in large knowledge graphs. Key methods, namely TransE, TransR, DistMult, and ComplEx, are introduced and discussed in detail. These methods use shallow node embeddings to predict the missing 'tail' of a triple given the 'head' node and the relation type. The lecture explains how each model embeds entities and relations and how it defines closeness in the embedding space. TransE, for instance, uses a translation-based approach in which the head embedding plus the relation vector should land near the tail embedding. TransR first maps entities into a relation-specific space before translating, and DistMult scores a triple with a bilinear (element-wise product) function. The lecture also evaluates how well these models handle different relation types, such as symmetric, antisymmetric, and 1-to-N relations, showing which relation patterns each model can and cannot express.
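The difference between a translation-based and a product-based scoring function can be seen in a small sketch. The 2-d embeddings below are hand-picked rather than trained, and the function names are illustrative; the point is only that swapping head and tail changes the TransE score but leaves the DistMult score unchanged.

```python
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: sum over dimensions of h_i * r_i * t_i (higher is better)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Made-up embeddings, chosen so that h + r equals t exactly.
h, r, t = [1.0, 0.0], [0.5, 1.0], [1.5, 1.0]

te_forward = transe_score(h, r, t)     # h + r lands on t, best possible score
te_backward = transe_score(t, r, h)    # swapped triple fits much worse

dm_forward = distmult_score(h, r, t)
dm_backward = distmult_score(t, r, h)  # identical to dm_forward
```

Because the DistMult score is unchanged when head and tail are swapped, it models symmetric relations naturally but cannot distinguish the two directions of an antisymmetric relation. TransE shows the opposite behavior whenever the relation vector is nonzero.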

Lecture 11.1 - Reasoning in Knowledge Graphs

In this lecture, the focus is on advanced techniques for reasoning within knowledge graphs using embeddings. Knowledge graphs are defined as collections of nodes (entities) and relations, each with varying types. The lecture delves into the concept of knowledge graph completion, which involves predicting missing relationships in these graphs. It further explores complex reasoning tasks, such as multi-hop and logical reasoning, and discusses two primary methods for handling queries: path queries and conjunctive queries, with a special emphasis on the Query2box method. A practical application is demonstrated using a biomedical knowledge graph, containing entities like drugs, diseases, side effects, and proteins, to illustrate how to answer complex queries in this context. The lecture also addresses the challenge of dealing with incomplete knowledge graphs, highlighting the need for methods that can implicitly account for missing information and handle noise in the data.

Lecture 11.2 - Answering Predictive Queries

This lecture covers predicting answers to queries on knowledge graphs using the structure of the embedding space. It extends TransE, which places the head embedding plus the relation vector close to the tail embedding, so that a whole query is embedded as a point and its answers are the entities nearest to that point. The method generalizes to multi-hop reasoning by chaining relation vectors: starting from an anchor entity, each relation vector on the path is added in turn to reach the point in space that represents the query's embedding. Entities close to this point are the predicted answers. The lecture also addresses more complex queries that include logical operators like conjunctions, using knowledge graph traversal and embedding techniques to find entities that satisfy multiple conditions. Additionally, it discusses training TransE for knowledge graph completion and using embeddings to implicitly impute missing relations, which motivates implementing logical operations directly in the embedding space for complex query answering.
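The chaining of relation vectors for a two-hop path query can be sketched as follows. All entity names, relation names, and embedding values here are hypothetical and hand-picked (nothing is trained); the sketch only shows the mechanics of adding relation vectors to an anchor and then doing a nearest-neighbor lookup.

```python
import math

# Toy 2-d embeddings with made-up names and values.
entity = {
    "drug_a": [0.0, 0.0],
    "side_effect_x": [1.0, 0.0],
    "protein_p": [1.0, 1.0],
    "protein_q": [3.0, 3.0],
}
relation = {"causes": [1.0, 0.0], "associated_with": [0.0, 1.0]}


def embed_path_query(anchor, relations):
    """Start at the anchor embedding and add each relation vector in order."""
    q = list(entity[anchor])
    for rel in relations:
        q = [qi + ri for qi, ri in zip(q, relation[rel])]
    return q


def nearest_entities(q, k=1):
    """Return the k entity names whose embeddings are closest to q."""
    return sorted(entity, key=lambda name: math.dist(q, entity[name]))[:k]


# Query: which proteins are associated with side effects caused by drug_a?
q = embed_path_query("drug_a", ["causes", "associated_with"])
answer = nearest_entities(q, k=1)
```

The lookup never traverses graph edges, so an answer can be returned even when an intermediate link is missing from the graph, as long as the embeddings place the entities consistently.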

Lecture 11.3 - Query2box: Reasoning over KGs

This lecture covers Query2box, a method for reasoning over knowledge graphs using box embeddings. It addresses complex predictive queries by representing each query as a multidimensional box (a hyper-rectangle) characterized by a center and an offset, while entities are embedded as points and relations act as transformations that move and resize boxes. Box intersection is central, because it gives logical conjunction a geometric interpretation: the answers to an AND query lie in the overlap of the boxes of its sub-queries. The lecture also explains why AND-OR queries are hard to embed in a low-dimensional space and proposes a solution: rewrite the query in disjunctive normal form and handle the union only at the final step. Training involves learning entity embeddings, relation embeddings (the box transformations), and the intersection operator. An application example queries a knowledge graph to identify specific entities, showing how Query2box handles complex queries over incomplete knowledge graphs.
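The center-and-offset representation can be made concrete with a small sketch. Two simplifications are assumed here and should not be read as the method itself: the intersection is computed as the exact geometric overlap of two axis-aligned boxes, whereas Query2box learns its intersection operator, and the weight alpha = 0.2 on the inside distance is an arbitrary illustrative value. The entity-to-box distance combines an outside term with a down-weighted inside term.

```python
def box_bounds(center, offset):
    """Lower and upper corners of an axis-aligned box."""
    lower = [c - o for c, o in zip(center, offset)]
    upper = [c + o for c, o in zip(center, offset)]
    return lower, upper


def box_distance(center, offset, v, alpha=0.2):
    """L1 distance from point v to the box: outside part + alpha * inside part."""
    lower, upper = box_bounds(center, offset)
    d_out = sum(max(vi - u, 0.0) + max(l - vi, 0.0)
                for vi, l, u in zip(v, lower, upper))
    clipped = [min(u, max(l, vi)) for vi, l, u in zip(v, lower, upper)]
    d_in = sum(abs(c - ci) for c, ci in zip(center, clipped))
    return d_out + alpha * d_in


def intersect_boxes(box_a, box_b):
    """Geometric overlap of two boxes as (center, offset), or None if disjoint.

    Stand-in for the learned intersection operator (an assumption of this sketch).
    """
    (la, ua), (lb, ub) = box_bounds(*box_a), box_bounds(*box_b)
    lower = [max(x, y) for x, y in zip(la, lb)]
    upper = [min(x, y) for x, y in zip(ua, ub)]
    if any(l > u for l, u in zip(lower, upper)):
        return None
    center = [(l + u) / 2 for l, u in zip(lower, upper)]
    offset = [(u - l) / 2 for l, u in zip(lower, upper)]
    return center, offset


box_a = ([0.0, 0.0], [2.0, 2.0])  # spans [-2, 2] on both axes
box_b = ([2.0, 0.0], [2.0, 1.0])  # spans [0, 4] x [-1, 1]

overlap = intersect_boxes(box_a, box_b)       # box for "A AND B"
inside = box_distance(*overlap, [1.0, 0.0])   # point at the overlap's center
outside = box_distance(*overlap, [5.0, 0.0])  # point outside the overlap
```

Entities with a small distance to the final box are the predicted answers. Down-weighting the inside term means any point within the box is treated as nearly as good as the center.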