They would like to generate Heterogeneous Graph embedding consisting of graph structure information and node content information.
Few of them can jointly consider heterogeneous structural (graph) information as well as heterogeneous contents information of each node effectively.
- many nodes could not connect to all types of neighbors
- A node could carry unstructured content
- Different types of neighbors contributes differently to node embedding
They propose a heterogeneous graph neural network model to resolve this issue.
Specifically, they first introduce a random walk with restart strategy to sample a fixed size of strongly correlated heterogeneous neighbors for each node and group them based upon node types.
Next, we design a neural network architecture with two modules to aggregate feature information of those sampled neighboring nodes. The first module encodes “deep” feature interactions of heterogeneous contents and generates content embedding for each node.
The second module aggregates content (attribute) embeddings of different neighboring groups (types) and further combines them by considering the impacts of different groups to obtain the ultimate node embedding.
Finally, we leverage a graph context loss and a mini-batch gradient descent procedure to train the model in an end-to-end manner.
Sampling Heterogeneous Neighbors (C1)
Most of other GNNs models have some issues:
- They cannot capture feature information from different types of neighbors.
- Besides that, They are weakened by various neighbor sizes.
- They are not suitable for aggregating heterogeneous neighbors which have different content features.
They propose a heterogeneous neighbors sampling strategy based on random walk with restart (RWR).
- RWR collects all types of neighbors for each node
- the sampled neighbor size of each node is fixed and the most frequently visited neighbors are selected;
- neighbors of the same type (having the same content features) are grouped such that type-based aggregation can be deployed.
Encoding Heterogeneous Contents (C2)
Given a node, it has different type neighbors.
Input: 1 neighboring node
output: 1 embedding
For each neighboring node, we would like to get its encoding but different type of node has different contents. How do we aggregate the information together? Bi-LSTM!
We can apply pre-trained model to get embedding of each information and feed into Bi-LSTM (to capture deep interactions) and then Mean Pooling to aggregate it.
Note that the Bi-LSTM operates on an unordered content set.
(1) it has concise structures with relative low complexity (less parameters), making the model implementation and tuning relatively easy
(2) it is capable to fuse the heterogeneous contents information, leading to a strong expression capability;
(3) it is flexible to add extra content features, making the model extension convenient.
Aggregating Heterogeneous Neighbors (C3)
As C1, given 1 node, we have neighboring nodes. Thanks to C2, each neighboring node can encode to 1 embedding.
But how do we aggregate these neighboring nodes together?
Group by Type and for each type run Bi-LSTM!
Same Type Neighbors Aggregation
After grouping neighboring nodes, in each group, we have several nodes in same type. So we can run Same Type Neighbors Aggregation.
We employ Bi-LSTM to aggregate content embeddings of all t-type neighbors and use the average over all hidden states to represent the general aggregated embedding.
We use different Bi-LSTMs to distinguish different node types for neighbors aggregation. Note that the Bi-LSTM operates on an unordered neighbors set, which is inspired by GraphSAGE.
Again, given a node, we have neighboring nodes and we group them into several group by types. So each type, we have 1 embedding.
How to aggregate these embedding? Attention layer
To combine these type-based neighbor embeddings with v’s content embedding, we employ the attention mechanism.
The motivation is that different types of neighbors will make different contributions to the final representation of v.
Objective and Model Training
A graph context loss and a mini-batch gradient descent.
They applied negative sampling (1 by 1) and similar toDeepWalk, they apply random walk to get negative samples.
Finally, we have this whole picture.
In this paper, they also discuss several experiments.
- (RQ1) How does HetGNN perform vs. state-of-the-art baselines for various graph mining tasks, such as link prediction (RQ1–1), personalized recommendation (RQ1–2), and node classification & clustering (RQ1–3)?
- (RQ2) How does HetGNN perform vs. state-of-the-art baselines for inductive graph mining tasks, such as inductive node classification & clustering?
- (RQ3) How do different components, e.д., node heterogeneous contents encoder or heterogeneous neighbors aggregator, affect the model performance?
- (RQ4) How do various hyper-parameters, e.д., embedding dimension or the size of sampled heterogeneous neighbors set, impact the model performance?
KDD 19′: Applying Deep Learning To Airbnb Search
KDD 17′: Visual Search at eBay
KDD 18′: Real-time Personalization using Embeddings for Search Ranking at Airbnb
KDD 18′: Notification Volume Control and Optimization System at Pinterest
KDD 19′: PinText: A Multitask Text Embedding System in Pinterest
CVPR19′ Complete the Look: Scene-based Complementary Product Recommendation
COLING’14: Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts
NAACL’19: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
NIPS’2017: Attention Is All You Need (Transformer)
KDD’19: Learning a Unified Embedding for Visual Search at Pinterest
BMVC19′ Classification is a Strong Baseline for Deep Metric Learning
KDD’18: Graph Convolutional Neural Networks for Web-Scale Recommender Systems
WWW’17: Visual Discovery at Pinterest
ICCV: International Conference on Computer Vision
CVPR: Conference on Computer Vision and Pattern Recognition
Top Conference Paper Challenge: