Project: CRII: III: Novel Embedding Algorithms for Large-Scale and Complex Attributed Networks
PI: Xia "Ben" Hu, Texas A&M University

Project Goals
Attributed networks are ubiquitous in real-world systems such as social media, academic networks, health care systems, and enterprise systems. Unlike traditional networks, which represent only nodes and links, attributed networks also associate each node with a rich set of attributes. For example, in academic networks, researchers collaborate with each other and are distinguished by their research interests or profiles; in social networks, users interact and communicate with others and also post personalized content.
Network embedding is an effective computational tool for analyzing networks: it learns a low-dimensional representation for each node in the network. Such a representation plays an essential role in supporting a variety of network analysis applications, including community detection, link prediction, and network visualization. While most existing studies have focused on embedding plain networks, the aim of this project is to develop novel embedding algorithms for attributed networks by tackling the challenges brought by large-scale and complex attributed network data. The results of this project will be a new class of theoretical as well as practical network embedding methods for analyzing large and complex network data. The developed algorithms will be flexible enough to be adapted to various industrial applications in Social Computing, Health Informatics, and Enterprise Systems.
The primary research goal of this project is to develop efficient and effective data-driven network embedding algorithms for large-scale attributed networks that contain complex network interactions. This research will address the problem of attributed network analytics from two perspectives: scalable network embedding and leveraging network interactions. Specifically, this project pursues two primary objectives: (1) performing efficient embedding on large-scale attributed networks by advancing joint learning models so that they are compatible with fast optimization algorithms; and (2) transforming existing network embedding algorithms by analyzing network interactions.
The primary education goal of this project is to develop a new curriculum that incorporates the proposed research, and to allow the PI to continue the ongoing efforts of providing research opportunities to undergraduate, female, and underrepresented students.
Students
- Xiao Huang, Ph.D.
- Ninghao Liu, Ph.D. Student
- Qingquan Song, Ph.D. Student
- Mengnan Du, Ph.D. Student
Publications
Conferences:
- Yi-Wei Chen, Qingquan Song, Xi Liu, P S Sastry, and Xia Hu, On Robustness of Neural Architecture Search under Label Noise, in Frontiers in Big Data, 2020.
- Fan Yang, Ninghao Liu, Mengnan Du, Kaixiong Zhou, Shuiwang Ji, and Xia Hu, Deep Neural Networks with Knowledge Instillation, in SIAM International Conference on Data Mining, 2020.
- Mengnan Du, Ninghao Liu, and Xia Hu, Techniques for Interpretable Machine Learning, Communications of the ACM, pages 68-77, 2020.
- Mengnan Du, Ninghao Liu, Fan Yang, and Xia Hu, Learning Credible Deep Neural Networks with Rationale Regularization, in IEEE International Conference on Data Mining, 2019.
- Qingquan Song, Shiyu Chang, and Xia Hu, Coupled Variational Recurrent Collaborative Filtering, in SIGKDD Conference on Knowledge Discovery and Data Mining, pages 335-343, 2019.
- Ninghao Liu, Qiaoyu Tan, Yuening Li, Hongxia Yang, Jingren Zhou, and Xia Hu, Is a Single Vector Enough? Exploring Node Polysemy for Network Embedding, in SIGKDD Conference on Knowledge Discovery and Data Mining, pages 932-940, 2019.
- Xiao Huang, Qingquan Song, Yuening Li, and Xia Hu, Graph Recurrent Networks with Attributed Random Walks, in ACM International Conference on Web Search and Data Mining, pages 732-740, 2019.
- Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, and Xia Hu, On Attribution of Recurrent Neural Network Predictions via Additive Decomposition, in The World Wide Web Conference, pages 383-393, 2019.
- Ninghao Liu, Mengnan Du, and Xia Hu, Representation Interpretation with Spatial Encoding and Multimodal Analytics, in ACM International Conference on Web Search and Data Mining, pages 60-68, 2019.
- Qingquan Song, Haifeng Jin, Xiao Huang, and Xia Hu, Multi-Label Adversarial Perturbations, in IEEE International Conference on Data Mining, 2018.
- Ninghao Liu, Xiao Huang, Jundong Li, and Xia Hu, On Interpretation of Network Embedding via Taxonomy Induction, in SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1812-1820, 2018.
- Mengnan Du, Ninghao Liu, Qingquan Song, and Xia Hu, Towards Explanation of DNN-based Prediction with Guided Feature Inversion, in SIGKDD Conference on Knowledge Discovery and Data Mining, 2018.
- Ninghao Liu, Donghwa Shin, and Xia Hu, Contextual Outlier Interpretation, in International Joint Conference on Artificial Intelligence, 2018.
- Jinxue Zhang, Jingchao Sun, Rui Zhang, Yanchao Zhang, and Xia Hu, Privacy-Preserving Social Media Data Outsourcing, in IEEE Conference on Computer Communications, 2018.
- Zepeng Huo, Xiao Huang, and Xia Hu, Link Prediction with Personalized Social Influence, in AAAI Conference on Artificial Intelligence, 2018.
- Xiao Huang, Qingquan Song, Jundong Li, and Xia Hu, Exploring Expert Cognition for Attributed Network Embedding, in ACM International Conference on Web Search and Data Mining, pages 270-278, 2018.
- Xiao Huang, Jundong Li, and Xia Hu, Accelerated Attributed Network Embedding, in SIAM International Conference on Data Mining, pages 633-641, 2017.
- Qingquan Song, Xiao Huang, Hancheng Ge, James Caverlee, and Xia Hu, Multi-Aspect Streaming Tensor Completion, in SIGKDD Conference on Knowledge Discovery and Data Mining, pages 435-443, 2017.
- Ninghao Liu, Xiao Huang, and Xia Hu, Accelerated Local Anomaly Detection via Resolving Attributed Networks, in International Joint Conference on Artificial Intelligence, pages 2337-2343, 2017.
- Cheng Cao, Hancheng Ge, Haokai Lu, Xia Hu, and James Caverlee, What Are You Known For? Learning User Topical Profiles with Implicit and Explicit Footprints, in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.
Journals:
- Qingquan Song, Hancheng Ge, James Caverlee, and Xia Hu, Tensors Completion Algorithms in Big Data Analytics, ACM Transactions on Knowledge Discovery from Data, 2019.
- Xiao Huang, Jundong Li, and Xia Hu, A General Embedding Framework for Heterogeneous Information Learning in Large-scale Networks, ACM Transactions on Knowledge Discovery from Data, 2018.
- Mohammad Akbari, Xia Hu, Fei Wang, and Tat-Seng Chua, Wellness Representation of Users in Social Media: Towards Joint Modeling of Heterogeneity and Temporality, IEEE Transactions on Knowledge and Data Engineering, 2017.
- Jun Gao, Ninghao Liu, Mark Lawley, and Xia Hu, An Interpretable Classification Framework for Information Extraction from Online Healthcare Forums, Journal of Healthcare Engineering, 2017.
Theses:
- Xiao Huang, "Learning From Attributed Networks - Embedding, Theory, and Interactions", Ph.D. Thesis, 2020.
- Zepeng Huo, "Link Prediction with Personalized Social Influence", Master's Thesis, 2017.
- Jun Gao, "An Interpretable Classification Framework for Information Extraction from Online Healthcare Forums", Master's Thesis, 2017.
Current Results
Specific Objectives:
The key objective of this project is to systematically develop embedding frameworks for large-scale and complex attributed networks in various scenarios. Our team mainly focuses on the following specific research objectives.
- Accelerating attributed network embedding by decomposing the complex process into many sub-problems and performing optimization in a distributed manner.
- Accelerating attributed network embedding via parallel mini-batch stochastic gradient descent (SGD), with application to anomaly detection.
- Advancing attributed network embedding by actively querying domain experts over a number of trials, modeling their abstract cognition as concrete answers, and incorporating those answers into the embedding.
- Advancing attributed network embedding by taking advantage of link directionality, with application to link prediction.
- Development of fast and streaming embedding frameworks for tensors with application to missing data completion in complex systems.
- Development of an interpretable classification framework for information extraction from online healthcare forums.
- Providing a modern overview of recent advances in tensor completion algorithms from the perspective of big data analytics characterized by diverse variety, large volume, and high velocity.
- Development of an interpretable framework for attributed network embedding via taxonomy induction to understand how nodes are distributed in embedding space, as well as exploring the factors that lead to the embedding results.
- Development of a novel and general framework to interpret the embedding representations learned by different algorithms.
- Development of a novel framework for differentially private social media data outsourcing.
- Generation of multi-label adversarial perturbation to analyze the vulnerability and robustness of multi-label learning models.
- Development of a post-hoc framework to enhance the interpretability of recurrent neural networks by providing interpretable rationales for their predictions.
- Advancing a tailored graph neural network by incorporating joint random walks on attributed networks into it.
- Advancing network embedding by learning multiple low-dimensional vectors for each node, to represent its multiple roles.
- Advancing collaborative filtering algorithms to handle the data dynamicity and complexity in streaming recommender systems.
- Development of an explainable framework that encourages deep neural networks to focus more on the evidence that actually matters for the task at hand, and to avoid overfitting to data-dependent bias and artifacts.
- Advancing neural architecture search by mitigating the impact of noisy labels.
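As a rough illustration of the mini-batch SGD objective listed above, the sketch below learns node embeddings on a toy attributed network. It is not the project's algorithm: the toy graph, the joint proximity `S = A + alpha * cosine(attributes)`, the symmetric factorization loss, and every name and hyperparameter are assumptions made for illustration, and the batches are processed sequentially rather than in parallel.

```python
import math
import random

# Toy attributed network (hypothetical data): 6 nodes, undirected edges,
# and a small binary attribute vector per node.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ATTRS = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
N = len(ATTRS)


def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def joint_proximity(alpha=0.5):
    """S[i][j] = adjacency + alpha * attribute similarity (an assumed blend)."""
    S = [[alpha * cosine(ATTRS[i], ATTRS[j]) for j in range(N)] for i in range(N)]
    for i, j in EDGES:
        S[i][j] += 1.0
        S[j][i] += 1.0
    return S


def loss(S, H):
    """Squared error between S and the inner products of the embeddings."""
    return sum(
        (S[i][j] - sum(a * b for a, b in zip(H[i], H[j]))) ** 2
        for i in range(N) for j in range(N)
    )


def embed(S, dim=2, epochs=300, batch_size=6, lr=0.05, seed=0):
    """Mini-batch SGD on sum_ij (S_ij - h_i . h_j)^2 over node pairs."""
    rng = random.Random(seed)
    H = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(N)]
    pairs = [(i, j) for i in range(N) for j in range(N)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            grad = [[0.0] * dim for _ in range(N)]
            for i, j in batch:
                err = S[i][j] - sum(a * b for a, b in zip(H[i], H[j]))
                for k in range(dim):
                    # Gradient of the pair's squared error w.r.t. h_i and h_j.
                    grad[i][k] += -2.0 * err * H[j][k]
                    grad[j][k] += -2.0 * err * H[i][k]
            # Apply the batch-averaged gradient step.
            for i in range(N):
                for k in range(dim):
                    H[i][k] -= lr * grad[i][k] / len(batch)
    return H


S = joint_proximity()
H = embed(S)
zero_loss = sum(s * s for row in S for s in row)  # loss of the all-zero embedding
final_loss = loss(S, H)
print(f"loss at H=0: {zero_loss:.3f}, after SGD: {final_loss:.3f}")
```

Each row of `H` is the low-dimensional representation of one node. Because every update touches only the nodes in the current batch, batches over disjoint node sets could in principle be processed by separate workers, which is the general idea behind parallelizing this kind of optimization.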
- Xiao Huang, Qingquan Song, Jundong Li, and Xia Hu, Exploring Expert Cognition for Attributed Network Embedding, WSDM, pages 270–278, 2018. (Slides)
Code in MATLAB.
Reference in BibTeX:
@conference{Huang-etal18Exploring,
Title = {Exploring Expert Cognition for Attributed Network Embedding},
Author = {Xiao Huang and Qingquan Song and Jundong Li and Xia Hu},
Booktitle = {ACM International Conference on Web Search and Data Mining},
Pages = {270--278},
Year = {2018}}
- Xiao Huang, Jundong Li, and Xia Hu, Accelerated Attributed Network Embedding, SDM, pages 633–641, 2017. (Slides)
Code in MATLAB & Python, both for attributed network embedding and pure network embedding.
Reference in BibTeX:
@conference{Huang-etal17Accelerated,
Author = {Xiao Huang and Jundong Li and Xia Hu},
Booktitle = {SIAM International Conference on Data Mining},
Pages = {633--641},
Title = {Accelerated Attributed Network Embedding},
Year = {2017}}
- Xiao Huang, Jundong Li, and Xia Hu, Label Informed Attributed Network Embedding, WSDM, pages 731–739, 2017. (Slides)
Code in MATLAB, both for supervised and unsupervised attributed network embedding.
Reference in BibTeX:
@conference{Huang-etal17Label,
Author = {Xiao Huang and Jundong Li and Xia Hu},
Booktitle = {ACM International Conference on Web Search and Data Mining},
Pages = {731--739},
Title = {Label Informed Attributed Network Embedding},
Year = {2017}}
BlogCatalog is an undirected network with 5,196 nodes and 171,743 edges, associated with node attributes of dimension 8,189.
Flickr is an undirected network with 7,575 nodes and 239,738 edges, associated with node attributes of dimension 12,047.
Yelp_multilabel is an undirected network with 249,012 nodes and 1,779,803 edges, associated with node attributes of dimension 20,000.
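To put these sizes in perspective, the short sketch below derives the average degree and edge density of each dataset from the counts listed above. It assumes that each undirected edge is counted once and that the graphs have no self-loops or duplicate edges; the page does not state either, and the function and dictionary names are illustrative.

```python
# Node, edge, and attribute counts as listed above.
DATASETS = {
    "BlogCatalog": {"nodes": 5196, "edges": 171743, "attr_dim": 8189},
    "Flickr": {"nodes": 7575, "edges": 239738, "attr_dim": 12047},
    "Yelp_multilabel": {"nodes": 249012, "edges": 1779803, "attr_dim": 20000},
}


def average_degree(nodes, edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * edges / nodes


def density(nodes, edges):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return edges / (nodes * (nodes - 1) / 2)


for name, d in DATASETS.items():
    print(
        f"{name}: avg degree {average_degree(d['nodes'], d['edges']):.1f}, "
        f"density {density(d['nodes'], d['edges']):.2e}"
    )
```

Under these assumptions, Yelp_multilabel has far more nodes than the other two datasets but a much lower average degree, so it is the sparsest of the three.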
Project Impacts
- The achievements of this project have contributed significantly to the development of attributed network embedding and interpretable representation learning. We have proposed a series of novel embedding algorithms for attributed networks to tackle the challenges brought by large-scale and complex attributed network data.
- During this project, we focused on two perspectives in this principal discipline: (1) exploring potential efficient joint embedding algorithms that could be applied to large-scale attributed networks; and (2) transforming existing network embedding algorithms by analyzing network interactions, and by leveraging social theories, e.g., social status analysis and social identity theory.
- The embedding frameworks for attributed networks developed in this project also make significant contributions to the related fields of machine learning and network analytics. The learned representations play an essential role in supporting a variety of network analysis applications including community detection, link prediction, and network visualization. These proposed frameworks could also be flexibly adapted for facilitating various industrial applications in Social Computing, Health Informatics, and Enterprise Systems.
- Representation learning is not only a crucial tool in attributed network analysis, but is also widely used in many other disciplines such as recommendation and natural language processing. Most representation learning results cannot be directly understood by end users. Our proposed interpretable frameworks are quite general and could be used to understand different representation learning results.
- The project allows the PI to continue the ongoing efforts of providing research opportunities to undergraduate, female, and underrepresented students. By incorporating the key outcomes of this project, the PI created new course modules in CSCE 470 & CSCE 670, which advanced college students' knowledge and inspired K-12 students in STEM fields.
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1657196.
Disclaimer
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Last Updated: May 2020