Affect in Tweets: A Transfer Learning Approach
Proceedings of the Twelfth Language Resources and Evaluation Conference, 2020
People convey sentiments and emotions through language. Understanding these affectual states is an essential step towards understanding natural language.

NewsBERT: Distilling Pre-trained Language Model for Intelligent News Applications
Findings of the Association for Computational Linguistics: EMNLP 2021
Pre-trained language models (PLMs) like BERT have made great progress in NLP. News articles usually contain rich textual information, and PLMs have the potential to enhance news text modeling for various intelligent news applications like news recommendation and retrieval. However, most existing PLMs are huge, with hundreds of millions of parameters. Many online news applications need to serve millions of users with low latency tolerance, which poses great challenges to incorporating PLMs in these scenarios. Knowledge distillation techniques can compress a large PLM into a much smaller one while maintaining good performance. However, existing language models are pre-trained and distilled on general corpora like Wikipedia, which have gaps with the news domain and may be suboptimal for news intelligence. In this paper, we propose NewsBERT, which can distill PLMs for efficient and effective news intelligence. In our approach, we design a teacher-student joint learning and distillation framework to collaboratively learn both the teacher and student models, where the student model can learn from the learning experience of the teacher model. In addition, we propose a momentum distillation method that incorporates the gradients of the teacher model into the update of the student model to better transfer the knowledge learned by the teacher. Thorough experiments on two real-world datasets with three tasks show that NewsBERT can empower various intelligent news applications with much smaller models.

HieRec: Hierarchical User Interest Modeling for Personalized News Recommendation
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
User interest modeling is critical for personalized news recommendation. Existing news recommendation methods usually learn a single user embedding for each user from their previous behaviors to represent their overall interest. However, user interest is usually diverse and multi-grained, which is difficult to model accurately with a single embedding. In this paper, we propose a news recommendation method with hierarchical user interest modeling, named HieRec. Instead of a single user embedding, each user is represented with a hierarchical interest tree to better capture their diverse and multi-grained interest in news. We use a three-level hierarchy to represent (1) overall user interest, (2) user interest in coarse-grained topics like sports, and (3) user interest in fine-grained topics like football. Moreover, we propose a hierarchical user interest matching framework to match candidate news against different levels of user interest for more accurate interest targeting. Extensive experiments on two real-world datasets validate that our method can effectively improve the performance of user modeling for personalized news recommendation.
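Two of the mechanisms above are concrete enough to sketch in code. First, NewsBERT's momentum distillation: below is a minimal sketch of one way to fold the teacher's gradients into the student's update. The plain SGD update, the mixing coefficient `beta`, and the premise that the two models expose shape-compatible parameters are all assumptions for illustration; the abstract does not give these details.

```python
import torch

def momentum_distill_step(student, teacher, loss, lr=1e-4, beta=0.1):
    """One student update that mixes in the teacher's gradients.

    Hypothetical sketch: `beta` controls how strongly the teacher's
    learning experience steers the student, and both models are assumed
    to expose shape-compatible parameters in matching order.
    """
    loss.backward()  # fills .grad on every parameter the loss touches
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            if p_s.grad is None:
                continue
            g = p_s.grad
            if p_t.grad is not None and p_t.grad.shape == g.shape:
                g = g + beta * p_t.grad  # inject the teacher's gradient
            p_s -= lr * g
    student.zero_grad()
    teacher.zero_grad()
```

Second, HieRec's hierarchical matching can be pictured as a weighted mix of dot-product scores at the three interest levels. The shared embedding space and the mixing weights below are assumptions of the sketch, not details taken from the paper.

```python
import torch

def hierarchical_match(cand, overall_vec, topic_vec, subtopic_vec,
                       weights=(0.2, 0.3, 0.5)):
    """Score a candidate news embedding against three interest levels."""
    s_overall = cand @ overall_vec    # level 1: overall user interest
    s_topic = cand @ topic_vec        # level 2: coarse topic, e.g. sports
    s_subtopic = cand @ subtopic_vec  # level 3: fine topic, e.g. football
    w1, w2, w3 = weights              # assumed mixing coefficients
    return w1 * s_overall + w2 * s_topic + w3 * s_subtopic
```

Putting more weight on the fine-grained score is just one possible choice here; in practice the coefficients would be tuned or learned.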
Tiny-NewsRec
News recommendation is a widely adopted technique to provide personalized news feeds for users. Recently, pre-trained language models (PLMs) have demonstrated great capability in natural language understanding and have benefited news recommendation by improving news modeling. However, most existing works simply finetune the PLM with the news recommendation task, which may suffer from the well-known domain shift problem between the pre-training corpus and the downstream news texts. Moreover, PLMs usually contain a large number of parameters and have high computational overhead, which imposes a great burden on low-latency online services. In this paper, we propose Tiny-NewsRec, which can improve both the effectiveness and the efficiency of PLM-based news recommendation. We first design a self-supervised domain-specific post-training method to better adapt the general PLM to the news domain, with a contrastive matching task between news titles and news bodies. We further propose a two-stage knowledge distillation method to improve the efficiency of the large PLM-based news recommendation model while maintaining its performance. Multiple teacher models originating from different time steps of our post-training procedure are used to transfer comprehensive knowledge to the student model in both its post-training stage and its finetuning stage. Extensive experiments on two real-world datasets validate the effectiveness and efficiency of our method.
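Tiny-NewsRec's two building blocks can likewise be sketched. The contrastive title-body matching task reads naturally as in-batch InfoNCE; the temperature value and the use of in-batch negatives below are assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def title_body_contrastive_loss(title_emb, body_emb, temperature=0.05):
    """In-batch contrastive matching between news titles and bodies.

    title_emb, body_emb: (batch, dim) encoder outputs for the titles and
    bodies of the same articles, paired by row.
    """
    title_emb = F.normalize(title_emb, dim=-1)
    body_emb = F.normalize(body_emb, dim=-1)
    logits = title_emb @ body_emb.t() / temperature  # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)
    # Each title should score highest against its own article body.
    return F.cross_entropy(logits, labels)
```

And one simple reading of distilling from multiple teachers taken at different post-training time steps is to pool their softened predictions into a single target. Uniform pooling and the temperature are assumed here; the paper may weight the teachers differently.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    """KL distillation against soft labels pooled over several teachers,
    e.g. checkpoints from different post-training time steps."""
    soft_teacher = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor keeps gradient scale comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
```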