
Microblog Retrieval Results Re-ranking Using Graph Model Based Decision

YANG Zhen, ZHANG Guangyuan, FAN Kefeng

Citation: YANG Zhen, ZHANG Guangyuan, FAN Kefeng. Microblog Retrieval Results Re-ranking Using Graph Model Based Decision[J]. JOURNAL OF MECHANICAL ENGINEERING, 2017, 43(1): 94-99. doi: 10.11936/bjutxb2015090041


doi: 10.11936/bjutxb2015090041
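Funding: Beijing Excellent Talents and Beijing Municipal Universities Young Top-notch Talent Support Program (CIT&TCD201404052); National Science and Technology Support Program of China (2015BAK21B04); Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (15205)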
Corresponding author: FAN Kefeng (born 1978), male, senior engineer, whose research focuses on information security. E-mail: fankf@cesi.cn

  • CLC number: TP39


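  • Abstract: In microblog retrieval, both the user query and the relevant documents are extremely short texts, which leads to poor retrieval performance. To address this problem, a re-ranking algorithm for microblog retrieval results was designed and implemented: a relation graph model is built from the content similarity between microblogs, and the PageRank algorithm is used to re-rank the retrieval results. Text similarity measures based on cosine similarity, the Dice coefficient, and the one-way Dice coefficient were compared. Experimental results show that the re-ranking algorithm effectively improves microblog retrieval performance, and that the iterative performance of the graph model depends on the proportion of relevant topics. In view of this, a decision-tree re-ranking algorithm is discussed for removing the influence of non-relevant topics on the microblog ranking.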
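The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the similarity threshold, the damping factor, and the iteration count are all assumptions, and the asymmetric `one_way_dice` definition in particular is a guess at what the one-way Dice coefficient means, since the abstract does not define it. Documents are assumed to be pre-tokenized lists of terms.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def dice(a, b):
    """Dice coefficient on token sets: 2|A n B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def one_way_dice(a, b):
    """ASSUMED asymmetric variant: |A n B| / |A| (share of a's terms found in b)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def pagerank_rerank(docs, sim=dice, threshold=0.1, damping=0.85, iters=50):
    """Return document indices sorted by PageRank score, highest first.

    threshold, damping and iters are illustrative defaults, not values
    taken from the paper.
    """
    n = len(docs)
    if n == 0:
        return []
    # weight[i][j] is the edge weight from document i to document j.
    # A symmetric sim gives an undirected graph; an asymmetric one, a directed graph.
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = sim(docs[i], docs[j])
                if s >= threshold:
                    weight[i][j] = s
    out_sum = [sum(row) for row in weight]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Score held by nodes without out-edges is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0.0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weight[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        scores = new_scores
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Toy first-pass result list (hypothetical data).
docs = [
    ["graph", "model", "ranking"],
    ["graph", "ranking", "pagerank"],
    ["model", "ranking", "microblog"],
    ["weather", "today"],
]
order = pagerank_rerank(docs)
```

In this toy list, document 0 shares terms with both documents 1 and 2 and ends up first, while the off-topic document 3 has no edges and ends up last. Passing `sim=one_way_dice` yields an asymmetric weight matrix, which is one plausible way to obtain a directed graph instead of an undirected one.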

     

  • Figure 1. Undirected graph model between microblogs

    Figure 2. Directed graph model between microblogs

    Figure 3. Framework of the microblog retrieval system

    Table 1. Attributes of tweet search results

    tweet id    Similarity to query/%    Has attention    Relevant
    0001        60                       1                Y
    0002        80                       0                Y
    0003        65                       0                N
    N           15                       1                Y

    Table 2. Performance of microblog retrieval based on graph model in TREC 2014

    Run id    R-Prec    Bpref     P@10      P@20
    OSIM      0.2207    0.2673    0.4182    0.3682
    NSIM      0.2169    0.2655    0.3982    0.3536
    NCOS      0.2198    0.2667    0.3673    0.3255

    Table 3. Performance of microblog retrieval based on graph model and decision tree in TREC 2014

    Run id    P@10      P@15      P@20
    OSIM      0.4532    0.4325    0.3962
    NSIM      0.4371    0.4251    0.3834
    NCOS      0.4363    0.4273    0.3875
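For readers unfamiliar with the measures in Tables 2 and 3, the sketch below shows how P@k and R-Prec are commonly computed for a single query. The function names are illustrative, the example ranking simply orders the four sample rows of Table 1 by their query similarity, and Bpref is omitted. Reported TREC figures are averages over all topics, which this per-query sketch does not cover.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked ids that are relevant (the denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def r_precision(ranked_ids, relevant_ids):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    return precision_at_k(ranked_ids, relevant_ids, r) if r else 0.0


# Sample rows of Table 1, ordered by similarity to the query (80, 65, 60, 15).
ranked = ["0002", "0003", "0001", "N"]
relevant = {"0001", "0002", "N"}

p_at_2 = precision_at_k(ranked, relevant, 2)  # one of the top two is relevant
r_prec = r_precision(ranked, relevant)        # R = 3, two of the top three are relevant
```

Because the denominator of P@k is fixed at k, moving the non-relevant tweet 0003 below the relevant ones is exactly the kind of change that raises P@k for small k.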
Publication history
  • Received: 2015-09-15
  • Published online: 2022-09-09
  • Issue date: 2017-01-01
