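1. Data sources: text files, PDFs, databases, and other sources
2. Libraries used: jieba, gensim, sklearn, keras
3. Services it can provide: finding related and near-synonymous words (at the granularity of segmented tokens), comparing the similarity of two tokens, and finding tokens that are related to some words while unrelated to others (a semantic fuzzy search)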
For example, the query results below list [token, similarity score] pairs. 中国银行 (Bank of China):
[["中国工商银行", 0.7910350561141968], ["601988", 0.7748256921768188], ["工商银行", 0.7616539001464844], ["建设银行", 0.7573339939117432], ["中国建设银行", 0.7504717707633972], ["中行", 0.7469172477722168], ["中国农业银行", 0.7167254686355591], ["交通银行", 0.7115263938903809], ["农业银行", 0.7070150375366211], ["中信银行", 0.6993384957313538], ["建行", 0.6886808276176453], ["工行", 0.684762716293335], ["招商银行", 0.6723880767822266], ["中国民生银行", 0.6720935106277466], ["兴业银行", 0.6705615520477295], ["03988", 0.6682215332984924], ["浦发银行", 0.6620436310768127], ["光大银行", 0.6612452268600464], ["交行", 0.6425610780715942], ["601939", 0.6396690607070923], ["601398", 0.6362080574035645], ["汇丰银行", 0.6354925036430359], ["中国光大银行", 0.6283385157585144], ["华夏银行", 0.6261048316955566], ["090601", 0.6191191077232361], ["农行", 0.6165546774864197], ["南京银行", 0.6162608861923218], ["谷裕", 0.6026109457015991], ["民生银行", 0.6018795371055603], ["B02776", 0.6003248691558838], ["北京银行", 0.5989225506782532], ["00939", 0.5841124057769775], ["601288", 0.5798826217651367], ["法国兴业银行", 0.5750421285629272], ["600036", 0.5725768804550171], ["中银香港", 0.5725655555725098], ["渣打银行", 0.5723541975021362], ["上海银行", 0.5716006755828857], ["中资银行", 0.5714462399482727], ["史晨昱", 0.5713250637054443], ["01398", 0.5696423053741455], ["01288", 0.5673946738243103], ["国家开发银行", 0.5673025846481323], ["该行", 0.5642573237419128], ["部万钊", 0.5616151094436646], ["601998", 0.5594305992126465], ["601328", 0.5585275292396545], ["中信实业银行", 0.5555926561355591], ["花旗银行", 0.5535871386528015], ["宁波银行", 0.5529069900512695]]
中国 (China):
[["世界", 0.7685298919677734], ["全球", 0.7626694440841675], ["世界范围内", 0.7018718123435974], ["我国", 0.6887967586517334], ["全世界", 0.681572437286377], ["美国", 0.6747004985809326], ["亚洲", 0.6721218824386597], ["中国政府", 0.6407063007354736], ["国内", 0.6364794969558716], ["印度", 0.6236740946769714], ["国际", 0.6172101497650146], ["大国", 0.6167921423912048], ["亚洲各国", 0.6133526563644409], ["亚太地区", 0.610878586769104], ["全球范围", 0.6104856729507446], ["在世界上", 0.6089214086532593], ["东亚地区", 0.6027672290802002], ["日本", 0.601786196231842], ["当今世界", 0.6002479791641235], ["亚洲地区", 0.5914613604545593], ["全球性", 0.5876830220222473], ["全球化", 0.5855609178543091], ["非洲大陆", 0.5852369070053101], ["世界市场", 0.5849867463111877], ["欧洲", 0.5787924528121948], ["第三世界", 0.5771710872650146], ["全球一体化", 0.5766278505325317], ["西方", 0.5766173601150513], ["欧美国家", 0.5756310224533081], ["拉美", 0.5752301216125488], ["经济大国", 0.5745469331741333], ["第一世界", 0.5730843544006348], ["东亚国家", 0.5727769136428833], ["强国", 0.5700076222419739], ["工业界", 0.5689312219619751], ["韩国", 0.5672852396965027], ["各国", 0.5603423118591309], ["新兴国家", 0.5577350854873657], ["发达国家", 0.5569929480552673], ["英国", 0.5562434196472168], ["德国", 0.5535132884979248], ["当今", 0.5534329414367676], ["拉美地区", 0.5512816309928894], ["东亚各国", 0.5505844354629517], ["中国崛起", 0.5435972213745117], ["拉美国家", 0.5431581735610962], ["西半球", 0.5429360866546631], ["西方国家", 0.5408912897109985], ["本国", 0.5392733216285706], ["俄罗斯", 0.5382996797561646]]
万科 (Vanke):
[["金地", 0.8261025547981262], ["九龙仓", 0.8132781386375427], ["绿城", 0.7946393489837646], ["恒大", 0.7812688946723938], ["碧桂园", 0.7795591354370117], ["郁亮", 0.7790281772613525], ["远洋地产", 0.7744697332382202], ["融创", 0.7735781669616699], ["恒大地产", 0.7618383169174194], ["融创中国", 0.753994345664978], ["招商地产", 0.7349810600280762], ["合生创展", 0.7338892221450806], ["华润置地", 0.7292978167533875], ["龙湖", 0.7278294563293457], ["旭辉", 0.7256796956062317], ["龙湖地产", 0.7223220467567444], ["王石", 0.7217631936073303], ["宝能", 0.7196142673492432], ["孙宏斌", 0.7192676067352295], ["绿城中国", 0.7135359048843384], ["越秀地产", 0.7109189629554749], ["保利地产", 0.7031007409095764], ["世茂", 0.7004261016845703], ["中国金茂", 0.6861996650695801], ["合景泰富", 0.6830298900604248], ["雅居乐", 0.6811322569847107], ["世茂房地产", 0.6798348426818848], ["华远地产", 0.6793832778930664], ["万科A", 0.677139937877655], ["绿地", 0.6746823787689209], ["富力", 0.6702776551246643], ["宝龙地产", 0.662824809551239], ["富力地产", 0.660904049873352], ["宝能系", 0.6577337384223938], ["金科", 0.6565895676612854], ["阳光城", 0.6557801961898804], ["方兴", 0.654536247253418], ["协信", 0.6533593535423279], ["金地集团", 0.6524677276611328], ["龙光地产", 0.644176721572876], ["九龙仓集团", 0.6433624029159546], ["中国恒大", 0.6420278549194336], ["华侨城", 0.6391571760177612], ["许家印", 0.6391341686248779], ["万通地产", 0.6383571028709412], ["华远", 0.6379672288894653], ["宋卫平", 0.6350336670875549], ["龙头房企", 0.6337549090385437], ["东原", 0.6333705186843872], ["新鸿基地产", 0.6329449415206909]]
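The three kinds of query listed in item 3 can be sketched in plain Python on top of cosine similarity. This is only an illustration, not the project's code: the 3-dimensional vectors are invented, and the helper names `similarity` and `most_similar` are hypothetical stand-ins chosen to mirror the `similarity` and `most_similar(positive=..., negative=..., topn=...)` methods of gensim's KeyedVectors, which a trained model would presumably be queried with.

```python
import math

# Toy vectors, invented for illustration; a trained model would supply
# real vectors with far more dimensions.
VECTORS = {
    "中国银行": [0.9, 0.1, 0.0],
    "工商银行": [0.8, 0.2, 0.1],
    "万科": [0.1, 0.9, 0.1],
    "金地": [0.2, 0.8, 0.0],
    "中国": [0.3, 0.3, 0.9],
}


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def similarity(word1, word2):
    """Compare two tokens (hypothetical helper)."""
    return cosine(VECTORS[word1], VECTORS[word2])


def most_similar(positive, negative=(), topn=3):
    """Rank tokens close to `positive` words and far from `negative` words.

    The query vector is the sum of the unit vectors of the positive words
    minus the unit vectors of the negative words. Query words themselves
    are excluded from the result.
    """
    dims = len(next(iter(VECTORS.values())))
    query = [0.0] * dims
    for words, sign in ((positive, 1.0), (negative, -1.0)):
        for word in words:
            vec = VECTORS[word]
            length = norm(vec)
            query = [q + sign * x / length for q, x in zip(query, vec)]
    excluded = set(positive) | set(negative)
    scored = [
        (word, cosine(query, vec))
        for word, vec in VECTORS.items()
        if word not in excluded
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


if __name__ == "__main__":
    print(most_similar(["中国银行"]))                    # related and similar tokens
    print(similarity("万科", "金地"))                    # similarity of two tokens
    print(most_similar(["万科"], negative=["中国银行"]))  # related to one, not the other
```

With these toy vectors, the bank token ranks nearest to 中国银行, and subtracting 中国银行 from a 万科 query pushes the bank tokens to the bottom of the ranking.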
4. Basic steps:
Load the data sources -> gensim -> classifier (either traditional term-frequency-based, or deep learning with keras)
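As a rough sketch of the classifier stage, the snippet below covers only the traditional term-frequency branch using sklearn. Several things here are assumptions rather than the project's setup: the documents are taken to be already segmented (for instance by jieba) and joined with spaces, the gensim step is skipped entirely, and the tiny training set, the labels `bank` / `real_estate`, and the names `split_tokens` and `classify` are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical pre-segmented documents: tokens separated by spaces,
# as they might look after word segmentation.
train_docs = [
    "中国银行 发布 年报 贷款 增长",
    "工商银行 下调 存款 利率",
    "万科 拿地 地产 销售 回暖",
    "碧桂园 地产 销售 下滑",
]
train_labels = ["bank", "bank", "real_estate", "real_estate"]


def split_tokens(doc):
    """Treat each whitespace-separated token as one feature."""
    return doc.split()


# Term counts feed a multinomial naive Bayes classifier.
classifier = make_pipeline(
    CountVectorizer(analyzer=split_tokens),
    MultinomialNB(),
)
classifier.fit(train_docs, train_labels)


def classify(doc):
    """Return the predicted label for one pre-segmented document."""
    return str(classifier.predict([doc])[0])


if __name__ == "__main__":
    print(classify("恒大 地产 销售"))
    print(classify("建设银行 贷款 利率"))
```

Passing a callable as `analyzer` keeps the segmenter's tokens intact instead of letting CountVectorizer re-tokenize the text with its default pattern. Tokens never seen in training, such as 恒大 here, are simply ignored at prediction time.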
5. Using the resulting model: gensim.models.keyedvectors.KeyedVectors
wmdistance(document1, document2) # Word Mover's Distance between two documents; each argument is that document's list of tokens