I tested LOAD CSV performance myself. The numbers from `time.perf_counter` are the trustworthy ones; the experimental data are as follows:
| Nodes  | Edges  | time.clock | time.perf_counter |
|--------|--------|------------|-------------------|
| 2878   | 1800   | 5 ms       | 473 ms            |
| 26945  | 18086  | 5 ms       | 2445 ms           |
| 151191 | 189396 | --         | 23419 ms          |
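The post doesn't show the timing code, so here is a minimal sketch of how such measurements are typically taken (the helper name `timed` is my own). It also explains the huge gap between the two columns: `time.clock` (removed in Python 3.8) measured CPU time of the client process on Unix, and a LOAD CSV runs almost entirely on the Neo4j server, so the client burns only a few milliseconds of CPU while waiting; `time.perf_counter` measures elapsed wall-clock time at high resolution.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed wall-clock time in ms)."""
    # perf_counter is a monotonic, high-resolution clock meant exactly
    # for measuring short durations like a single LOAD CSV call.
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Example: time a stand-in for a slow, I/O-bound call.
_, ms = timed(lambda: time.sleep(0.05))
```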
Summary of the experiments (throughput derived from the perf_counter column):

- Experiment 1 (1k rows): roughly 6084 nodes and 3805 edges processed per second
- Experiment 2 (10k rows): roughly 11020 nodes and 7397 edges per second
- Experiment 3 (100k rows): roughly 6455 nodes and 8087 edges per second
This time I also picked up a uuid trick: it can turn a string into a unique integer, a bit like a hash function. In the step of converting keys into ids, uuid was a big help~
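The post doesn't include the uuid code itself, so here is one plausible sketch using `uuid.uuid5`, which derives a deterministic UUID from a namespace plus a name string (the function name `key_to_id` and the 63-bit truncation are my own assumptions, not from the original):

```python
import uuid


def key_to_id(key: str) -> int:
    """Map a string key to a stable integer id."""
    # uuid5 is deterministic: the same (namespace, name) pair always yields
    # the same UUID, so the same key always maps to the same integer.
    # Truncate the 128-bit value to 63 bits so it fits a signed 64-bit
    # integer such as Neo4j's integer type (truncation is my own choice).
    return uuid.uuid5(uuid.NAMESPACE_DNS, key).int & ((1 << 63) - 1)
```

Unlike Python's built-in `hash()`, the result does not change between interpreter runs, which matters when the ids are written into CSV files and loaded later.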
```python
from py2neo import Graph


def login():
    graph = Graph("http://localhost:7474", auth=("neo4j", "h3ll0"))
    # graph.delete_all()
    # result = graph.run("""MATCH (n) DETACH DELETE n;""")

    # # Run these once only
    # result = graph.run("CREATE CONSTRAINT UniqueRequestIPNode ON (p:RequestIPNode) ASSERT p.id_src_ip IS UNIQUE;")
    # result = graph.run("CREATE CONSTRAINT UniqueDomainNode ON (o:DomainNode) ASSERT o.id_web_domain IS UNIQUE;")
    # result = graph.run("CREATE CONSTRAINT UniqueResponseNode ON (q:ResponseNode) ASSERT q.id_response_content IS UNIQUE;")

    # Add request-IP nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///requestIPNode_1.csv' AS row
        WITH toInteger(row.id_src_ip) AS id_src_ip, row.src_ip AS src_ip,
             row.a1 AS a1, row.a2 AS a2
        MERGE (p:RequestIPNode {id_src_ip: id_src_ip})
        SET p.a1 = a1, p.a2 = a2, p.src_ip = src_ip
        RETURN count(p);
    """)

    # Add domain nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///domainNode_1.csv' AS row
        WITH toInteger(row.id_web_domain) AS id_web_domain, row.web_domain AS web_domain,
             row.b1 AS b1, row.b2 AS b2
        MERGE (o:DomainNode {id_web_domain: id_web_domain})
        SET o.b1 = b1, o.b2 = b2, o.web_domain = web_domain
        RETURN count(o);
    """)

    # Add response nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///responseNode_1.csv' AS row
        WITH toInteger(row.id_response_content) AS id_response_content,
             row.response_content AS response_content, row.c1 AS c1, row.c2 AS c2
        MERGE (q:ResponseNode {id_response_content: id_response_content})
        SET q.c1 = c1, q.c2 = c2, q.response_content = response_content
        RETURN count(q);
    """)

    # Add request-IP -> domain edges
    # result = graph.run(":auto USING PERIODIC COMMIT 500")  # cannot be executed this way
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///requestIPDomainEdge_1.csv' AS row
        WITH toInteger(row.id_src_ip) AS id_src_ip,
             toInteger(row.id_web_domain) AS id_web_domain,
             row.src_ip AS src_ip, row.web_domain AS web_domain,
             row.d1 AS d1, row.d2 AS d2
        MATCH (p:RequestIPNode {id_src_ip: id_src_ip})
        MATCH (o:DomainNode {id_web_domain: id_web_domain})
        MERGE (p)-[rel:RequestIPDomainEdge {src_ip: src_ip, web_domain: web_domain, d1: d1, d2: d2}]->(o)
        RETURN count(rel);
    """)

    # Add domain -> response edges
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///domainResponseEdge_1.csv' AS row
        WITH toInteger(row.id_web_domain) AS id_web_domain,
             toInteger(row.id_response_content) AS id_response_content,
             row.web_domain AS web_domain, row.response_content AS response_content,
             row.e1 AS e1, row.e2 AS e2
        MATCH (o:DomainNode {id_web_domain: id_web_domain})
        MATCH (q:ResponseNode {id_response_content: id_response_content})
        MERGE (o)-[edg:DomainResponseEdge {web_domain: web_domain, response_content: response_content, e1: e1, e2: e2}]->(q)
        RETURN count(edg);
    """)
    print(result)


if __name__ == "__main__":
    login()
```
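About the commented-out `:auto USING PERIODIC COMMIT 500` line that could not be executed: `:auto` is a Neo4j Browser / cypher-shell command, not Cypher, so a driver rejects it. On Neo4j 3.x/4.x the batching hint can instead be written at the head of the LOAD CSV statement itself, though it must run in an auto-commit transaction, so depending on how the driver opens transactions it may still be refused; Neo4j 5 removes it in favour of `CALL { ... } IN TRANSACTIONS`. A sketch of the rewritten edge query (assumptions as stated, reusing the same CSV file and labels as above):

```python
# Batched variant of the request-IP -> domain edge load for Neo4j <= 4.x.
# The hint is part of the Cypher statement, not a ':auto' browser prefix.
edge_query = (
    "USING PERIODIC COMMIT 500 "
    "LOAD CSV WITH HEADERS FROM 'file:///requestIPDomainEdge_1.csv' AS row "
    "MATCH (p:RequestIPNode {id_src_ip: toInteger(row.id_src_ip)}) "
    "MATCH (o:DomainNode {id_web_domain: toInteger(row.id_web_domain)}) "
    "MERGE (p)-[rel:RequestIPDomainEdge {src_ip: row.src_ip, "
    "web_domain: row.web_domain, d1: row.d1, d2: row.d2}]->(o) "
    "RETURN count(rel);"
)
```

The query string would then be passed to the driver in an auto-commit context rather than inside an open transaction.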