cypher - Erratic behavior of Neo4j while loading data
We're loading data into a Neo4j server. The data represents (almost) k-ary trees, with k between 2 and 10 in most cases. We have 50 possible node types and about the same number of relationship types. The server is online and the data can be loaded from several instances (so, unhappily, we can't use neo4j-import).
Loading is slow: about 100 000 nodes and relationships take about 6 minutes to load on our machine. Sometimes loading the same data takes 40 minutes! Looking at the Neo4j process, it is sometimes doing nothing. In this case, we get messages like:
WARN [o.n.k.g.TimeoutGuard] Transaction timeout. (Overtime: 1481 ms).
Apart from that, we don't experience problems when executing queries, despite the complex structures.
We load the data as follows:
A Cypher file is loaded like this:
neo4j-shell -host localhost -v -port 1337 -file mygraph.cypher
The Cypher file contains several sections:
Constraint creation:
CREATE CONSTRAINT ON (p:mynodetype) ASSERT p.uid IS UNIQUE;
Indexes on a small set of node types (10 at most).
We select these carefully to avoid counterproductive performance behaviour.
CREATE INDEX ON :mynodetype1(uid);
Node creation:
USING PERIODIC COMMIT 4000
LOAD CSV WITH HEADERS FROM "file:////tmp/my.csv" AS csvline
CREATE (p:mynodetype1 {prop1: csvline.prop1, mysupuuid: toInt(csvline.uidfonctionenglobante), linenum: toInt(csvline.linenum), uid: toInt(csvline.uid), name: csvline.name, projectid: csvline.projectid, vvalue: csvline.vvalue});
Relationship creation:
LOAD CSV WITH HEADERS FROM "file:////tmp/relsinfixexpression-vleftoperand-simplename_javaouille-normal-b11695.csv" AS csvline
MATCH (n1:mynodetype1) WHERE n1.uid = toInt(csvline.uidfather)
WITH n1, csvline
MATCH (n2:mynodetype2) WHERE n2.uid = toInt(csvline.uidson)
MERGE (n1)-[:voperandlink]-(n2);
Question 1
We sometimes experienced an OOM in the Neo4j server while loading data, and it was difficult to reproduce with the same data. Since adding USING PERIODIC COMMIT 1000 to the relationship loading commands, we have never reproduced the problem. Is this possibly the solution to the OOM problem?
Question 2
Is the periodic commit parameter a good one? Is there a way to speed up the data loading, i.e. another strategy for writing the data loading script?
Question 3
Are there ways to prevent the timeout? Another way to write the data loading script, or maybe JVM tuning?
Question 4
Some months ago we split our Cypher script into 2 or 3 parts to launch them concurrently, but we stopped because the server messed up the data and became unusable. Is there a way to split the script "cleanly" and launch the parts concurrently?
Question 1: Yes, USING PERIODIC COMMIT is the first thing to try when LOAD CSV causes OOM errors.
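As a sketch, this is the relationship load from the question with the periodic commit clause placed in front. The batch size of 1000 is simply the value mentioned in the question, not a recommendation.

```cypher
// Relationship load from the question, committed every 1000 rows.
// USING PERIODIC COMMIT must come before LOAD CSV.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:////tmp/relsinfixexpression-vleftoperand-simplename_javaouille-normal-b11695.csv" AS csvline
MATCH (n1:mynodetype1) WHERE n1.uid = toInt(csvline.uidfather)
WITH n1, csvline
MATCH (n2:mynodetype2) WHERE n2.uid = toInt(csvline.uidson)
MERGE (n1)-[:voperandlink]-(n2);
```

Without the clause, the whole file is processed in a single transaction, so the transaction state grows with the size of the CSV.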
Questions 2 & 3: The "sweet spot" for the periodic commit batch size depends on your Cypher query, your data characteristics, and how your Neo4j server is configured (all of which can change over time). You do not want the batch size to be too high (to avoid occasional OOMs), nor too low (to avoid slowing down the import). You should tune the server's memory configuration as well. You will have to do your own experimentation to discover the best batch size and server configuration, and adjust them as needed.
Question 4: Concurrent write operations that touch the same nodes and/or relationships must be avoided, as they can cause errors (like deadlocks and constraint violations). If you can split up your operations so that they act on completely disjoint subgraphs, then they should be able to run concurrently without these kinds of errors.
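One possible way to get disjoint subgraphs, sketched here under strong assumptions: the nodes in the question carry a projectid property, so each concurrent script could restrict itself to one project. This only works if mynodetype2 nodes also carry projectid and no relationship connects nodes of different projects. Neither is stated in the question, and "projA" is a placeholder value.

```cypher
// Script A: only creates relationships between nodes of project "projA".
// Assumptions (not from the question): mynodetype2 has a projectid property,
// and relationships never cross project boundaries.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:////tmp/relsinfixexpression-vleftoperand-simplename_javaouille-normal-b11695.csv" AS csvline
MATCH (n1:mynodetype1)
WHERE n1.uid = toInt(csvline.uidfather) AND n1.projectid = "projA"
WITH n1, csvline
MATCH (n2:mynodetype2)
WHERE n2.uid = toInt(csvline.uidson) AND n2.projectid = "projA"
MERGE (n1)-[:voperandlink]-(n2);
```

A second script would use a different projectid value. Each script still reads the whole CSV file, so splitting the input files per project beforehand would avoid the redundant lookups.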
Also, you should PROFILE your queries to see how the server will actually execute them. For example, even if both :mynodetype1(uid) and :mynodetype2(uid) are indexed (or have uniqueness constraints), that does not mean the Cypher planner will automatically use those indexes when it executes your last query. If a profile of that query shows that it is not using the indexes, you can add hints to the query to make the planner (more likely to) use them:
LOAD CSV WITH HEADERS FROM "file:////tmp/relsinfixexpression-vleftoperand-simplename_javaouille-normal-b11695.csv" AS csvline
MATCH (n1:mynodetype1) USING INDEX n1:mynodetype1(uid) WHERE n1.uid = toInt(csvline.uidfather)
MATCH (n2:mynodetype2) USING INDEX n2:mynodetype2(uid) WHERE n2.uid = toInt(csvline.uidson)
MERGE (n1)-[:voperandlink]-(n2);
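A quick way to check what the planner does for a single lookup is to profile it in isolation. This is a minimal sketch; the uid value 42 is a placeholder and should be replaced by a value that exists in your data.

```cypher
// Profile one uid lookup to see which operator the planner picks.
// 42 is a placeholder uid, not a value from the question.
PROFILE
MATCH (n1:mynodetype1)
WHERE n1.uid = 42
RETURN n1;
```

In the resulting plan, an operator such as NodeIndexSeek (or NodeUniqueIndexSeek when a uniqueness constraint exists) indicates that the index is used, while NodeByLabelScan means every node with that label is scanned. Note that PROFILE actually runs the query; EXPLAIN shows the plan without executing it.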
In addition, if it is OK to store the uid values as strings, you can remove the uses of toInt(). This will speed things up to some extent.
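A sketch of what the string variant might look like. The property list in the node creation is shortened for illustration, and the batch sizes are the ones from the question.

```cypher
// Node creation: uid is stored as the string read from the CSV (no toInt()).
// Only two properties are shown here; the real script sets more.
USING PERIODIC COMMIT 4000
LOAD CSV WITH HEADERS FROM "file:////tmp/my.csv" AS csvline
CREATE (p:mynodetype1 {uid: csvline.uid, name: csvline.name});

// Relationship creation: string values are compared with string properties.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:////tmp/relsinfixexpression-vleftoperand-simplename_javaouille-normal-b11695.csv" AS csvline
MATCH (n1:mynodetype1) WHERE n1.uid = csvline.uidfather
MATCH (n2:mynodetype2) WHERE n2.uid = csvline.uidson
MERGE (n1)-[:voperandlink]-(n2);
```

The choice has to be consistent across the whole data set: a string value such as "42" does not equal the integer 42 in Cypher, so nodes already stored with integer uid values would no longer match.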