Aggregating networks into a crowd network
Introduction
Given a number of networks in SeidrFile format, seidr can aggregate them into a crowd network. The basic syntax is:
seidr aggregate <SeidrFile> <SeidrFile> ...
There are currently four methods of aggregation implemented:
-m borda
: Outputs the mean of ranks.
-m top1
: For each edge, outputs the highest score (i.e. the lowest rank) across all methods.
-m top2
: For each edge, outputs the midpoint of the two highest scores (i.e. the two lowest ranks) across all methods.
-m irp
: Calculates the inverse rank product.
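To make the four schemes concrete, here is a minimal Python sketch that aggregates the per-method ranks of a single edge. It is not seidr's implementation. The function names are hypothetical, methods that did not report the edge are assumed to be skipped, and the IRP is assumed here to be the reciprocal of the geometric mean of the ranks. seidr's actual scaling and tie handling may differ, so these numbers will not reproduce the irp_score values shown further down.

```python
import math
from typing import Optional, Sequence


def _present(ranks: Sequence[Optional[float]]) -> list[float]:
    """Keep only methods that reported this edge (None/NaN are skipped: an assumption)."""
    return [r for r in ranks if r is not None and not math.isnan(r)]


def borda(ranks: Sequence[Optional[float]]) -> float:
    """Borda-style aggregate: the mean of the available ranks."""
    rs = _present(ranks)
    return sum(rs) / len(rs)


def top1(ranks: Sequence[Optional[float]]) -> float:
    """top1: the best (lowest) rank any method gave this edge."""
    return min(_present(ranks))


def top2(ranks: Sequence[Optional[float]]) -> float:
    """top2: midpoint of the two best (lowest) ranks."""
    best = sorted(_present(ranks))[:2]
    return sum(best) / len(best)


def irp(ranks: Sequence[Optional[float]]) -> float:
    """Inverse rank product, ASSUMED to be 1 / geometric mean of the ranks."""
    rs = _present(ranks)
    mean_log_rank = sum(math.log(r) for r in rs) / len(rs)
    return 1.0 / math.exp(mean_log_rank)


# One edge ranked 1st, 4th and 2nd by three methods; a fourth method lacks it.
edge_ranks = [1.0, 4.0, 2.0, None]
print(borda(edge_ranks), top1(edge_ranks), top2(edge_ranks), irp(edge_ranks))
```

Working with the log of the ranks in irp avoids overflowing the product when many methods assign ranks in the hundreds of thousands, as in the table below.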
For example, aggregating thirteen networks with the inverse rank product:
seidr aggregate -m irp ../elnet/elnet_scores.sf ../narromi/narromi_scores.sf ../pearson/pearson_scores.sf ../spearman/spearman_scores.sf ../plsnet/plsnet_scores.sf ../aracne/aracne_scores.sf ../tigress/tigress_scores.sf ../clr/clr_scores.sf ../genenet/genenet_scores.sf ../svm/svm_scores.sf ../llr/llr_scores.sf ../genie3/genie3_scores.sf ../anova/anova_scores.sf
If no output file is specified, this creates a file aggregated.sf in the current working directory. Each column after the third (excluding the supplementary column) stores the score and rank of each edge (if present) for each of the aggregated methods.
Converted to text (with seidr view), the file looks like this:
Source Target Type ELNET_score;ELNET_rank Narromi_score;Narromi_rank Pearson_score;Pearson_rank Spearman_score;Spearman_rank PLSNET_score;PLSNET_rank ARACNE_score;ARACNE_rank TIGRESS_score;TIGRESS_rank CLR_score;CLR_rank PCor_score;PCor_rank SVM_score;SVM_rank LLR_score;LLR_rank GENIE3_score;GENIE3_rank ANOVA_score;ANOVA_rank irp_score;irp_rank
G2 G1 Undirected 0.004;334084 0.0128741;202752 -0.159435;202751 -0.00225177;1.32058e+06 1.07712e-05;360264 nan;nan nan;nan 1.87357;106802 -0.018736;243746 0.152;26168 0.244;37455.5 0.0904447;42007 0.288087;1.30856e+06 0.176275;129253
G3 G1 Undirected 0.334;22729.5 0.0381324;38394 -0.270978;44973 -0.214385;48864 3.2165e-05;61265 nan;nan 0.0028;78346.5 2.27349;70552.5 -0.021059;184389 0.077;91342.5 0.203;48670.5 0.215094;12249 0.388856;608154 0.299126;27713
Note that the final column stores the score of the aggregated network (IRP method). Unless otherwise specified, this is the representative score from here on.
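Because every method column packs two values as score;rank, downstream scripts usually need to split them. The Python sketch below parses the header and the G3 G1 row shown above into per-method (score, rank) pairs and records the last column as the aggregate. The function name parse_aggregated is hypothetical, and the sketch assumes whitespace-separated columns and no supplementary column (i.e. the file was written without -k).

```python
import math

HEADER = (
    "Source Target Type ELNET_score;ELNET_rank Narromi_score;Narromi_rank "
    "Pearson_score;Pearson_rank Spearman_score;Spearman_rank PLSNET_score;PLSNET_rank "
    "ARACNE_score;ARACNE_rank TIGRESS_score;TIGRESS_rank CLR_score;CLR_rank "
    "PCor_score;PCor_rank SVM_score;SVM_rank LLR_score;LLR_rank "
    "GENIE3_score;GENIE3_rank ANOVA_score;ANOVA_rank irp_score;irp_rank"
)
ROW = (
    "G3 G1 Undirected 0.334;22729.5 0.0381324;38394 -0.270978;44973 "
    "-0.214385;48864 3.2165e-05;61265 nan;nan 0.0028;78346.5 2.27349;70552.5 "
    "-0.021059;184389 0.077;91342.5 0.203;48670.5 0.215094;12249 "
    "0.388856;608154 0.299126;27713"
)


def parse_aggregated(header: str, row: str) -> dict:
    """Split one `seidr view` row into per-method (score, rank) tuples."""
    names = header.split()
    values = row.split()
    edge = {"source": values[0], "target": values[1], "type": values[2], "methods": {}}
    # Columns after the third hold "score;rank" pairs; "nan" marks a missing edge.
    for name_col, value_col in zip(names[3:], values[3:]):
        method = name_col.split(";")[0].removesuffix("_score")
        score, rank = (float(x) for x in value_col.split(";"))
        edge["methods"][method] = (score, rank)
    # Dicts keep insertion order, so the last method is the aggregate (here: irp).
    edge["aggregate"] = list(edge["methods"])[-1]
    return edge


edge = parse_aggregated(HEADER, ROW)
print(edge["aggregate"], edge["methods"][edge["aggregate"]])
```

float("nan") parses the nan;nan entries, so methods that did not report the edge can be detected afterwards with math.isnan.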
Directionality
When there are directional algorithms in the aggregated set, seidr will attempt to create a directed aggregated edge as well. The exact procedure used depends on the aggregation algorithm:
-m irp and -m borda
: When an edge is aggregated with the IRP or Borda schemes, the resulting edge takes the direction of a two-thirds majority of those input methods that called a direction. Example: we aggregate 10 methods. 5 of these predict A->B, 2 predict A<-B, and 3 predict A-B (undirected). 7 methods call a direction, so a 2/3 majority is reached with \(2 \cdot \lfloor\frac{7}{3}\rfloor = 4\) votes. The 5 votes for A->B therefore determine the final direction.
-m top1 and -m top2
: For these methods, the direction of the selected edges is carried over into the final edge. For top2, both selected edges have to agree on the direction; otherwise the edge is set to undirected.
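The voting rules above can be sketched in Python as follows. The threshold uses the same \(2 \cdot \lfloor n/3 \rfloor\) formula as the worked example, where n is the number of methods that called a direction. The function and constant names are hypothetical, and the text does not say what happens when neither or both directions reach the threshold, so falling back to undirected in those cases is an assumption. For top1 no vote is needed, since the single selected edge keeps its own direction.

```python
from collections import Counter

FORWARD, BACKWARD, UNDIRECTED = "A->B", "A<-B", "A-B"


def vote_direction(calls: list[str]) -> str:
    """IRP/Borda-style direction vote over the per-method calls for one edge."""
    counts = Counter(calls)
    n_directed = counts[FORWARD] + counts[BACKWARD]
    threshold = 2 * (n_directed // 3)  # 2 * floor(n / 3), as in the example
    winners = [d for d in (FORWARD, BACKWARD) if counts[d] > 0 and counts[d] >= threshold]
    # Assumption: no winner, or a tie between both directions, gives undirected.
    return winners[0] if len(winners) == 1 else UNDIRECTED


def top2_direction(first: str, second: str) -> str:
    """top2: keep the direction only if both selected edges agree."""
    return first if first == second else UNDIRECTED


# The example from the text: 5 x A->B, 2 x A<-B, 3 x undirected.
print(vote_direction([FORWARD] * 5 + [BACKWARD] * 2 + [UNDIRECTED] * 3))
```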
Flags
When seidr aggregate is run with the -k, --keep flag, supplementary data with flags for each algorithm is kept. This supplementary data is a bitwise flag describing the edge. Currently, 3 bits are used:
000
  ^____ Edge exists
 ^_____ Edge is in direction A->B
^______ Edge is in direction A<-B
Resolving these bits to unsigned integers therefore results in:
0
: Edge does not exist
1
: Edge exists and is undirected
3
: Edge exists and is A->B
5
: Edge exists and is A<-B
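A small Python sketch of how these flag values are composed and decoded with bitwise operations. The bit positions follow from the values listed above (1, 3 and 5), while the constant and function names are hypothetical, and how the per-algorithm flags are laid out inside the supplementary column is not covered here.

```python
from typing import Optional

EDGE_EXISTS = 0b001  # bit 0: edge exists
DIR_A_TO_B = 0b010   # bit 1: edge is in direction A->B
DIR_B_TO_A = 0b100   # bit 2: edge is in direction A<-B


def encode_flag(exists: bool, direction: Optional[str] = None) -> int:
    """Build the flag value for one algorithm's call on an edge."""
    if not exists:
        return 0
    flag = EDGE_EXISTS
    if direction == "A->B":
        flag |= DIR_A_TO_B
    elif direction == "A<-B":
        flag |= DIR_B_TO_A
    return flag


def describe_flag(flag: int) -> str:
    """Translate a flag value back into the descriptions used above."""
    if not flag & EDGE_EXISTS:
        return "Edge does not exist"
    if flag & DIR_A_TO_B:
        return "Edge exists and is A->B"
    if flag & DIR_B_TO_A:
        return "Edge exists and is A<-B"
    return "Edge exists and is undirected"


for value in (0, 1, 3, 5):
    print(value, describe_flag(value))
```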