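"""@namespace IMP.spatiotemporal.create_DAG
   Simplified function for creating a spatiotemporal model.
"""
import itertools
import os
from IMP.spatiotemporal import graphNode
from IMP.spatiotemporal.score_graph import score_graph
from IMP.spatiotemporal import write_output
from IMP.spatiotemporal import composition_scoring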


def create_DAG(state_dict,
               input_dir='', scorestr='_scores.log', output_dir='',
               spatio_temporal_rule=False, subcomplexstr='.config',
               expected_subcomplexes=[],
               score_comp=False, exp_comp_map={},
               out_cdf=True, out_labeled_pdf=True, out_pdf=False, npaths=0,
               draw_dag=True):
    """
    This function streamlines the process of creating a graph by performing
    all the necessary steps and writing the relevant output to files.
    Features of this function are walked through in
    example/toy/Simple_spatiotemporal_example.py

    @param state_dict: dictionary that defines the spatiotemporal model.
        The keys are strings that correspond to each time point in the
        stepwise temporal process. Keys should be ordered according to the
        steps in the spatiotemporal process. The values are integers that
        correspond to the number of possible states at that time point.
        Scores for each model are expected to be stored in files named
        $state_$time$scorestr, where state is an integer from 1 to the
        value for that key, time is the key, and scorestr is a trailing
        suffix that is the same for every score file. For example, with
        scorestr='_scores.log', state 2 at time '5min' is read from
        2_5min_scores.log.
    @param input_dir: string, directory where the data is stored. An empty
        string means the current working directory.
    @param scorestr: string, trailing characters at the end of each file
        with scores for each stage of the spatiotemporal model
        (default: '_scores.log').
    @param output_dir: string, directory where the output will be written.
        An empty string means the same directory as input_dir.
    @param spatio_temporal_rule: Boolean. If True, enforces that all
        components present earlier in the assembly process are also present
        later in the process (default: False).
    @param subcomplexstr: string, trailing characters at the end of each
        subcomplex file, which lists the subcomplexes included in the given
        label/time (default: '.config').
    @param expected_subcomplexes: list of all possible subcomplex strings
        in the model (default: []). Should be a list, without duplicates,
        of all components in the subcomplex files.
    @param score_comp: Boolean to determine whether or not to score models
        based on the protein composition (default: False).
    @param exp_comp_map: dictionary for determining the protein composition
        score (default: {}). The keys are the proteins. The code checks
        whether the name of each protein is contained in the
        subcomplex_components of each node, so the keys of exp_comp_map
        should be substrings of entries in expected_subcomplexes. Each value
        of exp_comp_map should correspond to a csv file with protein copy
        numbers for that key. Each csv file should have 3 columns:
        1) 'Time' - should correspond to the keys of state_dict,
        2) 'mean' - mean copy number from experimental data, and
        3) 'std' - standard deviation from experimental data.
    @param out_cdf: Boolean to determine whether or not to write out the
        cumulative distribution function (cdf) for the graph
        (default: True). filename: "cdf.txt"
    @param out_labeled_pdf: Boolean to determine whether or not to write
        out the labeled pdf file, which includes both the pdf and the
        ordered states visited along each path (default: True).
        filename: "labeled_pdf.txt"
    @param out_pdf: Boolean to determine whether or not to write out the
        probability distribution function (pdf) for the graph
        (default: False). filename: "pdf.txt"
    @param npaths: int, write out the states along the n most likely paths,
        based on the pdf (default: 0). filename: "pathXX.txt", where XX
        is the number of the path
    @param draw_dag: Boolean to determine whether or not to write out a
        directed acyclic graph (dag) to a file (default: True).
        filename: "dag_heatmap"
    @return nodes: list of graphNode objects, corresponding to the snapshot
        models in the spatiotemporal model
    @return graph: list of all paths through the graph. Each path is a list
        of graphNode objects that correspond to the states visited along
        the path.
    @return graph_prob: list of probabilities for each path, in the same
        order as graph
    @return graph_scores: list of tuples, where the first object is the
        path (list of graphNode objects for each state along the
        trajectory), and the second object is the score of the path,
        which can be used to calculate the probability.
    """
    labeled_pdf_fn = 'labeled_pdf.txt'
    dag_fn = 'dag_heatmap'
    dag_colormap = "Purples"
    dag_draw_label = True
    dag_fontname = "Helvetica"
    # Check that every input has the expected type.
    if not isinstance(state_dict, dict):
        raise TypeError("state_dict should be of type dict")
    if not isinstance(input_dir, str):
        raise TypeError("input_dir should be of type str")
    if not isinstance(scorestr, str):
        raise TypeError("scorestr should be of type str")
    if not isinstance(output_dir, str):
        raise TypeError("output_dir should be of type str")
    if not isinstance(spatio_temporal_rule, bool):
        raise TypeError("spatio_temporal_rule should be of type bool")
    if not isinstance(subcomplexstr, str):
        raise TypeError("subcomplexstr should be of type str")
    if not isinstance(expected_subcomplexes, list):
        raise TypeError("expected_subcomplexes should be of type list")
    if not isinstance(score_comp, bool):
        raise TypeError("score_comp should be of type bool")
    if not isinstance(exp_comp_map, dict):
        raise TypeError("exp_comp_map should be of type dict")
    if not isinstance(out_cdf, bool):
        raise TypeError("out_cdf should be of type bool")
    if not isinstance(out_labeled_pdf, bool):
        raise TypeError("out_labeled_pdf should be of type bool")
    if not isinstance(out_pdf, bool):
        raise TypeError("out_pdf should be of type bool")
    if not isinstance(npaths, int):
        raise TypeError("npaths should be of type int")
    if not isinstance(draw_dag, bool):
        raise TypeError("draw_dag should be of type bool")
    # Warn about any protein in exp_comp_map that is not a substring of
    # at least one entry in expected_subcomplexes.
    for key in exp_comp_map.keys():
        found = False
        for subcomplex in expected_subcomplexes:
            if key in subcomplex:
                found = True
        if not found:
            print(
                'WARNING!!! Check exp_comp_map and expected_subcomplexes. '
                'protein ' + key + ' is not found in expected_subcomplexes. '
                'This could cause illogical results.')
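    # Step 1: initialize the graph with one node per state.
    print('Initializing graph...')
    nodes = []
    keys = list(state_dict.keys())
    # Score files are read relative to input_dir, if one is given.
    if len(input_dir) > 0:
        if os.path.exists(input_dir):
            os.chdir(input_dir)
        else:
            raise Exception(
                "Error!!! Does not exist: " + input_dir + '\nClosing...')
    for key in keys:
        for i in range(state_dict[key]):
            # State indices are 1-based, matching the score file names.
            index = i + 1
            node = graphNode.graphNode()
            node.init_graphNode(key, str(index), scorestr, subcomplexstr,
                                expected_subcomplexes)
            nodes.append(node)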
    # Candidate edges join every state at one time point to every state at
    # the next; spatio_temporal_rule is passed through to draw_edge.
    tpairs = [(keys[i], keys[i + 1]) for i in range(0, len(keys) - 1)]
    for a, b in tpairs:
        anode = [n for n in nodes if n.get_time() == a]
        bnode = [n for n in nodes if n.get_time() == b]
        for na, nb in itertools.product(anode, bnode):
            graphNode.draw_edge(na, nb, spatio_temporal_rule)
    for ni, node in enumerate(nodes):
        ...
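    # Step 2: optionally add the composition likelihood to each node.
    if score_comp:
        print('Calculating composition likelihood...')
        nodes = composition_scoring.calc_likelihood(exp_comp_map, nodes)

    # Step 3: enumerate and score all paths through the graph.
    print('Scoring directed acyclic graph...')
    graph, graph_prob, graph_scores = score_graph(nodes, keys)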
    # Step 4: write the output files.
    print('Writing output...')
    if len(output_dir) > 0:
        if os.path.exists(output_dir):
            os.chdir(output_dir)
        else:
            os.mkdir(output_dir)
            os.chdir(output_dir)
    write_output.write_cdf(out_cdf, cdf_fn, graph_prob)
    write_output.write_pdf(out_pdf, pdf_fn, graph_prob)
    write_output.write_labeled_pdf(out_labeled_pdf, labeled_pdf_fn, graph,
                                   graph_prob)
    write_output.write_final_npaths(npaths, npath_fn, graph_scores, graph_prob)
    if draw_dag:
        write_output.draw_dag(
            dag_fn, nodes, graph, graph_prob, keys, heatmap=dag_heatmap,
            colormap=dag_colormap, draw_label=dag_draw_label,
            fontname=dag_fontname, fontsize=dag_fontsize,
            penscale=dag_penscale, arrowsize=dag_arrowsize, height=dag_height,
    return nodes, graph, graph_prob, graph_scores
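

# The create_DAG docstring implies a file layout and a candidate-edge
# structure that follow directly from state_dict. The two helpers below
# are a minimal sketch of both, assuming the naming convention
# $state_$time$scorestr with states numbered from 1. They are
# hypothetical illustrations, not part of IMP.spatiotemporal.
import itertools


def expected_score_files(state_dict, scorestr='_scores.log'):
    """Hypothetical helper: list the score files create_DAG would read.

    Files are returned in key order, with states 1..n for each time point.
    """
    files = []
    for time, nstates in state_dict.items():
        for state in range(1, nstates + 1):
            files.append(str(state) + '_' + time + scorestr)
    return files


def candidate_edges(state_dict):
    """Hypothetical helper: enumerate candidate edges as
    (time_a, state_a, time_b, state_b) tuples.

    Like the tpairs / itertools.product step in create_DAG, edges are only
    proposed between consecutive time points, pairing every state at one
    time point with every state at the next. Any filtering that
    graphNode.draw_edge may apply when spatio_temporal_rule is True is
    not modeled here.
    """
    keys = list(state_dict.keys())
    tpairs = [(keys[i], keys[i + 1]) for i in range(len(keys) - 1)]
    edges = []
    for a, b in tpairs:
        for sa, sb in itertools.product(range(1, state_dict[a] + 1),
                                        range(1, state_dict[b] + 1)):
            edges.append((a, sa, b, sb))
    return edges


# Example with a made-up three-time-point model:
#   state_dict = {'0min': 3, '5min': 5, '10min': 1}
#   expected_score_files(state_dict)[:2]
#       -> ['1_0min_scores.log', '2_0min_scores.log']
#   len(candidate_edges(state_dict))  # 3*5 + 5*1
#       -> 20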
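

# The exp_comp_map description in the create_DAG docstring implies two
# simple checks on the composition inputs. The helpers below are a
# minimal sketch, assuming each exp_comp_map value names a csv file with
# 'Time', 'mean' and 'std' columns as documented. missing_proteins()
# restates the substring test behind the exp_comp_map warning;
# read_copy_numbers() only parses a csv table and does not reproduce how
# composition_scoring computes the likelihood. Both are hypothetical
# illustrations, not part of IMP.spatiotemporal.
import csv


def missing_proteins(exp_comp_map, expected_subcomplexes):
    """Hypothetical helper: return exp_comp_map keys that are not a
    substring of any entry in expected_subcomplexes."""
    return [key for key in exp_comp_map
            if not any(key in sub for sub in expected_subcomplexes)]


def read_copy_numbers(lines):
    """Hypothetical helper: map each 'Time' value to (mean, std) floats.

    lines is a text file opened with newline='' or any iterable of csv
    lines that starts with a header row.
    """
    table = {}
    for row in csv.DictReader(lines):
        table[row['Time']] = (float(row['mean']), float(row['std']))
    return table


# Example with made-up protein names and values:
#   missing_proteins({'A': 'A.csv', 'Z': 'Z.csv'}, ['A1', 'A2', 'B1'])
#       -> ['Z']
#   read_copy_numbers(['Time,mean,std', '0min,2.0,0.5'])
#       -> {'0min': (2.0, 0.5)}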