"""@namespace IMP.spatiotemporal.composition_scoring
   Functions for weighting graphNode objects based on stoichiometry data.
"""

import os
import warnings

import numpy as np
import pandas as pd


def get_state(subcomplex_components, prot):
    """
    Calculate how many times a protein appears in a list of proteins,
    which can be accessed from a graphNode object using
    node.get_subcomplex_components().

    @param subcomplex_components: subcomplexes or components in a given node,
           which can be accessed by graphNode.get_subcomplex_components()
    @param prot: string, protein or subcomplex we are interested in finding
    @return state: int, number of times the protein or subcomplex appears
            in subcomplex_components
    """
    state = 0
    for subcomplex in subcomplex_components:
        # `in` is a substring test when the entry is a string.
        if prot in subcomplex:
            state += 1
    return state


def composition_likelihood_function(mean, std, prots, node):
    """Function that calculates the likelihood of an individual node, used by
    calc_likelihood().

    @param mean: dictionary of dictionaries where the first key is the protein,
           the second key is the time, and the expected mean copy number
           from experiment is returned.
    @param std: dictionary of dictionaries where the first key is the protein,
           the second key is the time, and the expected standard deviation
           of protein copy number from experiment is returned.
    @param prots: list of proteins or subcomplexes which will be scored
           according to this likelihood function
    @param node: the graphNode object for which the likelihood will be
           calculated.
    @return w: float, the weight of the graphNode according to the composition
            likelihood function.
    """
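    # Time point of this node; used as the second key into mean and std.
    t = node.get_time()
    w = 0
    for prot in prots:
        # x is the copy number of prot in this node.
        x = get_state(node.get_subcomplex_components(), prot)
        # A non-positive standard deviation makes the Gaussian ill-defined.
        if std[prot][t] <= 0:
            warnings.warn(
                'WARNING!!! Standard deviation of protein ' + prot
                + ' 0 or less at time ' + t
                + '. May lead to illogical results.')
        # Accumulate the negative log of the Gaussian density.
        w += (0.5 * ((x - mean[prot][t]) / std[prot][t])**2
              + np.log(std[prot][t] * np.sqrt(2 * np.pi)))
    return w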


def calc_likelihood(exp_comp_map, nodes):
    """
    Function that adds a score for the compositional likelihood for all
    states represented as nodes in the graph. The composition likelihood
    assumes a Gaussian distribution for copy number of each protein or
    subcomplex with means and standard deviations derived from experiment.
    Returns the nodes, with the new weights added.

    @param exp_comp_map: dictionary, which describes protein stoichiometry.
           The key describes the protein, which should correspond to names
           within the expected_subcomplexes. Only copy numbers for proteins
           or subcomplexes included in this dictionary will be scored. The
           value for each of these proteins is the path to a csv file with
           protein copy number data. The csv file should have 3 columns,
           1) "Time", which matches up to the possible times in the graph,
           2) "mean", the average protein copy number at that time point
           from experiment, and 3) "std", the standard deviation of that
           protein copy number from experiment.
    @param nodes: list of graphNode objects, which have already been
           initiated with static scores
    @return nodes: edited list of graphNode objects, which now have static
            and composition scores
    """
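    # Proteins (or subcomplexes) to score.
    prots = list(exp_comp_map.keys())
    # Nested dictionaries: the first key is the protein, the second key is
    # the time, and the value is the mean or standard deviation of the
    # protein copy number.
    mean = {}
    std = {}
    # Read each csv file into a pandas DataFrame.
    for prot in prots:
        prot_dict_mean = {}
        prot_dict_std = {}
        if os.path.exists(exp_comp_map[prot]):
            exp = pd.read_csv(exp_comp_map[prot])
        else:
            raise FileNotFoundError(
                "Error!!! Check exp_comp_map. Unable to find composition "
                "file: " + exp_comp_map[prot] + '\nClosing...')
        for i in range(len(exp)):
            prot_dict_mean[exp['Time'][i]] = exp['mean'][i]
            prot_dict_std[exp['Time'][i]] = exp['std'][i]
        mean[prot] = prot_dict_mean
        std[prot] = prot_dict_std
    # Score every node and add the composition weight to it.
    for node in nodes:
        weight = composition_likelihood_function(mean, std, prots, node)
        node.add_score(float(weight))
    return nodes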


def calc_likelihood_state(exp_comp_map, t, state):
    """
    Function that calculates the compositional likelihood of a single
    state, similar to how composition_likelihood_function calculates the
    composition likelihood of a node. Used by prepare_protein_library.
    The composition likelihood assumes a Gaussian distribution for copy
    number of each protein or subcomplex with means and standard
    deviations derived from experiment. Returns the weight of the state.

    @param exp_comp_map: dictionary, which describes protein stoichiometry.
           The key describes the protein, which should correspond to names
           within the expected_subcomplexes. Only copy numbers for proteins
           or subcomplexes included in this dictionary will be scored. The
           value for each of these proteins is the path to a csv file with
           protein copy number data. The csv file should have 3 columns,
           1) "Time", which matches up to the possible times in the graph,
           2) "mean", the average protein copy number at that time point
           from experiment, and 3) "std", the standard deviation of that
           protein copy number from experiment.
    @param t: string, time at which the composition likelihood should be
           calculated. Should match one of the possible values in the
           first column ("Time") of the csv files.
    @param state: list of integers, the protein copy numbers for which
           the likelihood will be calculated.
           This list should give the proteins in the same order as
           exp_comp_map.
    @return weight: float, the weight of the state according to the
            composition likelihood function.
    """
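    # Nested dictionaries: the first key is the protein, the second key is
    # the time, and the value is the mean or standard deviation of the
    # protein copy number.
    mean = {}
    std = {}
    # Copy number of each protein in the given state.
    state_cn = {}
    count = 0
    # Read each csv file into a pandas DataFrame.
    for prot in exp_comp_map.keys():
        prot_dict_mean = {}
        prot_dict_std = {}
        state_cn[prot] = state[count]
        if os.path.exists(exp_comp_map[prot]):
            exp = pd.read_csv(exp_comp_map[prot])
        else:
            raise FileNotFoundError(
                "Error!!! Check exp_comp_map. Unable to find composition "
                "file: " + exp_comp_map[prot] + '\nClosing...')
        for i in range(len(exp)):
            prot_dict_mean[exp['Time'][i]] = exp['mean'][i]
            prot_dict_std[exp['Time'][i]] = exp['std'][i]
        mean[prot] = prot_dict_mean
        std[prot] = prot_dict_std
        count += 1
    # Accumulate the negative log of the Gaussian density for each protein.
    weight = 0
    for prot in exp_comp_map.keys():
        # x is the copy number of prot in this state.
        x = state_cn[prot]
        # A non-positive standard deviation makes the Gaussian ill-defined.
        if std[prot][t] <= 0:
            warnings.warn(
                'WARNING!!! Standard deviation of protein ' + prot
                + ' 0 or less at time ' + t
                + '. May lead to illogical results.')
        weight += (0.5 * ((x - mean[prot][t]) / std[prot][t]) ** 2 +
                   np.log(std[prot][t] * np.sqrt(2 * np.pi)))
    return weight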
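
The per-protein term accumulated above is the negative log of a Gaussian density, so a lower weight means closer agreement with the experimental copy numbers. The following is a minimal sketch of that arithmetic and is not part of IMP. The names `gaussian_nll` and `state_nll` and all numbers are hypothetical. It uses the standard `math` module in place of NumPy so that it runs without dependencies, and it assumes plain dicts keyed by protein name for a single time point. Unlike the module code, which only warns, the sketch raises on a non-positive standard deviation.

```python
import math


def gaussian_nll(x, mean, std):
    """Negative log of the Gaussian density N(x; mean, std).

    Hypothetical helper for illustration; it mirrors the per-protein term
    0.5 * ((x - mean) / std)**2 + log(std * sqrt(2 * pi)).
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return (0.5 * ((x - mean) / std) ** 2
            + math.log(std * math.sqrt(2 * math.pi)))


def state_nll(state, means, stds):
    """Sum the per-protein terms for one state at a single time point.

    state, means and stds are dicts keyed by protein name (assumed layout).
    """
    return sum(gaussian_nll(state[p], means[p], stds[p]) for p in means)


# Made-up numbers: two proteins at one time point.
means = {"A": 2.0, "B": 1.0}
stds = {"A": 1.0, "B": 0.5}
matching = state_nll({"A": 2, "B": 1}, means, stds)
off_by_one = state_nll({"A": 3, "B": 1}, means, stds)
# A state that matches the means scores lower than one that does not.
print(matching < off_by_one)
```

With a standard deviation of 1 for protein A, moving its copy number one away from the mean raises the weight by exactly 0.5, which is the quadratic term in the formula.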