The following figure illustrates the use of the above rules using a portion of the sample input file.
What is the best way to make a file like this? That depends on the format of the files produced by your image analysis software. If you are familiar with programming and/or UNIX utilities, you can write a program that takes those files as input and outputs a file in the format specified above. If you do not have this expertise, the best method would probably be to construct the file in Excel (or the spreadsheet program of your choice). To use Excel, follow these general guidelines:
Treatment-control combinations file (optional)---This file specifies the treatment-control combinations. T-tests will be performed for each peptide for each treatment-control combination specified in this file, and these combinations will also be used for biological subtractions. This file must be a tab-delimited text file, with one treatment-control combination specified per line. Each line should contain the number corresponding to the treatment, followed by a tab, followed by the number corresponding to the control. This "number" refers to the order of the treatments as they appear in the main input file. For example, if you were using the sample dataset and set "Number of inter-array replicates" to be 1, then the line "1<tab>2" in this file would mean a comparison between A-1 and A-2. If, however, "Number of inter-array replicates" is 4, then the line "1<tab>2" would mean a comparison between subject A (all four samples combined) and subject B (all four samples combined).
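As a rough illustration of the layout described above, the following Python sketch writes and re-reads a treatment-control combinations file. The function names are hypothetical and not part of PIIKA 2, and the check that each number lies between 1 and the number of treatments is an assumption based on the "1<tab>2" example rather than a documented rule.

```python
def format_combinations(pairs):
    """Return tab-delimited text with one 'treatment<TAB>control' pair per line."""
    return "".join(f"{treatment}\t{control}\n" for treatment, control in pairs)


def parse_combinations(text, n_treatments):
    """Parse the text back into (treatment, control) pairs, checking the layout.

    Assumption: treatments are numbered 1..n_treatments in the order they
    appear in the main input file.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        treatment, control = int(fields[0]), int(fields[1])
        for value in (treatment, control):
            if not 1 <= value <= n_treatments:
                raise ValueError(f"line {line_no}: {value} is out of range")
        pairs.append((treatment, control))
    return pairs


# Two comparisons: treatment 1 vs. control 2, and treatment 3 vs. control 4.
text = format_combinations([(1, 2), (3, 4)])
pairs = parse_combinations(text, n_treatments=24)
```

Writing `text` to a file with a plain text editor or `open(path, "w")` would give a file in the shape this section describes.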
The following figure illustrates the features of the example treatment-control combinations file.
Treatment-control combinations for P-value visualizations file (optional)---This file specifies the treatment-control combinations for constructing P-value visualization files. It must be a tab-delimited text file similar to the one described above, except that two treatment-control combinations must be specified per line. The first two columns correspond to the first treatment-control combination and are used for the left semicircle in each circle in the visualization file, while the third and fourth columns correspond to the second treatment-control combination and are used for the right semicircle. Any treatment-control combination listed in this file must also appear in the "Treatment-control combinations file" described above.
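The requirement that every combination in this file also appears in the treatment-control combinations file can be checked before running an analysis. The sketch below is one possible way to do so in Python; the function names are hypothetical and PIIKA 2 itself may report such problems differently.

```python
def parse_visualization_line(line):
    """Split one four-column line into its (left, right) treatment-control pairs."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated fields")
    t1, c1, t2, c2 = (int(field) for field in fields)
    return (t1, c1), (t2, c2)  # left semicircle, right semicircle


def missing_combinations(vis_text, known_pairs):
    """Return pairs used in the visualization text but absent from known_pairs."""
    known = set(known_pairs)
    missing = []
    for line in vis_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        for pair in parse_visualization_line(line):
            if pair not in known and pair not in missing:
                missing.append(pair)
    return missing


known = [(1, 2), (3, 4)]                 # from the combinations file
vis_text = "1\t2\t3\t4\n1\t2\t5\t6\n"    # second line uses an unlisted pair
problems = missing_combinations(vis_text, known)
```

An empty result means every semicircle refers to a combination that is also listed in the combinations file.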
The following figure illustrates the features of the example P-value visualizations file.
Number of treatments---The number of unique biological treatments in your experiment. If you do not have any inter-array replicates (either technical or biological), then this will be equal to the number of arrays. If you do have inter-array replicates, then this will be equal to the number of arrays divided by the number of inter-array replicates per treatment. The value of this parameter would be 24 for the sample data.
Number of unique peptides on the array---The number of unique peptide sequences on the array. This is equal to the total number of spots on the array divided by the number of intra-array technical replicates per peptide. The value of this parameter would be 297 for the sample data.
Number of inter-array replicates---The number of inter-array replicates (either biological or technical) per treatment. The value of this parameter would be 1 for the sample data.
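The relationship between the number of arrays, the number of inter-array replicates, and the number of treatments can be sketched in a few lines of Python. The function name is hypothetical. The count of 24 arrays is inferred from the sample-data values given above, and the alternative of 4 replicates per treatment follows the subject A versus subject B example in the treatment-control combinations file description.

```python
def number_of_treatments(n_arrays, n_inter_array_replicates):
    """Number of arrays divided by the number of inter-array replicates per treatment."""
    if n_inter_array_replicates < 1:
        raise ValueError("there must be at least 1 replicate per treatment")
    quotient, remainder = divmod(n_arrays, n_inter_array_replicates)
    if remainder != 0:
        raise ValueError("arrays do not divide evenly among the treatments")
    return quotient


# Sample data, each array treated as its own treatment.
as_individual_arrays = number_of_treatments(24, 1)

# Same arrays, with the four samples of each subject treated as replicates.
as_grouped_subjects = number_of_treatments(24, 4)
```

With these inputs the first call gives 24 treatments and the second gives 6.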
Depending on the nature of your data, you can have it analyzed in different ways by specifying different values for the above parameters. For example, suppose that for the sample data you choose the following values for the above parameters rather than the ones specified above:
Distance metric for hierarchical clustering---The distance metric to use when performing hierarchical clustering. Choices are (1 - Pearson correlation) (default) and Euclidean distance.
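To make the difference between the two distance metrics concrete, here is a minimal pure-Python sketch of both. It is illustrative only: the function names are hypothetical, and PIIKA 2's own implementation is not shown in this document.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - covariance / (spread_x * spread_y)


def euclidean_distance(x, y):
    """Return the straight-line distance between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


profile = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]     # same shape, twice the magnitude
reversed_profile = [3.0, 2.0, 1.0]

same_shape = pearson_distance(profile, scaled)
opposite_shape = pearson_distance(profile, reversed_profile)
magnitude_gap = euclidean_distance(profile, scaled)
```

Here `same_shape` is approximately 0 even though the magnitudes differ, while `magnitude_gap` is clearly nonzero: the correlation-based metric compares the shape of two profiles, whereas Euclidean distance is also sensitive to their absolute values.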
Linkage method for hierarchical clustering---The linkage method to use when performing hierarchical clustering. Choices are McQuitty linkage (default), average linkage, and complete linkage.
Perform chi-square test?---If yes, then the chi-square test will be performed to identify peptides with inconsistent phosphorylation patterns among the technical replicates on each array. As a result, PIIKA 2 will output extra t-test files containing only peptides that are consistently phosphorylated in both the treatment and the control, and will also omit from the heatmaps and PCA analyses any peptides that are not consistently phosphorylated on any of the arrays.
Perform F test?---If yes, then the F test will be performed to identify peptides with inconsistent phosphorylation patterns among the biological replicates. The implications of this option are analogous to those of the "Perform chi-square test?" option.
Perform biological subtraction before performing F test?---If yes, then biological subtraction will be performed on each treatment-control combination before performing the F test.
Perform random tree analysis?---If yes, then the analysis described under the heading "Statistical significance of the clustering of a priori groups" in the PIIKA 2 paper will be performed. In order to perform this analysis, your samples must be named such that PIIKA 2 can tell which samples are in the same group. To do this, your sample names (column names in the main input file) must have a hyphen in them, where everything before the hyphen defines the name of the group, and everything after the hyphen is a number (letters are not allowed) that is unique to that sample. For example, in the sample data, the samples corresponding to subject A are labeled "A-1", "A-2", "A-3", and "A-4", and similarly for the other subjects.
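The naming rule above can be checked ahead of time with a short script. The following Python sketch groups sample names by the text before the hyphen. The function name is hypothetical, and splitting on the last hyphen when a name contains more than one is an assumption, since the rule above only describes names with a single hyphen.

```python
import re
from collections import defaultdict

# Group name, a hyphen, then digits only (letters after the hyphen are rejected).
# Assumption: if a name has several hyphens, the last one is the separator.
SAMPLE_NAME = re.compile(r"(.+)-([0-9]+)")


def group_samples(names):
    """Map each group name to the list of sample numbers found for it."""
    groups = defaultdict(list)
    seen = set()
    for name in names:
        match = SAMPLE_NAME.fullmatch(name)
        if match is None:
            raise ValueError(f"{name!r} is not of the form <group>-<number>")
        if name in seen:
            raise ValueError(f"{name!r} appears more than once")
        seen.add(name)
        groups[match.group(1)].append(int(match.group(2)))
    return dict(groups)


groups = group_samples(["A-1", "A-2", "A-3", "A-4", "B-1", "B-2"])
```

For these names, `groups` maps "A" to samples 1 through 4 and "B" to samples 1 and 2. A name such as "A-x" or "A1" raises an error.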
Perform peptide subset analysis?---If yes, then the analysis described under the heading "Identifying sets of peptides that support the clustering of a priori groups" will be performed. For this analysis to work correctly, the same sample naming format as described above must be used.
Value of alpha (false positive rate) for statistical significance testing---The P-value threshold for describing a peptide as differentially phosphorylated between a treatment and a control.
Estimated background probability that a peptide will be differentially phosphorylated---This value is used for calculating positive and negative predictive values; see the "Positive and negative predictive values" section of the manuscript describing PIIKA 2 for details.