- The first row contains column headings.
- The first two column headings must be "Peptide" and "Accession", respectively.
- Subsequent column headings correspond to arrays in your experiment,
and should be labeled in groups of two, with the same name for each, to correspond with the foreground and background intensity readings for a single array.
*For example, in the sample input file, the first array is called "A-1", so the third and fourth column headings are each labeled "A-1".*

- If your arrays contain *n* technical replicates for each peptide, then each subsequent set of *n* lines (after the header line) must contain the data for those replicates, one replicate per line. *For example, the arrays used to produce the data in the sample input file contained 9 technical replicates, so lines 2-10 of the file contain the data for the first peptide, lines 11-19 contain the data for the second peptide, and so on.*

- Within a line (other than the first line):
  - The first two columns contain the peptide name and accession number of the protein corresponding to that peptide, respectively (as suggested by the column headings).
  - The next two columns contain the foreground and background intensity values, respectively, for the first array; the next two columns contain these values for the second array, and so on.
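
The rules above can be checked mechanically before uploading a file. The following is a minimal sketch, not part of PIIKA 2: the function name `check_main_input` is hypothetical, and it assumes that all replicate lines for a peptide carry the same peptide name in the first column.

```python
import csv
import io


def check_main_input(handle, n_replicates):
    """Return a list of problems found in a tab-delimited main input file.

    `n_replicates` is the number of technical replicates per peptide on one array.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if header[:2] != ["Peptide", "Accession"]:
        problems.append('first two headings must be "Peptide" and "Accession"')
    arrays = header[2:]
    # Array headings come in pairs (foreground, background) sharing one name.
    if len(arrays) % 2 != 0:
        problems.append("array headings must come in foreground/background pairs")
    for i in range(0, len(arrays) - 1, 2):
        if arrays[i] != arrays[i + 1]:
            problems.append(f"headings {arrays[i]!r} and {arrays[i + 1]!r} should match")
    if any(len(row) != len(header) for row in data):
        problems.append("every data line must have one value per column heading")
    if len(data) % n_replicates != 0:
        problems.append("number of data lines is not a multiple of n_replicates")
    # ASSUMPTION: all replicate lines of a peptide carry the same peptide name.
    for start in range(0, len(data) - n_replicates + 1, n_replicates):
        names = {row[0] for row in data[start:start + n_replicates]}
        if len(names) > 1:
            problems.append(f"data lines {start + 1}-{start + n_replicates} mix peptide names")
    return problems


# Tiny made-up example: one array ("A-1"), one peptide, two replicates.
example = (
    "Peptide\tAccession\tA-1\tA-1\n"
    "P1\tQ1\t10\t1\n"
    "P1\tQ1\t12\t2\n"
)
print(check_main_input(io.StringIO(example), n_replicates=2))
```

An empty list means none of the checks above failed; it does not guarantee that PIIKA 2 will accept the file.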

The following figure illustrates the use of the above rules using a portion of the sample input file.

What is the best way to make a file like this? That depends on the format of the files produced by your image analysis software. If you are familiar with programming and/or UNIX utilities, you can write a program that takes those files as input and outputs a file in the format specified above. If you do not have this expertise, the best method would probably be to construct the file in Excel (or the spreadsheet program of your choice). To use Excel, follow these general guidelines:

1. Open a new Excel file.
2. Save the file in **tab-delimited text (.txt)** format (call the file main_input_file.txt).
3. Enter "Peptide" in cell A1 and "Accession" in cell B1.
4. Import the first file from your image analysis software (corresponding to the first array in your experiment) into Excel as a separate spreadsheet.
5. Copy and paste the names of the peptides and the accession numbers from this file into columns A and B, respectively, of main_input_file.txt starting at row 2.
6. Copy and paste the foreground and background measurements in this file into columns C and D, respectively, of main_input_file.txt starting at row 2.
7. Put appropriate column headings in cells C1 and D1.
8. Repeat steps 4, 6, and 7 for each array, putting the new data into successive columns and making sure that the order of the peptides is identical for each array.
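
If you prefer the programming route, the same assembly can be sketched in Python. This is an illustration only: the function name `build_main_input` and the per-array column names "Name", "ID", "Foreground", and "Background" are hypothetical, because the actual export format depends on your image analysis software.

```python
import csv
import io


def build_main_input(arrays, out):
    """Write a main input file from per-array exports.

    `arrays` is a list of (array_name, file_handle) pairs. Each handle is
    ASSUMED to be tab-delimited with the hypothetical columns "Name", "ID",
    "Foreground", and "Background"; adjust these to your software's export.
    """
    tables = [(name, list(csv.DictReader(fh, delimiter="\t"))) for name, fh in arrays]
    peptides = [(row["Name"], row["ID"]) for row in tables[0][1]]
    for name, rows in tables[1:]:
        # The order of the peptides must be identical for each array.
        if [(row["Name"], row["ID"]) for row in rows] != peptides:
            raise ValueError(f"peptide order in array {name!r} differs from the first array")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = ["Peptide", "Accession"]
    for name, _ in tables:
        header += [name, name]  # same heading for foreground and background
    writer.writerow(header)
    for i, (peptide, accession) in enumerate(peptides):
        line = [peptide, accession]
        for _, rows in tables:
            line += [rows[i]["Foreground"], rows[i]["Background"]]
        writer.writerow(line)


# Made-up exports for two arrays with two spots each.
a1 = io.StringIO("Name\tID\tForeground\tBackground\nP1\tQ1\t10\t1\nP2\tQ2\t20\t2\n")
a2 = io.StringIO("Name\tID\tForeground\tBackground\nP1\tQ1\t11\t3\nP2\tQ2\t21\t4\n")
result = io.StringIO()
build_main_input([("A-1", a1), ("A-2", a2)], result)
print(result.getvalue())
```

With real files, pass handles opened with `open(path, newline="")` and write to a file such as main_input_file.txt instead of an in-memory buffer.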

**Treatment-control combinations file (optional)**---This file specifies the treatment-control combinations. T-tests will be performed for each peptide for each
treatment-control combination specified in this file, and these combinations will also be used for biological subtractions. This file must be a tab-delimited text file, with one treatment-control combination specified per line.
Each line should contain the number corresponding to the treatment, followed by a tab, followed by the number corresponding to the control. This "number" refers to the order of
the treatments as they appear in the main input file. For example, if you were using the sample dataset and set "Number of inter-array replicates" to be 1, then the line "1<tab>2"
in this file would mean a comparison between A-1 and A-2. If, however, "Number of inter-array replicates" is 4, then the line "1<tab>2" would mean a comparison between
subject A (all four samples combined) and subject B (all four samples combined).
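
To make the numbering concrete, the sketch below maps a treatment number to the arrays it covers. It is an illustration under stated assumptions, not PIIKA 2 code: the function names are hypothetical, the replicates of a treatment are assumed to occupy consecutive arrays in the main input file, and the "B-1" to "B-4" names are assumed by analogy with "A-1" to "A-4".

```python
def arrays_for_treatment(array_names, treatment, n_inter_array_replicates):
    """Return the array names that make up a 1-based treatment number.

    ASSUMPTION: replicates of one treatment are consecutive arrays in the
    main input file.
    """
    start = (treatment - 1) * n_inter_array_replicates
    return array_names[start:start + n_inter_array_replicates]


def parse_combinations(text):
    """Parse 'treatment<tab>control' lines into (treatment, control) tuples."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            treatment, control = line.split("\t")
            pairs.append((int(treatment), int(control)))
    return pairs


names = ["A-1", "A-2", "A-3", "A-4", "B-1", "B-2", "B-3", "B-4"]
for treatment, control in parse_combinations("1\t2\n"):
    for replicates in (1, 4):
        print(
            replicates,
            arrays_for_treatment(names, treatment, replicates),
            "vs",
            arrays_for_treatment(names, control, replicates),
        )
```

With one inter-array replicate the line compares A-1 with A-2; with four, it compares all A arrays with all B arrays, matching the description above.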

The following figure illustrates the features of the example treatment-control combinations file.

**Treatment-control combinations for P-value visualizations file (optional)**---This file specifies the treatment-control combinations for constructing P-value visualization files. It must be a tab-delimited text file similar to the one described above,
except **two** treatment-control combinations must be specified per line. The first two columns correspond to the first treatment-control combination,
and are used for the left semicircle in each circle in the visualization file, while the third and fourth columns correspond to the second treatment-control combination and
are used for the right semicircle. *Any treatment-control combination listed in this file must also appear in the "Treatment-control combinations file" described above.*
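
Both requirements (four columns per line, and every combination also present in the treatment-control combinations file) are easy to check before submitting. The following is a minimal sketch with a hypothetical function name, `check_visualization_pairs`; it is not how PIIKA 2 itself validates the file.

```python
def check_visualization_pairs(viz_text, combinations):
    """Return problems found in a P-value visualizations file.

    `combinations` is the set of (treatment, control) tuples listed in the
    treatment-control combinations file.
    """
    problems = []
    for number, line in enumerate(viz_text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 columns, found {len(fields)}")
            continue
        try:
            a, b, c, d = (int(field) for field in fields)
        except ValueError:
            problems.append(f"line {number}: all four columns must be integers")
            continue
        # Columns 1-2 feed the left semicircle, columns 3-4 the right one.
        for side, pair in (("left", (a, b)), ("right", (c, d))):
            if pair not in combinations:
                problems.append(f"line {number}: {side} combination {pair} is not in the combinations file")
    return problems


known = {(1, 2), (3, 4)}
print(check_visualization_pairs("1\t2\t3\t4\n1\t2\t5\t6\n", known))
```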

The following figure illustrates the features of the example P-value visualizations file.

**Number of treatments**---The number of unique biological treatments in your experiment. If you do not have any inter-array replicates (either technical or biological), then this will be equal
to the number of arrays. If you do have inter-array replicates, then this will be equal to the number of arrays divided by the number of inter-array replicates per treatment. The value of this parameter would be 24 for the sample data.

**Number of unique peptides on the array**---The number of unique peptide sequences on the array. This is equal to the total number of spots on the array divided by the number of intra-array technical replicates per peptide. The value of this parameter would be 297 for the sample data.

**Number of inter-array replicates**---The number of inter-array replicates (either biological or technical) per treatment. The value of this parameter would be 1 for the sample data.

Depending on the nature of your data, specifying different values for the above parameters will cause your data to be analyzed in a different way. For example, suppose that for the sample data you choose the following values for the above parameters rather than the ones specified above:

- Number of technical replicates per unique peptide on the same array: 9
- Number of treatments: 6
- Number of unique peptides on the array: 297
- Number of inter-array replicates: 4 (and choose "biological" in the drop-down box).
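
The relationships among these parameters reduce to two divisions, sketched below. The function name `derive_parameters` is hypothetical, and the figures of 24 arrays and 2673 spots per array are inferred from the values stated above (24 treatments with 1 inter-array replicate, and 297 peptides with 9 technical replicates each) rather than quoted from the sample data.

```python
def derive_parameters(n_arrays, n_spots, n_intra_array_replicates, n_inter_array_replicates):
    """Derive 'Number of treatments' and 'Number of unique peptides on the array'."""
    if n_arrays % n_inter_array_replicates != 0:
        raise ValueError("arrays must divide evenly into inter-array replicates")
    if n_spots % n_intra_array_replicates != 0:
        raise ValueError("spots must divide evenly into intra-array replicates")
    return {
        "treatments": n_arrays // n_inter_array_replicates,
        "unique_peptides": n_spots // n_intra_array_replicates,
    }


# Every array treated as its own treatment (1 inter-array replicate).
print(derive_parameters(24, 2673, 9, 1))
# Four arrays per subject treated as replicates of one treatment.
print(derive_parameters(24, 2673, 9, 4))
```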

**Distance metric for hierarchical clustering**---The distance metric to use when performing hierarchical clustering. Choices are (1 - Pearson correlation) (default) and Euclidean distance.
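
For readers unfamiliar with the two metrics, here is a minimal sketch of how each is computed between two intensity profiles. These are the textbook definitions written in plain Python; the function names are illustrative, and PIIKA 2's own implementation is not shown here.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles.

    Undefined (division by zero) if either profile is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - covariance / math.sqrt(var_x * var_y)


def euclidean_distance(x, y):
    """Return the straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two profiles with the same shape but different magnitude.
print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(euclidean_distance([1, 2, 3], [2, 4, 6]))
```

The practical difference is that (1 - Pearson correlation) treats profiles with the same shape as close even when their magnitudes differ, whereas Euclidean distance is sensitive to magnitude.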

**Linkage method for hierarchical clustering**---The linkage method to use when performing hierarchical clustering. Choices are McQuitty linkage (default), average linkage, and complete linkage.
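
The three linkage methods differ only in how the distance from a newly merged cluster to every other cluster is computed. The sketch below shows the standard update rules; the function name `merged_distance` and its arguments are illustrative and do not come from PIIKA 2.

```python
def merged_distance(method, d_ki, d_kj, size_i=1, size_j=1):
    """Distance from cluster k to the cluster formed by merging i and j.

    d_ki and d_kj are the distances from k to i and to j before the merge;
    size_i and size_j are the numbers of samples in i and j.
    """
    if method == "mcquitty":
        # Simple mean of the two distances, ignoring cluster sizes.
        return (d_ki + d_kj) / 2
    if method == "average":
        # Mean weighted by cluster size.
        return (size_i * d_ki + size_j * d_kj) / (size_i + size_j)
    if method == "complete":
        # Farthest-neighbour distance.
        return max(d_ki, d_kj)
    raise ValueError(f"unknown linkage method: {method!r}")


for method in ("mcquitty", "average", "complete"):
    print(method, merged_distance(method, 2.0, 4.0, size_i=3, size_j=1))
```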

**Perform chi-square test?**---If yes, then the chi-square test will be performed to identify peptides with inconsistent phosphorylation patterns among the technical replicates on each array. As a result, PIIKA 2 will output
extra t-test files containing only peptides that are consistently phosphorylated in both the treatment and the control, and will also omit from the heatmaps and PCA any peptides that are not consistently phosphorylated on any of the arrays.

**Perform F test?**---If yes, then the F test will be performed to identify peptides with inconsistent phosphorylation patterns among the biological replicates. The implications of this option are analogous to those of the "Perform chi-square test?" option.

**Perform biological subtraction before performing F test?**---If yes, then biological subtraction will be performed on each treatment-control combination before performing the F test.

**Perform random tree analysis?**---If yes, then the analysis described under the heading "Statistical significance of the clustering of *a priori* groups" in the PIIKA 2 paper will be performed. In order to perform
this analysis, your samples must be named such that PIIKA 2 can tell which samples are in the same group. To do this, your sample names (column names in the main input file) must have a hyphen in them, where everything before the hyphen defines
the name of the group, and everything after the hyphen is a number (letters are not allowed) that is unique to that sample. For example, in the sample data, the samples corresponding to subject A are labeled "A-1", "A-2", "A-3", and "A-4", and similarly for the other subjects.
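
The naming rule can be checked with a short regular expression. This is a sketch only: the names `SAMPLE_NAME`, `split_sample_name`, and `group_samples` are hypothetical, and splitting at the last hyphen when a name contains more than one is an assumption, since the behaviour of PIIKA 2 in that case is not described here.

```python
import re

# Group name, a hyphen, then digits only.
# ASSUMPTION: if several hyphens are present, the last one is the separator.
SAMPLE_NAME = re.compile(r"^(?P<group>.+)-(?P<number>\d+)$")


def split_sample_name(name):
    """Return (group, number) for a sample name such as "A-1"."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        raise ValueError(f"sample name {name!r} is not of the form <group>-<number>")
    return match.group("group"), int(match.group("number"))


def group_samples(names):
    """Collect sample names by group, preserving their order."""
    groups = {}
    for name in names:
        group, _ = split_sample_name(name)
        groups.setdefault(group, []).append(name)
    return groups


print(group_samples(["A-1", "A-2", "A-3", "A-4"]))
```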

**Perform peptide subset analysis?**---If yes, then the analysis described under the heading "Identifying sets of peptides that support the clustering of *a priori* groups" will be performed. For this analysis to work
correctly, the same sample naming format as described above must be used.

**Value of alpha (false positive rate) for statistical significance testing**---The P-value threshold for describing a peptide as differentially phosphorylated between a treatment and a control.

**Estimated background probability that a peptide will be differentially phosphorylated**---This value is used for calculating positive and negative predictive values; see the "Positive and negative predictive values" section of the manuscript describing PIIKA 2 for details.
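
To give a feel for how alpha and the background probability interact, the sketch below uses the textbook Bayesian definitions of positive and negative predictive value. It is an assumption that these match the calculation in the PIIKA 2 manuscript, and the `power` argument (the probability of detecting a true difference) is a hypothetical input that is not one of the parameters described on this page.

```python
def predictive_values(alpha, prior, power):
    """Return (PPV, NPV) from the textbook definitions.

    alpha: false positive rate (the P-value threshold).
    prior: background probability that a peptide is differentially phosphorylated.
    power: ASSUMED probability that a true difference is detected.
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    true_neg = (1 - alpha) * (1 - prior)
    false_neg = (1 - power) * prior
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative values only; none of them are taken from the sample data.
ppv, npv = predictive_values(alpha=0.05, prior=0.1, power=0.8)
print(round(ppv, 3), round(npv, 3))
```

Under these definitions, a lower background probability reduces the positive predictive value for a fixed alpha, which is why the estimate matters when interpreting significant peptides.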