matrix2png

Synopsis:

USAGE: matrix2png 
 -data <file> (required)
 -desctext <file>
 -range : values assigned to mincolor and maxcolor as min:max (default is data range)
 -con : contrast (default = 1.0; applies only when not using -range option)
 -size : pixel dimensions per value as x:y (default = 2:2)
 -numcolors : number of colors (default = 64)
 -minsize : minimum image size as x:y pixels
 -mincolor : color used at lowest value (name or r:g:b triplet) (default = blue)
 -maxcolor : color used at highest value (name or r:g:b triplet) (default = red)
 -bkgcolor : color used as background (name or r:g:b triplet) (default = white)
 -missingcolor : color used for missing values (name or r:g:b triplet) (default = grey)
 -map : color choices from preset maps: overrides min/max colors and -b (default = 0 (none) )
 -discrete : Use discretized mapping of values to colors; use -dmap to assign a mapping file
 -dmap  : Discrete color mapping file to use for discrete mapping (default = preset)
 -numr : Number of rows to process starting from the top of the matrix by default
 -numc : Number of columns to process starting from the left edge of the matrix by default
 -startrow : Index of the first row to be processed; can combine with numr (default=1)
 -startcol : Index of the first column to be processed; can combine with numc (default=1)
 -trim : Trim this percent of data extremes when determining data range (only without the -range option)
 -verbose : Verbosity of the output 1|2|3|4|5 (default=2)
 -title <title>: Add a title
 -z : Row-normalize the data to mean 0 and variance 1
 -b : Middle of color range is black
 -d : Add cell dividers
 -s : Add scale bar
 -r : Add row names
 -c : Add column names
 -f : Data file has a format line
 -e : Draw ellipses instead of rectangles
 -l : Log transform the data (base 2)
 -u : Put the column labels under the picture instead of above (you must also set -c, or this is ignored)
 -g : Put the row labels to the left instead of the right (you must also set -r, or this is ignored)

Description/overview:

This program converts tab-delimited matrix files into PNG images. It is implemented in ANSI C and uses Tom Boutell's gd library (which in turn uses libpng and zlib). It is designed to be called from the command line or from within a script.

Example: say testdata.rdb contains a matrix of size 25x79. Running:

matrix2png -data testdata.rdb -range -2:2 -mincolor red -maxcolor green -bf -size 4:3 > test.png
yields the following image:



Inputs:

-data <file>: the only required input. At minimum, give the name of a tab/newline-delimited file containing the numerical values to be plotted; alternatively, entering '-' (dash) will read from standard input. The results with the default parameters are unlikely to be satisfactory, so you will want to set some options. The file is assumed to have one header row and one format row (which can be blank), and a single column of row labels. A separate description of the file format is available; it is generic, as much of the software from our group uses it. That description says you can't have missing values, but for matrix2png you can.

Missing values:

The current version of matrix2png handles missing values in the data. Although data entries can be missing, row or column label entries cannot. A missing value can be signified in one of three ways: the cell is empty (so that two tab characters appear in a row), the cell contains a single space (' '), or the cell contains a dash ('-'). In the image, missing values are displayed in grey by default. The missing value color can be set by the user with the -missingcolor option.
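The three missing-value markers can be illustrated with a short Python sketch. This is not matrix2png's own C code; the function names `is_missing` and `parse_row` are hypothetical, and the sketch assumes the row label has already been stripped so that only data cells remain.

```python
def is_missing(cell):
    """Return True if a data cell holds one of the three missing-value markers."""
    # Empty cell, single space, or dash, as listed in the text above.
    return cell in ("", " ", "-")


def parse_row(line):
    """Split one tab-delimited row of data cells into floats; missing cells become None."""
    cells = line.rstrip("\n").split("\t")
    return [None if is_missing(c) else float(c) for c in cells]


if __name__ == "__main__":
    # An empty cell, a dash, and a single space between two numbers.
    print(parse_row("1.5\t\t-\t \t2\n"))
```

Note that a negative number such as -2 is not confused with the dash marker, because only a cell consisting of exactly '-' is treated as missing.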

Outputs:

A PNG format image file, written to standard output. For details on the PNG format, see the PNG specification.

Quick tips for making microarray data set images

Overview of some features

These sections give an overview of how the software works and the relevant options. See the options section for full details of how to use each option.

Continuous color mapping to values

The default behavior of matrix2png is designed for continuous-valued data, and values are mapped to a (simulated) continuous color range. The lowest numerical value is mapped to one color (defined by -mincolor), and the highest value to another color (defined by -maxcolor). Values in between are mapped to colors interpolated from these endpoints. Thus if our colors go from white to black, and the data ranges from -2 to 4, then -2 will be shown as white and 4 will be shown as black. Values in between will be varying shades of grey: values like -1 will be light grey, and values like 3.2 will be dark grey.

The middle of the color range corresponds to the value (maxvalue + minvalue)/2. In our example, the middle will be 1 (i.e. (4 + -2)/2) and would be shown as medium grey. To force the middle value to be zero, use the -range option to make the range symmetric around zero, for example -range -4:4.

Setting the range with the -range option can cause a variety of effects, either compressing or spreading the visualized range. Thus if in our example we set the range as -100:100, because our data ranges only from -2 to 4, all the numerical values will be shown as colors near grey. Thus setting the range to be wider than our actual data range reduces the contrast of the image. Similarly, setting the range to be narrower has the usually more desirable effect of increasing the contrast. You can increase the contrast without setting the range by using the -con option.
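The mapping described above can be sketched in Python. This is an illustration only, not the program's actual implementation: it assumes plain linear interpolation per RGB channel, it assumes out-of-range values are clamped to the end colors, and it ignores the quantization to a fixed number of colors (-numcolors). The name `value_to_color` is hypothetical.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def value_to_color(value, vmin, vmax, mincolor, maxcolor):
    """Linearly interpolate an (r, g, b) color for value within [vmin, vmax]."""
    # Assumption of this sketch: values outside the range take the end colors.
    clamped = min(max(value, vmin), vmax)
    frac = (clamped - vmin) / (vmax - vmin)
    return tuple(round(lo + frac * (hi - lo)) for lo, hi in zip(mincolor, maxcolor))


if __name__ == "__main__":
    # Data range -2..4 mapped from white to black, as in the running example.
    for v in (-2, 1, 4):
        print(v, value_to_color(v, -2, 4, WHITE, BLACK))
    # The same values with a much wider range (-100..100) land close together.
    for v in (-2, 4):
        print(v, value_to_color(v, -100, 100, WHITE, BLACK))
```

Under these assumptions, the midpoint value 1 comes out as medium grey, and widening the range to -100:100 pushes both -2 and 4 to nearly the same grey, which is the loss of contrast described above.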

Besides the minimum and maximum colors, you can force the colors to run through a selected color using the -midcolor option. An older option, to force the map to pass through black in the middle, is the -b option and is retained for backwards compatibility. More complicated color mappings can be defined using the -map option.

Normalizing your data: If you use the -z option, each row of your matrix will be centered and the spread adjusted (mean zero, variance 1). This can be very desirable if the range of values in your data is both very wide and not the same for each row, a situation that often arises when using microarray data.
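As a rough illustration of what row normalization does, the following Python sketch centers each row and scales it to unit variance. It is not the program's own code. The function name `normalize_row` is hypothetical, and dividing by n (population variance) rather than n-1 is an assumption, since the text does not say which is used. Missing values (None) are skipped, and a constant row is returned as zeros to avoid division by zero; both choices are also assumptions.

```python
import math


def normalize_row(row):
    """Center a row to mean 0 and scale it to variance 1, leaving None untouched."""
    present = [v for v in row if v is not None]
    if not present:
        return list(row)
    mean = sum(present) / len(present)
    # Assumption: population variance (divide by n).
    variance = sum((v - mean) ** 2 for v in present) / len(present)
    sd = math.sqrt(variance)
    if sd == 0:
        # A constant row has no spread; map every present value to 0.
        return [None if v is None else 0.0 for v in row]
    return [None if v is None else (v - mean) / sd for v in row]


if __name__ == "__main__":
    # Two rows on very different scales end up directly comparable.
    print(normalize_row([1.0, 2.0, 3.0]))
    print(normalize_row([1000.0, 2000.0, 3000.0]))
```

After normalization, rows with very different original scales share one range, so a single color scale can show the pattern within every row.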

For more information, see the -mincolor, -maxcolor, -midcolor, -map, -b, -range, and -trim options, below.

Discrete color mapping to values

Matrix2png can also be used for data containing discrete integer values, where the mapping to colors is independent of any ordering of the values. This can be useful for data that is categorical in nature. Discrete mapping is invoked using the -discrete option; a specific color-to-value mapping can be set with the -dmap option. Thus you could map 0 to orange, 1 to blue, 2 to green, 3 to red, 4 to grey, etc. Values in your data file which are not found in your map (i.e., not small integers) will be assigned a default color value (user-settable). See details below.
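Conceptually, discrete mapping is a table lookup with a fallback color. The Python sketch below illustrates the idea only; it does not read a -dmap file (the file format is not described in this section), and the names `EXAMPLE_MAP`, `DEFAULT_COLOR`, and `discrete_color`, as well as the specific RGB values chosen for each color name, are assumptions made for illustration.

```python
# Hypothetical value-to-color table following the example in the text.
EXAMPLE_MAP = {
    0: (255, 165, 0),    # orange (illustrative RGB value)
    1: (0, 0, 255),      # blue
    2: (0, 255, 0),      # green
    3: (255, 0, 0),      # red
    4: (128, 128, 128),  # grey
}

# Hypothetical fallback for values that are not in the map.
DEFAULT_COLOR = (255, 255, 255)


def discrete_color(value, mapping=EXAMPLE_MAP, default=DEFAULT_COLOR):
    """Look up the color for a categorical value; unmapped values get the default."""
    return mapping.get(value, default)


if __name__ == "__main__":
    for v in (0, 3, 17):
        print(v, discrete_color(v))
```

Unlike the continuous mapping, no interpolation takes place: 3 is not "between" 2 and 4 in any color sense, it simply has its own entry.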

Accessing submatrices and side effects thereof:

The -numr, -numc, -startrow, and -startcol options (described in detail below) are used to draw only part of a file. When this is done, all range calculations are done just on the basis of the data read, not the entire file. Some useful side effects can be triggered with these options. If you select zero rows or zero columns (e.g., -numr 0), a blank image will be drawn, unless you choose to draw column or row labels using the '-c' or '-r' options. This allows you to draw pictures that contain just the row or column labels, which can be useful in certain situations; alternatively you can use '-numr 0 -numc 0 -s' to get just a scale bar.
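The way the four submatrix options combine can be sketched in Python as follows. This is an illustration, not the program's code: the function name `select_submatrix` is hypothetical, the matrix is assumed to hold data cells only (labels removed), and start indices are taken to be 1-based, matching the documented default of 1.

```python
def select_submatrix(matrix, startrow=1, startcol=1, numr=None, numc=None):
    """Return the block starting at (startrow, startcol), 1-based, of size numr x numc.

    A count of None means "through the end of the matrix".
    """
    r0 = startrow - 1
    c0 = startcol - 1
    r1 = None if numr is None else r0 + numr
    c1 = None if numc is None else c0 + numc
    return [row[c0:c1] for row in matrix[r0:r1]]


def data_range(block):
    """Min and max over the selected block only, or None if the block is empty."""
    values = [v for row in block for v in row if v is not None]
    return (min(values), max(values)) if values else None


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    block = select_submatrix(m, startrow=2, startcol=2, numr=1)
    print(block, data_range(block))
    print(select_submatrix(m, numr=0))
```

The second helper reflects the point made above: the range is computed from the selected block, not from the whole file, so the same value can get a different color depending on which part of the matrix is drawn.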

Colors

Colors are selected either by name (see below) or by red:green:blue values (ranging from 0 to 255). Thus pure red is indicated by 255:0:0 while medium grey would be 128:128:128. This way, on the command line you could use the arguments "-mincolor blue" or equivalently "-mincolor 0:0:255". The same convention is used for discrete mapping files.
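The two ways of writing a color can be illustrated with a small Python parser. This is a sketch under stated assumptions, not the program's actual parsing code: the function name `parse_color` is hypothetical, and the `NAMED_COLORS` table is a small illustrative subset whose entries (other than red and blue, which the text above gives explicitly) are assumed values.

```python
# Illustrative subset only; the real list of supported names may differ.
NAMED_COLORS = {
    "red": (255, 0, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def parse_color(spec):
    """Parse a color given as a name or as an r:g:b triplet with values 0..255."""
    if ":" in spec:
        parts = spec.split(":")
        if len(parts) != 3:
            raise ValueError("expected r:g:b, got %r" % spec)
        rgb = tuple(int(p) for p in parts)
        if any(not 0 <= c <= 255 for c in rgb):
            raise ValueError("color components must be in 0..255: %r" % spec)
        return rgb
    try:
        return NAMED_COLORS[spec.lower()]
    except KeyError:
        raise ValueError("unknown color name: %r" % spec) from None


if __name__ == "__main__":
    # "-mincolor blue" and "-mincolor 0:0:255" describe the same color.
    print(parse_color("blue"), parse_color("0:0:255"))
```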

Options:

Bugs, known problems:

Please send notice of any undocumented bugs or feature requests to pavlab-support@msl.ubc.ca. Some known problems/limitations:

Author:

Paul Pavlidis (also uses code by William Noble)