matrix2png

Synopsis:

USAGE: matrix2png 
 -data <file> (required)
 -desctext <file>
 -range : values assigned to mincolor and maxcolor as min:max (default is data range)
 -con : contrast (default = 1.0; applies only when not using -range option)
 -size : pixel dimensions per value as x:y (default = 2:2)
 -numcolors : number of colors (default = 64)
 -minsize : minimum image size as x:y pixels
 -mincolor : color used at lowest value (name or r:g:b triplet) (default = blue)
 -maxcolor : color used at highest value (name or r:g:b triplet) (default = red)
 -bkgcolor : color used as background (name or r:g:b triplet) (default = white)
 -missingcolor : color used for missing values (name or r:g:b triplet) (default = grey)
 -map : color choices from preset maps: overrides min/max colors and -b (default = 0 (none) )
 -discrete: Use discretized mapping of values to colors; use -dmap to assign a mapping file
 -dmap  : Discrete color mapping file to use for discrete mapping (default = preset)
 -numr : Number of rows to process starting from the top of the matrix by default
 -numc : Number of columns to process starting from the left edge of the matrix by default
 -startrow : Index of the first row to be processed; can combine with numr (default=1)
 -startcol : Index of the first column to be processed; can combine with numc (default=1)
 -trim : Trim this percent of data extremes when determining data range (only without the -range option)
 -verbose : Verbosity of the output 1|2|3|4|5 (default=2)
 -title <title>: Add a title
 -z : Row-normalize the data to mean 0 and variance 1
 -b : Middle of color range is black
 -d : Add cell dividers
 -s : Add scale bar
 -r : Add row names
 -c : Add column names
 -f : Data file has a format line
 -e : Draw ellipses instead of rectangles
 -l : Log transform the data (base 2)
 -u : Put the column labels under the picture instead of above (you must also set -c, or this is ignored)
 -g : Put the row labels to the left instead of the right (you must also set -r, or this is ignored)

Description/overview:

This program converts tab-delimited matrix files into png images. It is implemented in ANSI C and uses Tom Boutell's gd library (which in turn uses libpng and zlib). It is designed to be called from the command line or from within a script.

Example: say testdata.rdb contains a matrix of size 25x79. Running:

matrix2png -data testdata.rdb -range -2:2 -mincolor red -maxcolor green -bf -size 4:3 > test.png

writes the resulting image to test.png.

Inputs:

-data <file>: the only required input. At a minimum, this is the name of a tab/newline-delimited file containing the numerical values to be plotted; alternatively, entering '-' (dash) will read from standard input. The results with the default parameters are unlikely to be satisfactory, so you will want to set some options. (It is assumed that there is one header row, one format row (which can be blank; see the -f option), and a single column of row labels. Details on the file format are available separately. The file format description is generic, as much of the software from our group uses it; that description says you can't have missing values, but for matrix2png you can.)

Missing values:

The current version of matrix2png handles missing values in the data. Although data entries can be missing, row or column label entries cannot. A missing value can be signified in one of three ways: the cell is empty (so that two tab characters appear in a row), the cell contains a single space (' '), or the cell contains a dash ('-'). In the image, missing values are displayed in grey by default. The missing value color can be set by the user with the -missingcolor option.
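
The three missing-value forms can be illustrated with a short Python sketch. This is not the program's own parser (matrix2png is written in C); the function names are hypothetical, and the sketch assumes the row-label column has already been removed from the line.

```python
def is_missing(cell):
    """Return True if a data cell uses one of the three missing-value forms."""
    return cell in ("", " ", "-")


def parse_row(line):
    """Split one tab-delimited data row into floats, with None for missing cells.

    Assumption: the leading row label has already been stripped off.
    """
    cells = line.rstrip("\n").split("\t")
    return [None if is_missing(c) else float(c) for c in cells]
```

For example, parse_row("1.5\t\t-\t \t-2\n") gives [1.5, None, None, None, -2.0]: the empty cell, the dash, and the single space are all treated as missing, while "-2" is still read as a number.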

Outputs:

A png format image, written to standard output.

Quick tips for making microarray data set images

You will probably want to turn on the -z option, which normalizes the rows of your matrix to have mean 0 and variance 1. (It won't hurt if you've already done this to your data.)
You may wish to set -trim with a value of 1 to 5, which will cause the color range to ignore extreme values. This can improve the image quality.
Set the -range option; we find a range of -2:2 works well in most cases.
If you are using ratiometric arrays (cDNA arrays), try -map 4 for the "Eisen" color map, or maps 3 and 6-10.
For Affymetrix data, we like -map 1, or try maps 11-15. You can use the negative value of the map number to reverse the direction.
To show row and/or column labels, be sure to set the -size so the blocks are at least 8 pixels high and/or wide (that is, for both row and column labels, you must have -size 8:8 at least).

Overview of some features

These sections give an overview of how the software works and the relevant options. See the options section for full details of how to use each option.

Continuous color mapping to values

The default behavior of matrix2png is designed for continuous-valued data, and values are mapped to a (simulated) continuous color range: the lowest numerical value is mapped to one color (defined by -mincolor), and the highest value to another color (defined by -maxcolor). Values in between are mapped to colors interpolated from these endpoints. Thus if our colors go from white to black, and the data ranges from -2 to 4, -2 will be shown as white and 4 will be shown as black; values in between will be varying shades of grey, so values like -1 will be light grey and values like 3.2 will be dark grey.

The central value of the color range is mapped to (maxvalue + minvalue)/2. In our example, the middle will be 1 (i.e. (4 + -2)/2 ) and would be shown as medium grey. To force the middle value to be zero, use the -range option to make the distribution symmetric, i.e. -range -4:4.
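
The mapping described above can be sketched in Python. This is an illustration only, not the program's actual code: the function names are hypothetical, and the way values are binned into -numcolors steps is an assumption made for the example.

```python
def color_index(value, minvalue, maxvalue, numcolors=64):
    """Map a value to a palette index in 0..numcolors-1, clipping to the range."""
    clipped = max(minvalue, min(maxvalue, value))
    fraction = (clipped - minvalue) / (maxvalue - minvalue)
    # Assumption: equal-width bins, with the maximum falling in the last bin.
    return min(numcolors - 1, int(fraction * numcolors))


def palette(mincolor, maxcolor, numcolors=64):
    """Interpolate numcolors (r, g, b) triplets linearly from mincolor to maxcolor."""
    steps = numcolors - 1
    return [
        tuple(round(lo + (hi - lo) * i / steps) for lo, hi in zip(mincolor, maxcolor))
        for i in range(numcolors)
    ]
```

With a white-to-black palette and a data range of -2 to 4, color_index(-2, -2, 4) is 0 (white), color_index(4, -2, 4) is 63 (black), and the midpoint value 1 lands on index 32, in the middle of the scale.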

Setting the range with the -range option can cause a variety of effects, either compressing or spreading the visualized range. Thus if in our example we set the range as -100:100, because our data ranges only from -2 to 4, all the numerical values will be shown as colors near grey. Thus setting the range to be wider than our actual data range reduces the contrast of the image. Similarly, setting the range to be narrower has the usually more desirable effect of increasing the contrast. You can increase the contrast without setting the range by using the -con option.
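
As a rough illustration of how a contrast setting narrows the color range, the Python sketch below divides the width of the data range by the contrast value. It is consistent with the example given for the -con option (data spanning -10 to 10 with a contrast of 2 is clipped at -5 and 5), but scaling about the midpoint of the range is an assumption, and the function name is hypothetical.

```python
def apply_contrast(minvalue, maxvalue, contrast=1.0):
    """Narrow (contrast > 1) or widen (0 < contrast < 1) a range about its midpoint."""
    if contrast <= 0:
        raise ValueError("contrast must be greater than 0")
    mid = (maxvalue + minvalue) / 2
    half_width = (maxvalue - minvalue) / 2 / contrast
    return mid - half_width, mid + half_width
```

Here apply_contrast(-10, 10, 2) returns (-5.0, 5.0). Values outside the narrowed range would then be drawn in mincolor or maxcolor, just as when -range is set by hand.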

Besides the minimum and maximum colors, you can force the colors to run through a selected color using the -midcolor option. An older option, to force the map to pass through black in the middle, is the -b option and is retained for backwards compatibility. More complicated color mappings can be defined using the -map option.

Normalizing your data: If you use the -z option, each row of your matrix will be centered and the spread adjusted (mean zero, variance 1). This can be very desirable if the range of values in your data is both very wide and not the same for each row, a situation that often arises when using microarray data.
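
Row normalization of this kind can be sketched as follows. This is a minimal Python illustration rather than the program's implementation: the function name is hypothetical, and two details are assumptions, namely that missing values (None here) are skipped and that the variance is computed with a divisor of n rather than n-1.

```python
import math


def z_normalize_row(row):
    """Return the row shifted to mean 0 and scaled to variance 1; None stays None."""
    present = [v for v in row if v is not None]
    if not present:
        return list(row)  # nothing to normalize
    mean = sum(present) / len(present)
    variance = sum((v - mean) ** 2 for v in present) / len(present)
    sd = math.sqrt(variance)
    if sd == 0:
        # A constant row has no spread; center it and leave it unscaled.
        return [None if v is None else 0.0 for v in row]
    return [None if v is None else (v - mean) / sd for v in row]
```

For example, z_normalize_row([2.0, None, 4.0]) gives [-1.0, None, 1.0]. Because each row is rescaled separately, rows with very different raw ranges end up on a common scale, which is why a single -range such as -2:2 then works across the whole matrix.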

For more information, see the -mincolor, -maxcolor, -midcolor, -map, -b, -range, and -trim options, below.

Discrete color mapping to values

Matrix2png can be used for data containing discrete integer values where mapping is irrespective of any value order. This can be useful for data which is categorical in nature. This is invoked using the -discrete option; a specific color-to-value mapping can be set with the -dmap option. Thus you could map 0 to orange, 1 to blue, 2 to green, 3 to red, 4 to grey, etc. Values in your data file which are not found in your map (i.e., not small integers) will be assigned a default color value (user-settable). See details below.
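
The idea of a discrete mapping, as opposed to interpolation, can be shown with a small Python sketch. It uses an in-memory dictionary built from the example above and does not attempt to reproduce the -dmap file format; the names and the choice of "black" as the fallback color are hypothetical.

```python
# Value-to-color pairs taken from the example in the text.
EXAMPLE_MAP = {0: "orange", 1: "blue", 2: "green", 3: "red", 4: "grey"}


def discrete_colors(row, mapping, default="black"):
    """Look up each value's color by exact match; unmapped values get the default."""
    return [mapping.get(value, default) for value in row]
```

For example, discrete_colors([0, 3, 7], EXAMPLE_MAP) gives ["orange", "red", "black"]. Note that 3 is not drawn as anything "between" 2 and 4: the numeric order of the values plays no role.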

Accessing submatrices and side effects thereof:

The -numr, -numc, -startrow, and -startcol options (described in detail below) are used to draw only part of a file. When this is done, all range calculations are done just on the basis of the data read, not the entire file. Some useful side effects can be triggered with these options. If you select zero rows or zero columns (e.g., -numr 0), a blank image will be drawn, unless you choose to draw column or row labels using the '-c' or '-r' options. This allows you to draw pictures that contain just the row or column labels, which can be useful in certain situations; alternatively you can use '-numr 0 -numc 0 -s' to get just a scale bar.
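
The effect of submatrix selection on the range calculation can be illustrated in Python. The sketch below is not the program's code; the function names are hypothetical, the matrix is assumed to hold only data cells (labels removed), and None stands for a missing value.

```python
def select_submatrix(matrix, startrow=1, startcol=1, numr=None, numc=None):
    """Return the block starting at 1-based (startrow, startcol).

    numr/numc of None mean "through the last row/column"; 0 selects nothing.
    """
    r0, c0 = startrow - 1, startcol - 1
    r1 = len(matrix) if numr is None else r0 + numr
    rows = matrix[r0:r1]
    return [row[c0:] if numc is None else row[c0:c0 + numc] for row in rows]


def data_range(matrix):
    """Return (min, max) over the present values, or None if there are none."""
    values = [v for row in matrix for v in row if v is not None]
    return (min(values), max(values)) if values else None
```

With m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], selecting startrow=2, startcol=2, numr=2, numc=1 gives [[5], [8]], and the range used for coloring is then (5, 8) rather than the full-file range of (1, 9).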

Colors

Colors are selected either by name (see below) or by red:green:blue values (ranging from 0 to 255). Thus pure red is indicated by 255:0:0 while medium grey would be 128:128:128. This way, on the command line you could use the arguments "-mincolor blue" or equivalently "-mincolor 0:0:255". The same convention is used for discrete mapping files.
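
A color argument in either form could be parsed along the following lines. This Python sketch is only an illustration: the function and table names are hypothetical, and the name table contains just the two name-to-triplet pairs that follow from the text above, not the program's full list of color names.

```python
# Only pairs that follow from the text; the real name table is larger.
KNOWN_NAMES = {"red": (255, 0, 0), "blue": (0, 0, 255)}


def parse_color(spec):
    """Turn 'name' or 'r:g:b' into an (r, g, b) tuple of ints in 0..255."""
    if spec in KNOWN_NAMES:
        return KNOWN_NAMES[spec]
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"not a known name or r:g:b triplet: {spec!r}")
    rgb = tuple(int(p) for p in parts)
    if any(not 0 <= c <= 255 for c in rgb):
        raise ValueError(f"color components must be 0-255: {spec!r}")
    return rgb
```

Under these assumptions parse_color("blue") and parse_color("0:0:255") return the same triplet, and parse_color("128:128:128") gives the medium grey (128, 128, 128).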

Options:

-range <minvalue>:<maxvalue> : By default, the range of colors corresponds to the range of values in the data set. Setting -range changes this behavior. For example, -range -2:2 results in the 'clipping' of values which are less than -2 or greater than 2; such values are assigned mincolor and maxcolor, respectively. It is important to note that this is the only way to ensure that your color range is 'symmetric' around a central value. Default: not set; the actual range of values in the input matrix is used as the range.
-con <value> : Set a contrast value. This only applies when using the data range (i.e., if -range is not set). It causes the range to be divided by the entered value. Values greater than 1 increase contrast, while values less than 1 (but greater than 0) decrease it. Thus data which spans a range of -10 to 10, when used with a contrast of 2, will be clipped below -5 and above 5, and the rest of the range is expanded accordingly. Default value: 1.
-size <xsize>:<ysize> : The size of each rectangle representing a value, in pixels. For example, -size 1:1 results in each value being represented by a single pixel in the output image. Larger images take longer to generate, so if scaling the image in a browser afterwards will work for your application, that is preferable to generating a large output. Default value: 2 x 2.
-numcolors <value>: How many colors will be represented in the scale. The legal range in the current implementation is 2-250 (pngs can represent up to 256 colors, but a few colors are reserved by the program for its own use). Using more colors increases file size. In practice, using more than 64 colors is unlikely to result in much of a visible difference in the images, but decreasing this value may be useful for getting particular effects. Default: 64.
-mincolor <value>, -maxcolor <value> (both ignored if using "-map"), -bkgcolor <value>: The colors which are used for the image scale and background. The smallest value in the image (or any value less than or equal to the minvalue) is represented by mincolor; the same idea applies to maxcolor. See also -midcolor.
The background color is used for areas of the image not covered by the matrix data - which means that you may not see the background color at all. Colors can be selected by name or by red:green:blue triplets (values ranging from 0 to 255). Thus pure red is indicated by 255:0:0 while medium grey would be 128:128:128.

Colors which can be selected by name are:

red darkred blue darkblue green darkgreen yellow magenta cyan black white grey (or gray) orange violet

Other colors are generated by interpolating between the min and max colors; by using color maps (below), you can use preset mappings that are more complex. Defaults: mincolor=blue; maxcolor=red; bkgcolor = white.

-map <map number>: Overrides mincolor and maxcolor to use a preset color mapping. The current choices, along with more details about how this option operates, are illustrated separately. Currently accepted values are from 1 to 15, selecting one of 15 presets; use the negative of the value to reverse the direction of the mapping from the default.

-missingcolor <value>: The color used to signify missing values. This is grey by default.

-discrete: This option overrides the normal color selection scheme (where data consisting of decimal values are smoothly mapped to colors) in favor of a fully described set of values and their corresponding colors. See "Discrete color mapping to values", above.

-dmap <color map file>: Assign a discrete mapping (-discrete is implicitly invoked if -dmap is used). See "Discrete color mapping to values", above.

-minsize <minx>:<miny> : The minimum image size. The matrix is centered in the image. Making this larger than the expected image size will yield a border around your image (usually setting this is not necessary). Default: same as the end image size, which is only as large as necessary to fit everything.

-desctext <filename> : A file which contains descriptive text which will be added along the Y axis (to each row). In the current implementation, it is assumed that the file contains the right number of items in the right order. The text appears at the right edge.

-trim <percent>: Trim outlying values from consideration of the data range for color assignment. This operates symmetrically on the data, so the top and bottom of the data are ignored as outliers. This can improve image quality when using data that contains outliers. Otherwise, extreme values can cause the map to become very compressed for the majority of the data.
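
The effect of trimming on the data range can be sketched in Python. The exact rule the program uses is not spelled out here, so this sketch makes an assumption: the given percentage of values is dropped from each end of the sorted data before the minimum and maximum are taken. The function name is hypothetical.

```python
def trimmed_range(values, percent):
    """Return (min, max) after dropping `percent` of the values from each end.

    Assumption: the percentage applies to each tail separately.
    """
    ordered = sorted(v for v in values if v is not None)
    drop = int(len(ordered) * percent / 100)
    kept = ordered[drop:len(ordered) - drop]
    if not kept:
        raise ValueError("trim percentage leaves no data")
    return kept[0], kept[-1]
```

For a data set of 98 zeros plus the two outliers -50 and 50, trimmed_range(values, 1) returns (0, 0) instead of (-50, 50), so the outliers no longer stretch the color scale. Values outside the trimmed range would still be drawn, clipped to mincolor or maxcolor.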

-title <title>: Add a title to the top of the figure. The title appears in larger text and is centered at the top of the image.

-f: Skip the second line in the file (such as an RDB format line). The default is to use the second line as data.

-midcolor <color> and -b: Force the color range to run through the selected color (-midcolor) or black (-b) (i.e., from mincolor to midcolor, then from midcolor to maxcolor). These options are ignored if using "-map"; "-b" overrides "-midcolor". The default behavior is to go smoothly from mincolor to maxcolor. Setting this is usually best if not using a map and the min and max colors are not black. Important: setting this does not mean that black signifies zero. It only ensures that it is the middle value of the range. If you want to ensure that the middle value is a particular number, use the -range option: with -range x1:x2 and x1 = -x2, the middle value will be zero. For example, use -range -2:2.
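
The two-segment color scale produced by -midcolor or -b can be sketched in Python as below. This is an illustration under assumptions, not the program's code: the function name is hypothetical, colors are (r, g, b) tuples, and the scale is assumed to be split evenly between the two segments.

```python
def palette_through_mid(mincolor, midcolor, maxcolor, numcolors=64):
    """Interpolate mincolor -> midcolor -> maxcolor over numcolors entries."""
    def blend(a, b, t):
        # Linear mix of two (r, g, b) triplets; t=0 gives a, t=1 gives b.
        return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

    colors = []
    for i in range(numcolors):
        t = i / (numcolors - 1)  # position along the whole scale, 0..1
        if t <= 0.5:
            colors.append(blend(mincolor, midcolor, t * 2))
        else:
            colors.append(blend(midcolor, maxcolor, (t - 0.5) * 2))
    return colors
```

With numcolors=5, going from 255:0:0 through 0:0:0 to 0:255:0, the first entry is (255, 0, 0), the third is (0, 0, 0) and the last is (0, 255, 0). Which data value lands on the middle entry depends only on the range in use, which is why a symmetric -range is needed for black to mean zero.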

-r, -c: Include row and/or column labels. These are placed at the right and top of the image, respectively. The xsize and ysize must be set large enough to allow the text to fit along each row and/or column: this should be at least 8 pixels. By default there are no text labels.

-d: Include a 1 pixel grey divider between each row and column. By default there are no dividers between the rows and columns.

-s: Include a scale bar. The scale bar is placed at the upper left and is labeled with the values of the range it covers. By default no scale bar is shown.

-numr <value> and -numc <value>: Integers indicating how many rows (numr) or columns (numc) of the file should be drawn, starting by default from the top left position in the matrix (i.e., the first numr rows and numc columns are drawn; use -startrow and -startcol to begin elsewhere). By default the entire file is used. Note that the color range is calculated using only the rows and columns selected.

-startrow <value>, -startcol <value>: Integer values indicating which row or column to start reading from. The first row and column of data are numbered '1'. Use in combination with -numr and -numc to access submatrices of the data.

-e: Draw ellipses (circles if the xsize is the same as the ysize) instead of rectangular blocks. The background color will show between the ellipses. Currently this yields slightly ugly results with the -d (dividers) option. Rectangles are used by default.

-z: Normalize the rows of the matrix to have a mean of zero and variance one (Z transformation).

-l: Log-transform the data (base 2). This is done before normalizing rows, if requested.

Bugs, known problems:

Please send notice of any undocumented bugs or feature requests to pavlab-support@msl.ubc.ca. Some known problems/limitations:

Some combinations of options give ugly results. In particular, using -e with -d is unattractive.
Very large pngs may not display properly in some web browsers. This appears to affect Netscape and Opera. The maximum seems to be around 40000 pixels high for Opera. Of course, such large images are not often useful, but just beware if you make a very large image and have problems viewing it.
The text is not terribly attractive: good enough for the web, but not publication quality. libgd can work with truetype fonts, but even that isn't likely to solve the problem (because it will still be bitmapped). For publication quality figures, we recommend using matrix2png to make the image alone, and add text labels using Adobe Illustrator or a similar image editing program.

Author:

Paul Pavlidis (also uses code by William Noble)