Diagonal reference manual

1 Overview
- 1.1 Schema
- 1.2 Advantages
2 System requirements
3 Invocation
4 Design
- 4.1 No dependency at run time
- 4.2 Some items from The UNIX Philosophy:

1 Overview

Please read this section if you wonder what can be done with diagonal.

1.1 Schema

Diagonal is for solving various type of problems represented with the following schema:

                       |input
                       v
        +-------------------------------+
        |      ....                     |
        |    +------------+             |
        | i->|your program|->o          |
        |    +------------+             |
        |         ....                  |
        |       +------------+          |
        |    i->|your program|->o       |
        |       +------------+          |
diagonal|            ....               |
        |          +------------+       |
        |       i->|your program|->o    |
        |          +------------+       |
        |               ....            |
        |             +------------+    |
        |          i->|your program|->o |
        |             +------------+    |
        |                   ....        |
        +-------------------------------+
                       |output
                       v

All you have to do is to specify your problem by preparing a program taking input from stdin to produce output to stdout, then the rest for its solution is done by diagonal.

1.2 Advantages

Diagonal is originally designed by a lazy programmer who wants to take advantage of software leverage with a little effort.

You can define your problem in your favorite programming language.
Only a little knowledge about the solution of the problem is needed.
You can forget tedious details to handle computational resources such as temporary files.

2 System requirements

Diagonal just works on a POSIX¹ system like FreeBSD, Linux, Mac OS X, NetBSD and so on. At the same time, it also supports modern Windows OS like Windows 7/8 via MinGW-w64².

3 Invocation

Diagonal is a command line tool with a set of subcommands. This section explains all available subcommands one by one.

3.1 diag cycle

diag cycle [-O output] [-i input] command [operand ...]

Execute a command with optional operand ... as a funcion, which takes input from stdin to produce output to stdout. If the sequence of its iterated function values has a cycle, then it exits successfully after printing both the loop length and the smallest number of the repetition such that the output reappears infinitely many times within the sequence. The initial input is either input if specified, or stdin otherwise. The output to stdout of each run will be saved as a file diagonal*.out in an output directory if given, as well as saving the stderr as another file diagonal*.err. Note that it will run into an infinite loop when failing to find a cycle. The exit status will be 0 in case of success, otherwise non-0.

3.2 diag decode

diag dec [-s source] [-t target] [file]

Decode a file (or stdin) encoded in the VCDIFF data format defined in RFC 3284³. Optionally a source is used for decoding. The decoded content will be written into target if given, otherwise it goes to stdout. The exit status will be 0 in case of success, otherwise non-0.

3.3 diag file

diag file [-m metric] [-I initial] [-F final] [-1] file [...]

diag file [-m metric] [-I initial] [-F final] [-1] -i input

Classify files by a single-linkage clustering with a metric, and print the resulting clusters. Input files appears in file ... for the former case, while the latter case assumes each line of input contains the path of an input file. Possible values of metric are hamming (default), levenshtein, hash32, and hash32_rev. Optional initial and final are positive numbers which the similarity is in between. Singleton clusters will be omitted for the result unless -1 is given. The exit status will be 0 in case of success, otherwise non-0.

3.4 diag fix

diag fix [-i input] command [operand ...]

Execute a command with optional operand ..., and compare the stdout with the stdin. The process exits successfully if it reaches a fixed point i.e. the output reproduces the input. Otherwise, run the same command again with feeding the previous output into the stdin; continue until reaching a fixed point. The initial input is either input if specified, or stdin otherwise. The exit status will be 0 in case of success, otherwise non-0.

3.5 diag hash

diag hash [-b base] [-o output] [-s] [-w window] file

Calculate Rabin-Karp rolling hash⁴ for bytes in a file with the window size, writing to output (or stdout if omitted). Hashing treats a consecutive bytes as a number in the base, which is 107 by default. The result will be sorted if -s given. The exit status will be 0 in case of success, otherwise non-0.

3.6 diag line

diag line [-m metric] [-t threshold] [-1] [file]

Classify lines in a file (or stdin) by a single-linkage clutering with a metric, and print the resulting clusters. Two lines go into the same cluster if the distance between them is under threshold, which is 10 by default. Singleton clusters will be ommitted for the result unless -1 is given. The exit status will be 0 in case of success, otherwise non-0.

3.7 diag mean

diag mean [-c num_of_columns]

Read a real number per each line of stdin, then calculate the arithmetic mean of all the numbers and print it to stdout. If num_of_columns specified, it reads as many numbers as num_of_columns from each line, forming a vector; then calculate and show all of the mean for each column. It is an error to give an empty input, or to give a less number of columns than num_of_columns at a line. The exit status will be 0 in case of success, otherwise non-0.

3.8 diag median

diag medi

Read a real number per each line of stdin, then calculate the median of the input sample and print it to stdout. It is an error to give an empty input. The exit status will be 0 in case of success, otherwise non-0.

3.9 diag mode

diag mode [-e exponent]

Read a real number per each line of stdin, then calculate the mode of the input sample and print it to stdout. Because the input data can be multimodal, it will result in as many lines of output as the number of the modes. It is an error to give an empty input. The exit status will be 0 in case of success, otherwise non-0.

3.10 diag pool

diag pool [-n num] command [operand ...]

At launch it calls given command with optional operand ... num times. Once one of the callee processes exits normally or abnormally, it will run a new process so as to keep the total number of processes num.

3.11 diag repeat

diag rep [-I interval] [-e code] [-n num] command [operand ...]

Run command with optional operand ... num times iteratively. The default value of num is 5. By default, each successive run will start immediately after the previous run finishes, but there will be interval seconds (at least) between them, if specified. If code is given, it will exit immediately if a run finishes with exit status code. The exit status will be the same as the last run's.

3.12 diag root

diag root [-n num_of_iteration] -g guess0 -g guess1 ... command [operand ...]

Find a root of an equation of form $F(x) = 0$, where $F$ is a function defined by given command with optional operand ... taking input from stdin to produce output to stdout. It prints a root of the equation if found. You have to give 2 or more guess via the -g option. If num_of_iteration is specified, then it will fail immediately after the total number of iteration (i.e. evalutation of the function) exceeds the number. The exit status will be 0 in case of success, otherwise non-0.

3.13 diag times

diag times [-i input] [-n num] command [operand ...]

Apply given command with optional operand ... to an input num times, and then print resulting output to stdout. The input is either input if specified, or stdin otherwise. The exit status will be 0 in case of success, otherwise non-0.

3.14 diag uniq

diag uniq [-c count] [-o output] file

Filter adjacent matching count byte-long blocks from file, writing to output (or stdout if omitted). The exit status will be 0 in case of success, otherwise non-0.

4 Design

Keep the following points in mind when developing diagonal:

4.1 No dependency at run time

Link no library but core system libraries including a C standard library.

4.2 Some items from The UNIX Philosophy⁵:

Make each program do one thing well.
Choose portability over efficiency.
Store numerical data in flat ASCII files.
Use software leverage to your advantage.
Use shell scripts to increse leverage and portability.
Make every program a filter.

Footnotes:

¹ The Open Group Base Specifications Issue 7, 2013 Edition

² http://mingw-w64.sourceforge.net/

³ http://tools.ietf.org/html/rfc3284

⁴ Karp, RM; Rabin, MO (March 1987). "Efficient randomized pattern-matching algorithms". IBM Journal of Research and Development 31 (2): 249–260. doi:10.1147/rd.312.0249

⁵ The UNIX Philosophy

Author: Takeshi Abe <tabe@fixedpoint.jp>

HTML generated by org-mode 6.33x in emacs 23