Initial commit

Yevhenii Kutsenko 2024-12-28 15:35:05 +03:00
commit bd10fad70a
32 changed files with 2400 additions and 0 deletions

48
.gitignore vendored Normal file

@@ -0,0 +1,48 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

21
LICENSE Normal file

@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2021 Organized Crime and Corruption Reporting Project
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

132
README.md Normal file

@@ -0,0 +1,132 @@
# cronodump
The cronodump utility can parse most of the databases created by the [CronosPro](https://www.cronos.ru/) database software
and dump them to several output formats.
The software is popular among Russian public offices, companies and police agencies.
# Quick start
In its simplest form, without any dependencies, the croconvert command creates a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) representation of all the database's tables and a copy of all files contained in the database:
```bash
bin/croconvert --csv test_data/all_field_types
```
By default it creates a `cronodump-YYYY-mm-DD-HH-MM-SS-ffffff/` directory containing a CSV file for each table found. Under this directory it will also create a `Files-FL/` directory containing all the files stored in the database, regardless of whether they are (still) referenced in any data table. All files that are actually referenced (and thus are known by their filename) are stored under the `Files-Referenced` directory. With the `--outputdir` option you can choose your own dump location.
When you get an error message, or just unreadable data, chances are your database is protected. You may need to look into the `--dbcrack` or `--strucrack` options, explained below.
# Templates
The croconvert command can use the powerful [jinja templating framework](https://jinja.palletsprojects.com/en/3.0.x/) to render more output formats such as PostgreSQL and HTML.
The default action for `croconvert` is to convert the database using the `html` template.
Use
```bash
python3 -m venv venv
. venv/bin/activate
pip install jinja2
bin/croconvert test_data/all_field_types > test_data.html
```
to dump an HTML file with all tables found in the database, all files listed and ready for download as inlined [data URIs](https://en.wikipedia.org/wiki/Data_URI_scheme), and all table images inlined as well. Note that the resulting HTML file can be huge for large databases, which puts considerable load on the browser opening it.
The `-t postgres` option will dump the table schemas and records as valid `CREATE TABLE` and `INSERT INTO` statements to stdout. This dump can then be imported into a PostgreSQL database. Note that the backslash character is not escaped, so the [`standard_conforming_strings`](https://www.postgresql.org/docs/current/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS) option should be off.
Pull requests for [more templates supporting other output types](/templates) are welcome.
# Inspection
There's a `bin/crodump` tool to further investigate databases. This might be useful for extracting metadata like path names of table image files or input and output forms. Not all metadata has yet been completely reverse engineered, so some experience with understanding binary dumps might be required.
The crodump script has a plethora of options, but in its most basic form the `strudump` subcommand will already provide a rich variety of metadata to look into further:
```bash
bin/crodump strudump -v -a test_data/all_field_types/
```
The `-a` option tells strudump to output ASCII instead of a hexdump.
For a low level dump of the database contents, use:
```bash
bin/crodump crodump -v test_data/all_field_types/
```
The `-v` option tells crodump to include all unused byte ranges, which may be useful for identifying deleted records.
For a somewhat higher-level dump of the database contents, use:
```bash
bin/crodump recdump test_data/all_field_types/
```
This will print a hexdump of all records for all tables.
## Decoding password-protected databases
Cronos v4 and higher can password-protect databases; the protection works
by modifying the KOD sbox. `cronodump` has two methods of deriving the KOD sbox from
a database.
Both methods are statistics-based operations, so they may not always
yield the correct KOD sbox.
### 1. strudump
When the database has a sufficiently large CroStru.dat file,
it is easy to derive the modified KOD sbox from the CroStru file; the `--strucrack` option
will do this:
```bash
crodump --strucrack recdump <dbpath>
```
### 2. dbdump
When the Bank and Index files are compressed, we can derive the KOD sbox by inspecting
the fourth byte of each record, which should decode to a zero.
The `--dbcrack` option will do this.
```bash
crodump --dbcrack recdump <dbpath>
```
# Installing
`cronodump` requires Python 3.7 or later. It has been tested on Linux, macOS and Windows.
There is one optional dependency, the `Jinja2` templating engine, but `cronodump` will install fine without it.
There are several ways of installing `cronodump`:
* You can run `cronodump` directly from the cloned git repository, by using the shell scripts in the `bin` subdirectory.
* You can install `cronodump` in your python environment by running: `python setup.py build install`.
* You can install `cronodump` from the public [pypi repository](https://pypi.org/project/cronodump/) with `pip install cronodump`.
* You can install `cronodump` with the `Jinja2` templating engine from the public [pypi repository](https://pypi.org/project/cronodump/) with `pip install cronodump[templates]`.
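Once installed, `cronodump` can also be used as a library. A minimal sketch, using the test database from this repository and assuming an unprotected database so the default KOD table suffices:
```python
from crodump.Database import Database
import crodump.koddecoder

# open the database directory with the default v3 KOD table
db = Database("test_data/all_field_types", compact=False, kod=crodump.koddecoder.new())
for tab in db.enumerate_tables():
    for rec in db.enumerate_records(tab):
        print(tab.tablename, [field.content for field in rec.fields])
```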
# Terminology
We decided to use the more common terminology for databases, tables, records, etc.
Here is a table showing what Cronos calls these:
| what | cronos english | cronos russian
|:------ |:------ |:------
| Database | Bank | Банк
| Table | Base | Базы
| Record | Record | Записи
| Field | Field | поля
| recid | System Number | Системный номер
# License
cronodump is released under the [MIT license](LICENSE).
# References
cronodump builds upon [documentation of the file format found in older versions of Cronos](http://sergsv.narod.ru/cronos.htm) and
the [subsequent implementation of a parser for the old file format](https://github.com/occrp/cronosparser), but drops the heuristic
approach of guessing offsets and obfuscation parameters in favor of a stricter parser. Refer to [the docs](docs/cronos-research.md) for further
details.

5
bin/croconvert Executable file

@@ -0,0 +1,5 @@
#!/bin/sh
BINPATH=$(dirname "$0")
export PYTHONPATH="$BINPATH/.."
python3 -m crodump.croconvert "$@"

5
bin/crodump Executable file

@@ -0,0 +1,5 @@
#!/bin/sh
BINPATH=$(dirname "$0")
export PYTHONPATH="$BINPATH/.."
python3 -m crodump.crodump "$@"

288
crodump/Database.py Normal file

@@ -0,0 +1,288 @@
from __future__ import print_function, division
import os
import re
from sys import stderr
from binascii import b2a_hex
from .readers import ByteReader
from .hexdump import strescape, toout, ashex
from .Datamodel import TableDefinition, Record
from .Datafile import Datafile
import base64
import struct
import crodump.koddecoder
import sys
if sys.version_info[0] == 2:
sys.exit("cronodump needs python3")
class Database:
"""represent the entire database, consisting of Stru, Index and Bank files"""
def __init__(self, dbdir, compact, kod=crodump.koddecoder.new()):
"""
`dbdir` is the directory containing the Cro*.dat and Cro*.tad files.
`compact` if set, the .tad file is not cached in memory, making dumps about 15% slower
`kod` is optionally a KOD coder object.
by default the v3 KOD coding will be used.
"""
self.dbdir = dbdir
self.compact = compact
self.kod = kod
# Stru+Index+Bank for the components for most databases
self.stru = self.getfile("Stru")
self.index = self.getfile("Index")
self.bank = self.getfile("Bank")
# the Sys file resides in the "Program Files\Cronos" directory, and
# contains an index of all known databases.
self.sys = self.getfile("Sys")
def getfile(self, name):
"""
Returns a Datafile object for `name`.
this function expects a `Cro<name>.dat` and a `Cro<name>.tad` file.
When no such files exist, or only one, then None is returned.
`name` is matched case insensitively
"""
try:
datname = self.getname(name, "dat")
tadname = self.getname(name, "tad")
if datname and tadname:
return Datafile(name, open(datname, "rb"), open(tadname, "rb"), self.compact, self.kod)
except IOError:
return
def getname(self, name, ext):
"""
Get a case-insensitive filename match for 'name.ext'.
Returns None when no matching file was found.
"""
basename = "Cro%s.%s" % (name, ext)
for fn in os.listdir(self.dbdir):
if basename.lower() == fn.lower():
return os.path.join(self.dbdir, fn)
def dump(self, args):
"""
Calls the `dump` method on all database components.
"""
if self.stru:
self.stru.dump(args)
if self.index:
self.index.dump(args)
if self.bank:
self.bank.dump(args)
if self.sys:
self.sys.dump(args)
def strudump(self, args):
"""
prints all info found in the CroStru file.
"""
if not self.stru:
print("missing CroStru file")
return
self.dump_db_table_defs(args)
def decode_db_definition(self, data):
"""
decode the 'bank' / database definition
"""
rd = ByteReader(data)
d = dict()
while not rd.eof():
keyname = rd.readname()
if keyname in d:
print("WARN: duplicate key: %s" % keyname)
index_or_length = rd.readdword()
if index_or_length >> 31:
d[keyname] = rd.readbytes(index_or_length & 0x7FFFFFFF)
else:
refdata = self.stru.readrec(index_or_length)
if refdata[:1] != b"\x04":
print("WARN: expected refdata to start with 0x04")
d[keyname] = refdata[1:]
return d
def dump_db_definition(self, args, dbdict):
"""
decode the 'bank' / database definition
"""
for k, v in dbdict.items():
if re.search(b"[^\x0d\x0a\x09\x20-\x7e\xc0-\xff]", v):
print("%-20s - %s" % (k, toout(args, v)))
else:
print('%-20s - "%s"' % (k, strescape(v)))
def dump_db_table_defs(self, args):
"""
decode the table defs from recid #1, which always has table-id #3
Note that I don't know if it is better to refer to this by recid, or by table-id.
other table-id's found in CroStru:
#4 -> large values referenced from tableid#3
"""
dbinfo = self.stru.readrec(1)
if dbinfo[:1] != b"\x03":
print("WARN: expected dbinfo to start with 0x03")
dbdef = self.decode_db_definition(dbinfo[1:])
self.dump_db_definition(args, dbdef)
for k, v in dbdef.items():
if k.startswith("Base") and k[4:].isnumeric():
print("== %s ==" % k)
tbdef = TableDefinition(v, dbdef.get("BaseImage" + k[4:], b''))
tbdef.dump(args)
elif k == "NS1":
self.dump_ns1(v)
def dump_ns1(self, data):
if len(data)<2:
print("NS1 is unexpectedly short")
return
unk1, sh, = struct.unpack_from("<BB", data, 0)
# NS1 is encoded with the default KOD table,
# so we are not using stru.kod here.
ns1kod = crodump.koddecoder.new()
decoded_data = ns1kod.decode(sh, data[2:])
if len(decoded_data) < 12:
print("NS1 is unexpectedly short")
return
serial, unk2, pwlen, = struct.unpack_from("<LLL", decoded_data, 0)
password = decoded_data[12:12+pwlen].decode('cp1251')
print("== NS1: (%02x,%02x) -> %6d, %d, %d:'%s'" % (unk1, sh, serial, unk2, pwlen, password))
def enumerate_tables(self, files=False):
"""
yields a TableDefinition object for all `BaseNNN` entries found in CroStru
"""
dbinfo = self.stru.readrec(1)
if dbinfo[:1] != b"\x03":
print("WARN: expected dbinfo to start with 0x03")
try:
dbdef = self.decode_db_definition(dbinfo[1:])
except Exception as e:
print("ERROR decoding db definition: %s" % e)
print("This could possibly mean that you need to try with the --strucrack option")
return
for k, v in dbdef.items():
if k.startswith("Base") and k[4:].isnumeric():
if files and k[4:] == "000":
yield TableDefinition(v)
if not files and k[4:] != "000":
yield TableDefinition(v, dbdef.get("BaseImage" + k[4:], b''))
def enumerate_records(self, table):
"""
Yields a Record object for all records in CroBank matching
the tableid from `table`
usage:
for tab in db.enumerate_tables():
for rec in db.enumerate_records(tab):
print(sqlformatter(tab, rec))
"""
for i in range(self.bank.nrofrecords):
data = self.bank.readrec(i + 1)
if data and data[0] == table.tableid:
try:
yield Record(i + 1, table.fields, data[1:])
except EOFError:
print("Record %d too short: -- %s" % (i+1, ashex(data)), file=stderr)
except Exception as e:
print("Record %d broken: ERROR '%s' -- %s" % (i+1, e, ashex(data)), file=stderr)
def enumerate_files(self, table):
"""
Yield all file contents found in CroBank for `table`.
This is most likely the table with id 0.
"""
for i in range(self.bank.nrofrecords):
data = self.bank.readrec(i + 1)
if data and data[0] == table.tableid:
yield i + 1, data[1:]
def get_record(self, index, asbase64=False):
"""
Retrieve a single record from CroBank with record number `index`.
"""
data = self.bank.readrec(int(index))
if asbase64:
return base64.b64encode(data[1:]).decode('utf-8')
else:
return data[1:]
def recdump(self, args):
"""
Function for outputting record contents of the various .dat files.
This function is mostly useful for reverse-engineering the database format.
"""
if args.index:
dbfile = self.index
elif args.sys:
dbfile = self.sys
elif args.stru:
dbfile = self.stru
else:
dbfile = self.bank
if not dbfile:
print(".dat not found")
return
nerr = 0
nr_recnone = 0
nr_recempty = 0
tabidxref = [0] * 256
bytexref = [0] * 256
for i in range(1, args.maxrecs + 1):
try:
data = dbfile.readrec(i)
if args.find1d:
if data and (data.find(b"\x1d") > 0 or data.find(b"\x1b") > 0):
print("record with '1d': %d -> %s" % (i, b2a_hex(data)))
break
elif not args.stats:
if data is None:
print("%5d: <deleted>" % i)
else:
print("%5d: %s" % (i, toout(args, data)))
else:
if data is None:
nr_recnone += 1
elif not len(data):
nr_recempty += 1
else:
tabidxref[data[0]] += 1
for b in data[1:]:
bytexref[b] += 1
nerr = 0
except IndexError:
break
except Exception as e:
print("%5d: <%s>" % (i, e))
if args.debug:
raise
nerr += 1
if nerr > 5:
break
if args.stats:
print("-- table-id stats --, %d * none, %d * empty" % (nr_recnone, nr_recempty))
for k, v in enumerate(tabidxref):
if v:
print("%5d * %02x" % (v, k))
print("-- byte stats --")
for k, v in enumerate(bytexref):
if v:
print("%5d * %02x" % (v, k))

344
crodump/Datafile.py Normal file

@@ -0,0 +1,344 @@
import io
import struct
import zlib
from .hexdump import tohex, toout
import crodump.koddecoder
class Datafile:
"""Represent a single .dat file with its .tad index file"""
def __init__(self, name, dat, tad, compact, kod):
self.name = name
self.dat = dat
self.tad = tad
self.compact = compact
self.readdathdr()
self.readtad()
self.dat.seek(0, io.SEEK_END)
self.datsize = self.dat.tell()
self.kod = kod if not kod or self.isencrypted() else crodump.koddecoder.new()
def isencrypted(self):
return self.version in (b'01.04', b'01.05') or self.isv4()
def isv3(self):
# 01.02: 32 bit file offsets
# 01.03: 64 bit file offsets
# 01.04: encrypted?, 32bit
# 01.05: encrypted?, 64bit
return self.version in (b'01.02', b'01.03', b'01.04', b'01.05')
def isv4(self):
# 01.11 v4 ( 64bit )
# 01.14 v4 ( 64bit ), encrypted?
# 01.13 ?? I have not seen this version anywhere yet.
return self.version in (b'01.11', b'01.13', b'01.14')
def isv7(self):
# 01.19 ?? I have not seen this version anywhere yet.
return self.version in (b'01.19',)
def readdathdr(self):
"""
Read the .dat file header.
Note that the 19 byte header is followed by 0xE9 random bytes, generated by
'srand(time())' followed by 0xE9 times obfuscate(rand())
"""
self.dat.seek(0)
hdrdata = self.dat.read(19)
(
magic, # +00 8 bytes
self.hdrunk, # +08 uint16
self.version, # +0a 5 bytes
self.encoding, # +0f uint16
self.blocksize, # +11 uint16
) = struct.unpack("<8sH5sHH", hdrdata)
if magic != b"CroFile\x00":
print("unknown magic: ", magic)
raise Exception("not a Crofile")
self.use64bit = self.version in (b"01.03", b"01.05", b"01.11")
# blocksize
# 0040 -> Bank
# 0400 -> Index or Sys
# 0200 -> Stru or Sys
# encoding
# bit0 = 'KOD encoded'
# bit1 = compressed
def readtad(self):
"""
read and decode the .tad file.
"""
self.tad.seek(0)
if self.isv3():
hdrdata = self.tad.read(2 * 4)
self.nrdeleted, self.firstdeleted = struct.unpack("<2L", hdrdata)
elif self.isv4():
hdrdata = self.tad.read(4 * 4)
unk1, self.nrdeleted, self.firstdeleted, unk2 = struct.unpack("<4L", hdrdata)
else:
raise Exception("unsupported .tad version")
self.tadhdrlen = self.tad.tell()
self.tadentrysize = 16 if self.use64bit else 12
if self.compact:
self.tad.seek(0, io.SEEK_END)
else:
self.idxdata = self.tad.read()
self.tadsize = self.tad.tell() - self.tadhdrlen
self.nrofrecords = self.tadsize // self.tadentrysize
if self.tadsize % self.tadentrysize:
print("WARN: leftover data in .tad")
def tadidx(self, idx):
"""
Look up an entry in the cached .tad data; when `compact` mode was requested, fall back to seeking in the file.
"""
if self.compact:
return self.tadidx_seek(idx)
if self.use64bit:
# 01.03 and 01.11 have 64 bit file offsets
return struct.unpack_from("<QLL", self.idxdata, idx * self.tadentrysize)
else:
# 01.02 and 01.04 have 32 bit offsets.
return struct.unpack_from("<LLL", self.idxdata, idx * self.tadentrysize)
def tadidx_seek(self, idx):
"""
Memory saving version without caching the .tad
"""
self.tad.seek(self.tadhdrlen + idx * self.tadentrysize)
idxdata = self.tad.read(self.tadentrysize)
if self.use64bit:
# 01.03 and 01.11 have 64 bit file offsets
return struct.unpack("<QLL", idxdata)
else:
# 01.02 and 01.04 have 32 bit offsets.
return struct.unpack("<LLL", idxdata)
def readdata(self, ofs, size):
"""
Read raw data from the .dat file
"""
self.dat.seek(ofs)
return self.dat.read(size)
def readrec(self, idx):
"""
Extract and decode a single record.
"""
if idx == 0:
raise Exception("recnum must be a positive number")
ofs, ln, chk = self.tadidx(idx - 1)
if ln == 0xFFFFFFFF:
# deleted record
return
if self.isv3():
flags = ln >> 24
ln &= 0xFFFFFFF
elif self.isv4():
flags = ofs >> 56
ofs &= (1<<56)-1
dat = self.readdata(ofs, ln)
if not dat:
# empty record
encdat = dat
elif not flags:
if self.use64bit:
extofs, extlen = struct.unpack("<QL", dat[:12])
o = 12
else:
extofs, extlen = struct.unpack("<LL", dat[:8])
o = 8
encdat = dat[o:]
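# follow the chain of extension blocks: the first dword/qword of each block holds the offset of the next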
while len(encdat) < extlen:
dat = self.readdata(extofs, self.blocksize)
if self.use64bit:
(extofs,) = struct.unpack("<Q", dat[:8])
o = 8
else:
(extofs,) = struct.unpack("<L", dat[:4])
o = 4
encdat += dat[o:]
encdat = encdat[:extlen]
else:
encdat = dat
if self.encoding & 1:
if self.kod:
encdat = self.kod.decode(idx, encdat)
if self.iscompressed(encdat):
encdat = self.decompress(encdat)
return encdat
def enumrecords(self):
for i in range(self.nrofrecords):
yield self.readrec(i+1)
def enumunreferenced(self, ranges, filesize):
"""
From a list of used byte ranges and the filesize, enumerate the list of unused byte ranges
"""
o = 0
for start, end, desc in sorted(ranges):
if start > o:
yield o, start - o
o = end
if o < filesize:
yield o, filesize - o
def dump(self, args):
"""
Dump decodes all data referenced from the .tad file.
And optionally print out all unreferenced byte ranges in the .dat file.
This function is mostly useful for reverse-engineering the database format.
the `args` object controls how data is decoded.
"""
print("hdr: %-6s dat: %04x %s enc:%04x bs:%04x, tad: %08x %08x" % (
self.name, self.hdrunk, self.version,
self.encoding, self.blocksize,
self.nrdeleted, self.firstdeleted))
ranges = [] # keep track of used bytes in the .dat file.
for i in range(self.nrofrecords):
(ofs, ln, chk) = self.tadidx(i)
idx = i + 1
if args.maxrecs and i==args.maxrecs:
break
if ln == 0xFFFFFFFF:
print("%5d: %08x %08x %08x" % (idx, ofs, ln, chk))
continue
if self.isv3():
flags = ln >> 24
ln &= 0xFFFFFFF
elif self.isv4():
flags = ofs >> 56
# 04 --> data, v3compdata
# 02,03 --> deleted
# 00 --> extrec
ofs &= (1<<56)-1
dat = self.readdata(ofs, ln)
ranges.append((ofs, ofs + ln, "item #%d" % i))
decflags = [" ", " "]
infostr = ""
tail = b""
if not dat:
# empty record
encdat = dat
elif not flags:
if self.use64bit:
extofs, extlen = struct.unpack("<QL", dat[:12])
o = 12
else:
extofs, extlen = struct.unpack("<LL", dat[:8])
o = 8
infostr = "%08x;%08x" % (extofs, extlen)
encdat = dat[o:]
while len(encdat) < extlen:
dat = self.readdata(extofs, self.blocksize)
ranges.append((extofs, extofs + self.blocksize, "item #%d ext" % i))
if self.use64bit:
(extofs,) = struct.unpack("<Q", dat[:8])
o = 8
else:
(extofs,) = struct.unpack("<L", dat[:4])
o = 4
infostr += ";%08x" % (extofs)
encdat += dat[o:]
tail = encdat[extlen:]
encdat = encdat[:extlen]
decflags[0] = "+"
else:
encdat = dat
decflags[0] = "*"
if self.encoding & 1:
if self.kod:
encdat = self.kod.decode(idx, encdat)
else:
decflags[0] = " "
if args.decompress:
if self.iscompressed(encdat):
encdat = self.decompress(encdat)
decflags[1] = "@"
# TODO: separate handling for v4
print("%5d: %08x-%08x: (%02x:%08x) %s %s%s %s" % (
i+1, ofs, ofs + ln, flags, chk,
infostr, "".join(decflags), toout(args, encdat), tohex(tail)))
if args.verbose:
# output parts not referenced in the .tad file.
for o, l in self.enumunreferenced(ranges, self.datsize):
dat = self.readdata(o, l)
print("%08x-%08x: %s" % (o, o + l, toout(args, dat)))
def iscompressed(self, data):
"""
Check if this record looks like a compressed record.
"""
if len(data) < 11:
return
if data[-3:] != b"\x00\x00\x02":
return
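# walk the chunk headers: each size field covers the flag, crc and compressed data of one chunk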
o = 0
while o < len(data) - 3:
size, flag = struct.unpack_from(">HH", data, o)
if flag != 0x800 and flag != 0x008:
return
o += size + 2
return True
def decompress(self, data):
"""
Decompress a record.
Compressed records can have several chunks of compressed data.
Note that the compression header uses a mix of big-endian and little-endian numbers.
each chunk has the following format:
size - big endian uint16, size of flag + crc + compdata
flag - big endian uint16 - always 0x800
crc - little endian uint32, crc32 of the decompressed data
the final chunk has only 3 bytes: a zero size followed by a 2.
the crc algorithm is the one labeled 'crc-32' on this page:
http://crcmod.sourceforge.net/crcmod.predefined.html
"""
result = b""
o = 0
while o < len(data) - 3:
# note the mix of bigendian and little endian numbers here.
size, flag = struct.unpack_from(">HH", data, o)
storedcrc, = struct.unpack_from("<L", data, o+4)
C = zlib.decompressobj(-15)
result += C.decompress(data[o+8:o+8+size-6])
# note that we are not verifying the crc!
o += size + 2
return result

255
crodump/Datamodel.py Normal file

@@ -0,0 +1,255 @@
# -*- coding: utf-8 -*-
from .hexdump import tohex, ashex
from .readers import ByteReader
class FieldDefinition:
"""
Contains the properties for a single field in a record.
"""
def __init__(self, data):
self.decode(data)
def decode(self, data):
self.defdata = data
rd = ByteReader(data)
self.typ = rd.readword()
self.idx1 = rd.readdword()
self.name = rd.readname()
self.flags = rd.readdword()
self.minval = rd.readbyte() # Always 1
if self.typ:
self.idx2 = rd.readdword()
self.maxval = rd.readdword() # max value or length
self.unk4 = rd.readdword() # Always 0x00000009 or 0x0001000d
else:
self.idx2 = 0
self.maxval = self.unk4 = None
self.remaining = rd.readbytes()
def __str__(self):
if self.typ:
return "Type: %2d (%2d/%2d) %04x,(%d-%4d),%04x - %-40s -- %s" % (
self.typ, self.idx1, self.idx2,
self.flags, self.minval, self.maxval, self.unk4,
"'%s'" % self.name, tohex(self.remaining))
else:
return "Type: %2d %2d %d,%d - '%s'" % (
self.typ, self.idx1, self.flags, self.minval, self.name)
def sqltype(self):
return { 0: "INTEGER PRIMARY KEY",
1: "INTEGER",
2: "VARCHAR(" + str(self.maxval) + ")",
3: "TEXT", # dictionary
4: "DATE",
5: "TIMESTAMP",
6: "TEXT", # file reference
}.get(self.typ, "TEXT")
class TableImage:
def __init__(self, data):
self.decode(data)
def decode(self, data):
if not len(data):
self.filename = "none"
self.data = b''
return
rd = ByteReader(data)
_ = rd.readbyte()
namelen = rd.readdword()
self.filename = rd.readbytes(namelen).decode("cp1251", 'ignore')
imagelen = rd.readdword()
self.data = rd.readbytes(imagelen)
class TableDefinition:
def __init__(self, data, image=b''):
self.decode(data, image)
def decode(self, data, image):
"""
decode the 'base' / table definition
"""
rd = ByteReader(data)
self.unk1 = rd.readword()
self.version = rd.readbyte()
if self.version > 1:
_ = rd.readbyte() # always 0 anyway
# if this is not 5 (but 9), there's another 4 bytes inserted, this could be a length-byte.
self.unk2 = rd.readbyte()
self.unk3 = rd.readbyte()
if self.unk2 > 5: # seen only 5 and 9 for now with 9 implying an extra dword
_ = rd.readdword()
self.unk4 = rd.readdword()
self.tableid = rd.readdword()
self.tablename = rd.readname()
self.abbrev = rd.readname()
self.unk7 = rd.readdword()
nrfields = rd.readdword()
self.headerdata = data[: rd.o]
# There's (at least) two blocks describing fields, ended when encountering ffffffff
self.fields = []
for _ in range(nrfields):
deflen = rd.readword()
fielddef = rd.readbytes(deflen)
self.fields.append(FieldDefinition(fielddef))
# Between the first and the second block there are some byte strings;
# their count is given in the first dword
self.extraunkdatastrings = rd.readdword()
for _ in range(self.extraunkdatastrings):
datalen = rd.readword()
skip = rd.readbytes(datalen)
try:
# Then there's another unknown dword and then a (probably section indicator) 0x02 byte
self.unk8_ = rd.readdword()
if rd.readbyte() != 2:
print("Warning: FieldDefinition Section 2 not marked with a 2")
self.unk9 = rd.readdword()
# Then there's the number of extra fields in the second section
nrextrafields = rd.readdword()
for _ in range(nrextrafields):
deflen = rd.readword()
fielddef = rd.readbytes(deflen)
self.fields.append(FieldDefinition(fielddef))
except Exception as e:
print("Warning: Error '%s' parsing FieldDefinitions" % e)
try:
self.terminator = rd.readdword()
except EOFError:
print("Warning: FieldDefinition section not terminated")
except Exception as e:
print("Warning: Error '%s' parsing Tabledefinition" % e)
self.fields.sort(key=lambda field: field.idx2)
self.remainingdata = rd.readbytes()
self.tableimage = TableImage(image)
def __str__(self):
return "%d,%d<%d,%d,%d>%d %d,%d '%s' '%s' [TableImage(%d bytes): %s]" % (
self.unk1, self.version, self.unk2, self.unk3, self.unk4, self.tableid,
self.unk7, len(self.fields),
self.tablename, self.abbrev, len(self.tableimage.data), self.tableimage.filename)
def dump(self, args):
if args.verbose:
print("table: %s" % tohex(self.headerdata))
print(str(self))
for i, field in enumerate(self.fields):
if args.verbose:
print("field#%2d: %04x - %s" % (
i, len(field.defdata), tohex(field.defdata)))
print(str(field))
if args.verbose:
print("remaining: %s" % tohex(self.remainingdata))
class Field:
"""
Contains a single fully decoded value.
"""
def __init__(self, fielddef, data):
self.decode(fielddef, data)
def decode(self, fielddef, data):
self.typ = fielddef.typ
self.data = data
if not data:
self.content = ""
return
elif self.typ == 0:
# typ 0 is the recno, or as cronos calls this: Системный номер, systemnumber.
# just convert this to string for presentation
self.content = str(data)
elif self.typ == 4:
# typ 4 is DATE, formatted like: <year-1900:signedNumber><month:2digits><day:2digits>
try:
data = data.rstrip(b"\x00")
y, m, d = 1900+int(data[:-4]), int(data[-4:-2]), int(data[-2:])
self.content = "%04d-%02d-%02d" % (y, m, d)
except ValueError:
self.content = str(data)
elif self.typ == 5:
# typ 5 is TIME, formatted like: <hour:2digits><minute:2digits>
try:
data = data.rstrip(b"\x00")
h, m = int(data[-4:-2]), int(data[-2:])
self.content = "%02d:%02d" % (h, m)
except ValueError:
self.content = str(data)
elif self.typ == 6:
# decode internal file reference
rd = ByteReader(data)
self.flag = rd.readdword()
self.remlen = rd.readdword()
self.filename = rd.readtoseperator(b"\x1e").decode("cp1251", 'ignore')
self.extname = rd.readtoseperator(b"\x1e").decode("cp1251", 'ignore')
self.filedatarecord = rd.readtoseperator(b"\x1e").decode("cp1251", 'ignore')
self.content = " ".join([self.filename, self.extname, self.filedatarecord])
elif self.typ == 7 or self.typ == 8 or self.typ == 9:
# just hexdump foreign keys
self.content = ashex(data)
else:
# currently assuming everything else to be strings, which is wrong
self.content = data.rstrip(b"\x00").decode("cp1251", 'ignore')
class Record:
"""
Contains a single fully decoded record.
"""
def __init__(self, recno, tabledef, data):
self.decode(recno, tabledef, data)
def decode(self, recno, tabledef, data):
"""
decode the fields in a record
"""
self.data = data
self.recno = recno
self.table = tabledef
# start with the record number, or as Cronos calls this:
# the system number, in russian: Системный номер.
self.fields = [ Field(tabledef[0], str(recno)) ]
rd = ByteReader(data)
for fielddef in tabledef[1:]:
if not rd.eof() and rd.testbyte(0x1b):
# read complex record indicated by b"\x1b"
rd.readbyte()
size = rd.readdword()
fielddata = rd.readbytes(size)
else:
fielddata = rd.readtoseperator(b"\x1e")
self.fields.append(Field(fielddef, fielddata))

0
crodump/__init__.py Normal file

140
crodump/croconvert.py Normal file

@@ -0,0 +1,140 @@
"""
Commandline tool which converts a cronos database to .csv, .sql or .html.
python3 croconvert.py -t html chechnya_proverki_ul_2012/
"""
from .Database import Database
from .crodump import strucrack, dbcrack
from .hexdump import unhex
from sys import exit, stdout
from os.path import dirname, abspath, join
from os import mkdir, chdir
from datetime import datetime
import base64
import csv
def template_convert(kod, args):
"""looks up template to convert to, parses the database and passes it to jinja2"""
try:
from jinja2 import Environment, FileSystemLoader
except ImportError:
exit(
"Fatal: Jinja templating engine not found. Install using pip install jinja2"
)
db = Database(args.dbdir, args.compact, kod)
template_dir = join(dirname(dirname(abspath(__file__))), "templates")
j2_env = Environment(loader=FileSystemLoader(template_dir))
j2_templ = j2_env.get_template(args.template + ".j2")
j2_templ.stream(db=db, base64=base64).dump(stdout)
def safepathname(name):
return name.replace(':', '_').replace('/', '_').replace('\\', '_')
def csv_output(kod, args):
"""creates a directory with the current timestamp and in it a set of CSV or TSV
files with all the tables found and an extra directory with all the files"""
db = Database(args.dbdir, args.compact, kod)
mkdir(args.outputdir)
chdir(args.outputdir)
filereferences = []
# first dump all non-file tables
for table in db.enumerate_tables(files=False):
tablesafename = safepathname(table.tablename) + ".csv"
with open(tablesafename, 'w', encoding='utf-8') as csvfile:
writer = csv.writer(csvfile, delimiter=args.delimiter, escapechar='\\')
writer.writerow([field.name for field in table.fields])
# Record should be iterable over its fields, so we could use writerows
for record in db.enumerate_records(table):
writer.writerow([field.content for field in record.fields])
filereferences.extend([field for field in record.fields if field.typ == 6])
# Write all files from the file table. This is useful for unreferenced files
for table in db.enumerate_tables(files=True):
filedir = "Files-" + table.abbrev
mkdir(filedir)
for system_number, content in db.enumerate_files(table):
with open(join(filedir, str(system_number)), "wb") as binfile:
binfile.write(content)
if len(filereferences):
filedir = "Files-Referenced"
mkdir(filedir)
# Write all referenced files with their filename and extension intact
for reffile in filereferences:
if reffile.content: # only print when file is not NULL
filesafename = safepathname(reffile.filename) + "." + safepathname(reffile.extname)
content = db.get_record(reffile.filedatarecord)
with open(join("Files-Referenced", filesafename), "wb") as binfile:
binfile.write(content)
def main():
import argparse
parser = argparse.ArgumentParser(description="CRONOS database converter")
parser.add_argument("--template", "-t", type=str, default="html",
help="output template to use for conversion")
parser.add_argument("--csv", "-c", action='store_true', help="create output in .csv format")
parser.add_argument("--delimiter", "-d", default=",", help="delimiter used in csv output")
parser.add_argument("--outputdir", "-o", type=str, help="directory to create the dump in")
parser.add_argument("--kod", type=str, help="specify custom KOD table")
parser.add_argument("--compact", action="store_true", help="save memory by not caching the index, note: increases convert time by factor 1.15")
parser.add_argument("--strucrack", action="store_true", help="infer the KOD sbox from CroStru.dat")
parser.add_argument("--dbcrack", action="store_true", help="infer the KOD sbox from CroIndex.dat+CroBank.dat")
parser.add_argument("--nokod", "-n", action="store_true", help="don't KOD decode")
parser.add_argument("dbdir", type=str)
args = parser.parse_args()
import crodump.koddecoder
if args.kod:
if len(args.kod)!=512:
raise Exception("--kod should have a 512 hex digit argument")
kod = crodump.koddecoder.new(list(unhex(args.kod)))
elif args.nokod:
kod = None
elif args.strucrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = args.dbdir
cargs.sys = False
cargs.silent = True
cracked = strucrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
elif args.dbcrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = args.dbdir
cargs.sys = False
cargs.silent = True
cracked = dbcrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
else:
kod = crodump.koddecoder.new()
if args.csv:
if not args.outputdir:
args.outputdir = "cronodump"+datetime.now().strftime("-%Y-%m-%d-%H-%M-%S-%f")
csv_output(kod, args)
else:
template_convert(kod, args)
if __name__ == "__main__":
main()

296
crodump/crodump.py Normal file

@@ -0,0 +1,296 @@
from .kodump import kod_hexdump
from .hexdump import unhex, tohex
from .readers import ByteReader
from .Database import Database
from .Datamodel import TableDefinition
def destruct_sys3_def(rd):
# todo
pass
def destruct_sys4_def(rd):
"""
decode type 4 of the records found in CroSys.
This function is only useful for reverse-engineering the CroSys format.
"""
n = rd.readdword()
for _ in range(n):
marker = rd.readdword()
description = rd.readlongstring()
path = rd.readlongstring()
marker2 = rd.readdword()
print("%08x;%08x: %-50s : %s" % (marker, marker2, path, description))
def destruct_sys_definition(args, data):
"""
Decode the 'sys' / dbindex definition
This function is only useful for reverse-engineering the CroSys format.
"""
rd = ByteReader(data)
systype = rd.readbyte()
if systype == 3:
return destruct_sys3_def(rd)
elif systype == 4:
return destruct_sys4_def(rd)
else:
raise Exception("unsupported sys record")
def cro_dump(kod, args):
"""handle 'crodump' subcommand"""
if args.maxrecs:
args.maxrecs = int(args.maxrecs, 0)
else:
# an arbitrarily large number.
args.maxrecs = 0xFFFFFFFF
db = Database(args.dbdir, args.compact, kod)
db.dump(args)
def stru_dump(kod, args):
"""handle 'strudump' subcommand"""
db = Database(args.dbdir, args.compact, kod)
db.strudump(args)
def sys_dump(kod, args):
"""hexdump all CroSys records"""
# an arbitrarily large number.
args.maxrecs = 0xFFFFFFFF
db = Database(args.dbdir, args.compact, kod)
if db.sys:
db.sys.dump(args)
def rec_dump(kod, args):
"""hexdump all records of the specified CroXXX.dat file."""
if args.maxrecs:
args.maxrecs = int(args.maxrecs, 0)
else:
# an arbitrarily large number.
args.maxrecs = 0xFFFFFFFF
db = Database(args.dbdir, args.compact, kod)
db.recdump(args)
def destruct(kod, args):
"""
decode the index#1 structure information record
Takes hex input from stdin.
"""
import sys
data = sys.stdin.buffer.read()
data = unhex(data)
if args.type == 1:
# create a dummy db object
db = Database(".", args.compact)
db.dump_db_definition(args, data)
elif args.type == 2:
tbdef = TableDefinition(data)
tbdef.dump(args)
elif args.type == 3:
destruct_sys_definition(args, data)
def strucrack(kod, args):
"""
This function derives the KOD key from the assumption that most bytes in
the CroStru records will be zero, given a sufficient number of CroStru
items, statistically the most common bytes will encode to '0x00'
"""
# start without 'KOD' table, so we will get the encrypted records
db = Database(args.dbdir, args.compact, None)
if args.sys:
table = db.sys
if not db.sys:
print("no CroSys.dat file found in %s" % args.dbdir)
return
else:
table = db.stru
if not db.stru:
print("no CroStru.dat file found in %s" % args.dbdir)
return
xref = [ [0]*256 for _ in range(256) ]
for i, data in enumerate(table.enumrecords()):
if not data: continue
for ofs, byte in enumerate(data):
xref[(ofs+i+1)%256][byte] += 1
KOD = [0] * 256
for i, xx in enumerate(xref):
k, v = max(enumerate(xx), key=lambda kv: kv[1])
KOD[k] = i
if not args.silent:
print(tohex(bytes(KOD)))
return KOD
def dbcrack(kod, args):
"""
This function derives the KOD key from the assumption that most records in CroIndex
and CroBank will be compressed, and start with:
uint16 size
byte 0x08
byte 0x00
So because the fourth byte in each record will be 0x00 when kod-decoded, I can
use this as the inverse of the KOD table, adjusting for record-index.
"""
# start without 'KOD' table, so we will get the encrypted records
db = Database(args.dbdir, args.compact, None)
xref = [ [0]*256 for _ in range(256) ]
for dbfile in db.bank, db.index:
if not dbfile:
print("no data file found in %s" % args.dbdir)
return
for i in range(1, min(10000, dbfile.nrofrecords)):
rec = dbfile.readrec(i)
if rec and len(rec)>11:
xref[(i+3)%256][rec[3]] += 1
KOD = [0] * 256
for i, xx in enumerate(xref):
k, v = max(enumerate(xx), key=lambda kv: kv[1])
KOD[k] = i
if not args.silent:
print(tohex(bytes(KOD)))
return KOD
def main():
import argparse
parser = argparse.ArgumentParser(description="CRO hexdumper")
subparsers = parser.add_subparsers(title='commands',
help='Use the --help option for the individual sub commands for more details')
parser.set_defaults(handler=lambda *args: parser.print_help())
parser.add_argument("--debug", action="store_true", help="break on exceptions")
parser.add_argument("--kod", type=str, help="specify custom KOD table")
parser.add_argument("--strucrack", action="store_true", help="infer the KOD sbox from CroStru.dat")
parser.add_argument("--dbcrack", action="store_true", help="infer the KOD sbox from CroBank.dat + CroIndex.dat")
parser.add_argument("--nokod", "-n", action="store_true", help="don't KOD decode")
parser.add_argument("--compact", action="store_true", help="save memory by not caching the index, note: increases convert time by factor 1.15")
p = subparsers.add_parser("kodump", help="KOD/hex dumper")
p.add_argument("--offset", "-o", type=str, default="0")
p.add_argument("--length", "-l", type=str)
p.add_argument("--width", "-w", type=str)
p.add_argument("--endofs", "-e", type=str)
p.add_argument("--nokod", "-n", action="store_true", help="don't KOD decode")
p.add_argument("--unhex", "-x", action="store_true", help="assume the input contains hex data")
p.add_argument("--shift", "-s", type=str, help="KOD decode with the specified shift")
p.add_argument("--increment", "-i", action="store_true",
help="assume data is already KOD decoded, but with wrong shift -> dump alternatives.")
p.add_argument("--ascdump", "-a", action="store_true", help="CP1251 asc dump of the data")
p.add_argument("--invkod", "-I", action="store_true", help="KOD encode")
p.add_argument("filename", type=str, nargs="?", help="dump either stdin, or the specified file")
p.set_defaults(handler=kod_hexdump)
p = subparsers.add_parser("crodump", help="CROdumper")
p.add_argument("--verbose", "-v", action="store_true")
p.add_argument("--ascdump", "-a", action="store_true")
p.add_argument("--maxrecs", "-m", type=str, help="max nr of records to output")
p.add_argument("--nodecompress", action="store_false", dest="decompress", default="true")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=cro_dump)
p = subparsers.add_parser("sysdump", help="SYSdumper")
p.add_argument("--verbose", "-v", action="store_true")
p.add_argument("--ascdump", "-a", action="store_true")
p.add_argument("--nodecompress", action="store_false", dest="decompress", default="true")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=sys_dump)
p = subparsers.add_parser("recdump", help="record dumper")
p.add_argument("--verbose", "-v", action="store_true")
p.add_argument("--ascdump", "-a", action="store_true")
p.add_argument("--maxrecs", "-m", type=str, help="max nr of records to output")
p.add_argument("--find1d", action="store_true", help="Find records with 0x1d in it")
p.add_argument("--stats", action="store_true", help="calc table stats from the first byte of each record",)
p.add_argument("--index", action="store_true", help="dump CroIndex")
p.add_argument("--stru", action="store_true", help="dump CroStru")
p.add_argument("--bank", action="store_true", help="dump CroBank")
p.add_argument("--sys", action="store_true", help="dump CroSys")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=rec_dump)
p = subparsers.add_parser("strudump", help="STRUdumper")
p.add_argument("--verbose", "-v", action="store_true")
p.add_argument("--ascdump", "-a", action="store_true")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=stru_dump)
p = subparsers.add_parser("destruct", help="Stru dumper")
p.add_argument("--verbose", "-v", action="store_true")
p.add_argument("--ascdump", "-a", action="store_true")
p.add_argument("--type", "-t", type=int, help="what type of record to destruct")
p.set_defaults(handler=destruct)
p = subparsers.add_parser("strucrack", help="Crack v4 KOD encryption, bypassing the need for the database password.")
p.add_argument("--sys", action="store_true", help="Use CroSys for cracking")
p.add_argument("--silent", action="store_true", help="no output")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=strucrack)
p = subparsers.add_parser("dbcrack", help="Crack v4 KOD encryption, bypassing the need for the database password.")
p.add_argument("--silent", action="store_true", help="no output")
p.add_argument("dbdir", type=str)
p.set_defaults(handler=dbcrack)
args = parser.parse_args()
import crodump.koddecoder
if args.kod:
if len(args.kod)!=512:
raise Exception("--kod should have a 512 hex digit argument")
kod = crodump.koddecoder.new(list(unhex(args.kod)))
elif args.nokod:
kod = None
elif args.strucrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = args.dbdir
cargs.sys = False
cargs.silent = True
cracked = strucrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
elif args.dbcrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = args.dbdir
cargs.sys = False
cargs.silent = True
cracked = dbcrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
else:
kod = crodump.koddecoder.new()
if args.handler:
args.handler(kod, args)
if __name__ == "__main__":
main()

84
crodump/dumpdbfields.py Normal file

@@ -0,0 +1,84 @@
"""
`dumpdbfields` demonstrates how to enumerate tables and records.
"""
import os
import os.path
from .Database import Database
from .crodump import strucrack, dbcrack
from .hexdump import unhex
def processargs(args):
for dbpath in args.dbdirs:
if args.recurse:
for path, _, files in os.walk(dbpath):
# check if there is a crostru file in this directory.
if any(_ for _ in files if _.lower() == "crostru.dat"):
yield path
else:
yield dbpath
def main():
import argparse
parser = argparse.ArgumentParser(description="db field dumper")
parser.add_argument("--kod", type=str, help="specify custom KOD table")
parser.add_argument("--strucrack", action="store_true", help="infer the KOD sbox from CroStru.dat")
parser.add_argument("--dbcrack", action="store_true", help="infer the KOD sbox from CroIndex.dat+CroBank.dat")
parser.add_argument("--nokod", "-n", action="store_true", help="don't KOD decode")
parser.add_argument("--maxrecs", "-m", type=int, default=100)
parser.add_argument("--recurse", "-r", action="store_true")
parser.add_argument("--verbose", "-v", action="store_true")
parser.add_argument("dbdirs", type=str, nargs='*')
args = parser.parse_args()
for path in processargs(args):
try:
import crodump.koddecoder
if args.kod:
if len(args.kod)!=512:
raise Exception("--kod should have a 512 hex digit argument")
kod = crodump.koddecoder.new(list(unhex(args.kod)))
elif args.nokod:
kod = None
elif args.strucrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = path
cargs.sys = False
cargs.silent = True
cracked = strucrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
elif args.dbcrack:
class Cls: pass
cargs = Cls()
cargs.dbdir = path
cargs.sys = False
cargs.silent = True
cracked = dbcrack(None, cargs)
if not cracked:
return
kod = crodump.koddecoder.new(cracked)
else:
kod = crodump.koddecoder.new()
db = Database(path, False, kod)
for tab in db.enumerate_tables():
tab.dump(args)
print("nr of records: %d" % db.bank.nrofrecords)
i = 0
for rec in db.enumerate_records(tab):
for field, fielddef in zip(rec.fields, tab.fields):
print(">> %s -- %s" % (fielddef, field.content))
i += 1
if i > args.maxrecs:
break
except Exception as e:
print("ERROR: %s" % e)
if __name__ == "__main__":
main()

95
crodump/hexdump.py Normal file

@@ -0,0 +1,95 @@
"""
Several functions for converting bytes to readable text or hex bytes.
"""
import struct
from binascii import b2a_hex, a2b_hex
def unhex(data):
"""
convert a possibly space separated list of 2-digit hex values to a byte-array
"""
if type(data) == bytes:
data = data.decode("ascii")
data = data.replace(" ", "")
data = data.strip()
return a2b_hex(data)
def ashex(line):
"""
convert a byte-array to a space separated list of 2-digit hex values.
"""
return " ".join("%02x" % _ for _ in line)
def aschr(b):
"""
convert a CP-1251 byte to a unicode character.
This will make both cyrillic and latin text readable.
"""
if 32 <= b < 0x7F:
return "%c" % b
elif 0x80 <= b <= 0xFF:
try:
c = struct.pack("<B", b).decode("cp1251")
if c:
return c
except UnicodeDecodeError:
# 0x98 is the only invalid cp1251 character.
pass
return "."
def asasc(line):
"""
convert a CP-1251 encoded byte-array to a line of unicode characters.
"""
return "".join(aschr(_) for _ in line)
def hexdump(ofs, data, args):
"""
Output offset prefixed lines of hex + ascii characters.
"""
w = args.width
if args.ascdump:
fmt = "%08x: %s"
else:
fmt = "%%08x: %%-%ds %%s" % (3 * w - 1)
for o in range(0, len(data), w):
if args.ascdump:
print(fmt % (o + ofs, asasc(data[o:o+w])))
else:
print(fmt % (o + ofs, ashex(data[o:o+w]), asasc(data[o:o+w])))
def tohex(data):
"""
Convert a byte-array to a sequence of 2-digit hex values without separators.
"""
return b2a_hex(data).decode("ascii")
def toout(args, data):
"""
Return either ascdump or hexdump, depending on the `args.ascdump` flag.
"""
if args.ascdump:
return asasc(data)
else:
return tohex(data)
def strescape(txt):
"""
Convert bytes or text to a c-style escaped string.
"""
if type(txt) == bytes:
txt = txt.decode("cp1251")
txt = txt.replace("\\", "\\\\")
txt = txt.replace("\n", "\\n")
txt = txt.replace("\r", "\\r")
txt = txt.replace("\t", "\\t")
txt = txt.replace('"', '\\"')
return txt

59
crodump/koddecoder.py Normal file

@@ -0,0 +1,59 @@
"""
Decode CroStru KOD encoding.
"""
INITIAL_KOD = [
0x08, 0x63, 0x81, 0x38, 0xA3, 0x6B, 0x82, 0xA6, 0x18, 0x0D, 0xAC, 0xD5, 0xFE, 0xBE, 0x15, 0xF6,
0xA5, 0x36, 0x76, 0xE2, 0x2D, 0x41, 0xB5, 0x12, 0x4B, 0xD8, 0x3C, 0x56, 0x34, 0x46, 0x4F, 0xA4,
0xD0, 0x01, 0x8B, 0x60, 0x0F, 0x70, 0x57, 0x3E, 0x06, 0x67, 0x02, 0x7A, 0xF8, 0x8C, 0x80, 0xE8,
0xC3, 0xFD, 0x0A, 0x3A, 0xA7, 0x73, 0xB0, 0x4D, 0x99, 0xA2, 0xF1, 0xFB, 0x5A, 0xC7, 0xC2, 0x17,
0x96, 0x71, 0xBA, 0x2A, 0xA9, 0x9A, 0xF3, 0x87, 0xEA, 0x8E, 0x09, 0x9E, 0xB9, 0x47, 0xD4, 0x97,
0xE4, 0xB3, 0xBC, 0x58, 0x53, 0x5F, 0x2E, 0x21, 0xD1, 0x1A, 0xEE, 0x2C, 0x64, 0x95, 0xF2, 0xB8,
0xC6, 0x33, 0x8D, 0x2B, 0x1F, 0xF7, 0x25, 0xAD, 0xFF, 0x7F, 0x39, 0xA8, 0xBF, 0x6A, 0x91, 0x79,
0xED, 0x20, 0x7B, 0xA1, 0xBB, 0x45, 0x69, 0xCD, 0xDC, 0xE7, 0x31, 0xAA, 0xF0, 0x65, 0xD7, 0xA0,
0x32, 0x93, 0xB1, 0x24, 0xD6, 0x5B, 0x9F, 0x27, 0x42, 0x85, 0x07, 0x44, 0x3F, 0xB4, 0x11, 0x68,
0x5E, 0x49, 0x29, 0x13, 0x94, 0xE6, 0x1B, 0xE1, 0x7D, 0xC8, 0x2F, 0xFA, 0x78, 0x1D, 0xE3, 0xDE,
0x50, 0x4E, 0x89, 0xB6, 0x30, 0x48, 0x0C, 0x10, 0x05, 0x43, 0xCE, 0xD3, 0x61, 0x51, 0x83, 0xDA,
0x77, 0x6F, 0x92, 0x9D, 0x74, 0x7C, 0x04, 0x88, 0x86, 0x55, 0xCA, 0xF4, 0xC1, 0x62, 0x0E, 0x28,
0xB7, 0x0B, 0xC0, 0xF5, 0xCF, 0x35, 0xC5, 0x4C, 0x16, 0xE0, 0x98, 0x00, 0x9B, 0xD9, 0xAE, 0x03,
0xAF, 0xEC, 0xC9, 0xDB, 0x6D, 0x3B, 0x26, 0x75, 0x3D, 0xBD, 0xB2, 0x4A, 0x5D, 0x6C, 0x72, 0x40,
0x7E, 0xAB, 0x59, 0x52, 0x54, 0x9C, 0xD2, 0xE9, 0xEF, 0xDD, 0x37, 0x1E, 0x8F, 0xCB, 0x8A, 0x90,
0xFC, 0x84, 0xE5, 0xF9, 0x14, 0x19, 0xDF, 0x6E, 0x23, 0xC4, 0x66, 0xEB, 0xCC, 0x22, 0x1C, 0x5C,
]
class KODcoding:
"""
class handling KOD encoding and decoding, optionally
with a user specified KOD table.
"""
def __init__(self, initial=INITIAL_KOD):
self.kod = [_ for _ in initial]
# calculate the inverse table.
self.inv = [0 for _ in initial]
for i, x in enumerate(self.kod):
self.inv[x] = i
def decode(self, o, data):
"""
decode : shift, a[0]..a[n-1] -> b[0]..b[n-1]
b[i] = KOD[a[i]]- (i+shift)
"""
return bytes((self.kod[b] - i - o) % 256 for i, b in enumerate(data))
def encode(self, o, data):
"""
encode : shift, b[0]..b[n-1] -> a[0]..a[n-1]
a[i] = INV[b[i]+ (i+shift)]
"""
return bytes(self.inv[(b + i + o) % 256] for i, b in enumerate(data))
def new(*args):
"""
create a KODcoding object with the specified arguments.
"""
return KODcoding(*args)
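For illustration, a minimal sketch that round-trips data through the default KOD table, using only the API above:
```python
import crodump.koddecoder

kod = crodump.koddecoder.new()
data = b"CroFile\x00"
for shift in range(256):
    # encode followed by decode with the same shift must be the identity
    assert kod.decode(shift, kod.encode(shift, data)) == data
```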

81
crodump/kodump.py Normal file

@@ -0,0 +1,81 @@
"""
This module has the functions for the 'kodump' subcommand from the 'crodump' script.
"""
from .hexdump import unhex, toout, hexdump
import io
import struct
def decode_kod(kod, args, data):
"""
various methods of hexdumping KOD decoded data.
"""
if args.nokod:
# plain hexdump, no KOD decode
hexdump(args.offset, data, args)
elif args.shift:
# explicitly specified shift.
args.shift = int(args.shift, 0)
enc = kod.decode(args.shift, data)
hexdump(args.offset, enc, args)
elif args.increment:
def incdata(data, s):
"""
add 's' to each byte.
This is useful for finding the correct shift from an incorrectly shifted chunk.
"""
return b"".join(struct.pack("<B", (_ + s) & 0xFF) for _ in data)
# dump alternatives for all 256 possible increments.
for s in range(256):
enc = incdata(data, s)
print("%02x: %s" % (s, toout(args, enc)))
else:
# output with all possible 'shift' values.
for s in range(256):
if args.invkod:
enc = kod.encode(s, data)
else:
enc = kod.decode(s, data)
print("%02x: %s" % (s, toout(args, enc)))
def kod_hexdump(kod, args):
"""
handle the `kodump` subcommand, KOD decode a section of a data file
This function is mostly useful for reverse-engineering the database format.
"""
args.offset = int(args.offset, 0)
if args.length:
args.length = int(args.length, 0)
elif args.endofs:
args.endofs = int(args.endofs, 0)
args.length = args.endofs - args.offset
if args.width:
args.width = int(args.width, 0)
else:
args.width = 64 if args.ascdump else 16
if args.filename:
with open(args.filename, "rb") as fh:
if args.length is None:
fh.seek(0, io.SEEK_END)
filesize = fh.tell()
args.length = filesize - args.offset
fh.seek(args.offset)
data = fh.read(args.length)
decode_kod(kod, args, data)
else:
# no filename -> read from stdin.
import sys
data = sys.stdin.buffer.read()
if args.unhex:
data = unhex(data)
decode_kod(kod, args, data)

96
crodump/readers.py Normal file

@@ -0,0 +1,96 @@
import struct
class ByteReader:
"""
The ByteReader object is used when decoding various variable sized structures.
all functions raise EOFError when attempting to read beyond the end of the buffer.
functions starting with `read` advance the current position.
"""
def __init__(self, data):
self.data = data
self.o = 0
def readbyte(self):
"""
Reads a single byte
"""
if self.o + 1 > len(self.data):
raise EOFError()
self.o += 1
return struct.unpack_from("<B", self.data, self.o - 1)[0]
def testbyte(self, bytevalue):
"""
returns True when the current byte matches `bytevalue`.
"""
if self.o + 1 > len(self.data):
raise EOFError()
return self.data[self.o] == bytevalue
def readword(self):
"""
Reads a 16 bit unsigned little endian value
"""
if self.o + 2 > len(self.data):
raise EOFError()
self.o += 2
return struct.unpack_from("<H", self.data, self.o - 2)[0]
def readdword(self):
"""
Reads a 32 bit unsigned little endian value
"""
if self.o + 4 > len(self.data):
raise EOFError()
self.o += 4
return struct.unpack_from("<L", self.data, self.o - 4)[0]
def readbytes(self, n=None):
"""
Reads the specified number of bytes, or
when no size was specified, the remaining bytes in the buffer
"""
if n is None:
n = len(self.data) - self.o
if self.o + n > len(self.data):
raise EOFError()
self.o += n
return self.data[self.o-n:self.o]
def readlongstring(self):
"""
Reads a cp1251 encoded string prefixed with a dword sized length
"""
namelen = self.readdword()
return self.readbytes(namelen).decode("cp1251")
def readname(self):
"""
Reads a cp1251 encoded string prefixed with a byte sized length
"""
namelen = self.readbyte()
return self.readbytes(namelen).decode("cp1251")
def readtoseperator(self, sep):
"""
reads bytes up to a byte sequence matching `sep`.
When no `sep` is found, returns the remaining bytes in the buffer.
"""
if self.o > len(self.data):
raise EOFError()
oldoff = self.o
off = self.data.find(sep, self.o)
if off >= 0:
self.o = off + len(sep)
return self.data[oldoff:off]
else:
self.o = len(self.data)
return self.data[oldoff:]
def eof(self):
"""
return True when the current position is at or beyond the end of the buffer.
"""
return self.o >= len(self.data)
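For illustration, a minimal sketch that decodes a length-prefixed name and a dword with `ByteReader` (the input buffer here is made up):
```python
from crodump.readers import ByteReader

rd = ByteReader(b"\x04Base\x2a\x00\x00\x00")
assert rd.readname() == "Base"   # byte-length-prefixed cp1251 string
assert rd.readdword() == 42      # little endian uint32
assert rd.eof()
```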

322
docs/cronos-research.md Normal file

@@ -0,0 +1,322 @@
# About Cronos databases.
A _cronos database_ consists of these files:
CroBank.dat
CroBank.tad
CroIndex.dat
CroIndex.tad
CroStru.dat
CroStru.tad
and a Vocabulary database with another set of these files in a subdirectory `Voc/`.
`CroIndex.*` can be ignored for most dumping purposes, unless the user suspects there to be residues of deleted data.
Additionally there are the `CroSys.dat` and `CroSys.tad` files in the cronos application directory, which list the currently
known databases.
## app installation
On a default non-russian Windows installation, the CronosPro app exhibits several encoding issues, which can be fixed like this:
reg set HKLM\System\CurrentControlSet\Control\Nls\Codepage 1250=c_1251.nls 1252=c_1251.nls
[from](https://ixnfo.com/en/question-marks-instead-of-russian-letters-a-solution-to-the-problem-with-windows-encoding.html)
Also note that the v3 cronos app will run without problems on a Linux machine using [wine](https://winehq.org/).
## Files ending in .dat
All .dat files start with a 19 byte header:
char magic[8] // always: 'CroFile\x00'
uint16 unknown
char version[5] // 01.XX see, below
uint16 encoding // bitfield: bit0 = KOD, bit1 = ?
uint16 blocksize // 0x0040, 0x0200 or 0x0400
Most Bank files use blocksize == 0x0040
most Index files use blocksize == 0x0400
most Stru files use blocksize == 0x0200
This is followed by a block of 0x101 or 0x100 minus 19 bytes of seemingly random data.
The unknown word seems not to be random; it might be a checksum.
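As a sketch, this header can be parsed with python's `struct` module, mirroring what `Datafile.readdathdr` in this repository does (the filename is just an example):

    import struct
    with open("CroStru.dat", "rb") as fh:
        magic, unk, version, encoding, blocksize = struct.unpack("<8sH5sHH", fh.read(19))
        assert magic == b"CroFile\x00"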
## File versions
* Pre cronos pro used version `01.01`.
* Cronos version 3 introduced version indicators of `01.02`, `01.03`, `01.04` and `01.05`.
* `01.02` and `01.04` are called "small model", i.e. 32 bit offsets,
* `01.03` and `01.05` are called "big model", i.e. 64 bit offsets.
* `01.04` and `01.05` are called "lite".
* Cronos version 4 introduced version indicators of `01.11`, `01.13` and `01.14`.
* `01.11` are called "small model", i.e. 32 bit offsets,
* `01.13` are called "pro".
* `01.14` are called "lite".
* Cronos version 7 introduced the version indicator `01.19`.
## Files ending in .tad
The first two `uint32` are the number of deleted records and the tad offset to the first deleted entry.
The deleted entries form a linked list; their size field is always 0xFFFFFFFF.
Depending on the version in the `.dat` header, `.tad` files use either 32-bit or 64-bit file offsets.
Versions `01.02` and `01.04` use 32-bit offsets:
uint32 offset
uint32 size // with flag in upper bit, 0 -> large record
uint32 checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
Versions `01.03`, `01.05` and `01.11` use 64-bit offsets:
uint64 offset
uint32 size // with flag in upper bit, 0 -> large record
uint32 checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
where size can be 0xffffffff (indicating a free/deleted block).
Bit 31 of the size indicates that this is an extended record.
Extended records start with plaintext: { uint32 offset, uint32 size } or { uint64 offset, uint32 size }
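A minimal sketch of reading one such entry (names are illustrative; the free-block and bit-31 handling follows the tables above):

```python
import struct

def parse_tad_entry(buf, pos, use64bit):
    # layout from the tables above: offset, size (flag in bit 31), checksum
    if use64bit:
        offset, size, checksum = struct.unpack_from("<QLL", buf, pos)
        pos += 16
    else:
        offset, size, checksum = struct.unpack_from("<LLL", buf, pos)
        pos += 12
    if size == 0xFFFFFFFF:
        return None, pos                        # free / deleted block
    extended = bool(size & 0x80000000)          # bit 31 set -> extended record
    return (offset, size & 0x7FFFFFFF, checksum, extended), pos
```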
## the 'old format'
The original description made it look like there were different formats for the block references.
This was found in previously existing documentation, but no sample databases in this format have been found so far.
If the .dat file has a version of 01.03 or later, the corresponding .tad file looks like this:
uint32_t offset
uint32_t size // with flag in upper bit, 0 -> large record
uint32_t checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
uint32_t unknown // mostly 0
The old description also assumed 12-byte reference blocks, but as a packed struct, probably when the CroFile version is 01.01.
uint32 offset1
uint16 size1
uint32 offset2
uint16 size2
with the first chunk read from offset1 with length size1, and potentially more parts with a total length of size2 starting at file offset offset2, where the first `uint32` of each 256-byte chunk is the next chunk's offset and at most 252 bytes are actual data.
However, I have never found .tad files like that. The original description also insisted that those chunks need the decode magic outlined below, but the Python implementation only applies it to CroStru files and still seems to produce results.
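For completeness, a sketch of how such a chunk chain would be followed, if such files exist (untested, since no samples were found):

```python
import struct

def read_chained_record(f, offset1, size1, offset2, size2):
    # First part: size1 bytes at offset1. Continuation: 256-byte chunks at
    # offset2, each starting with the next chunk's offset, then up to 252
    # payload bytes, until size2 bytes have been collected.
    f.seek(offset1)
    out = f.read(size1)
    remaining, offset = size2, offset2
    while remaining > 0:
        f.seek(offset)
        chunk = f.read(256)
        offset = struct.unpack_from("<L", chunk, 0)[0]
        take = min(remaining, 252)
        out += chunk[4:4 + take]
        remaining -= take
    return out
```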
## CroStru
The interesting file is CroStru.dat, which contains metadata about the database in blocks whose offset and size are found in CroStru.tad. Each byte of these blocks is substituted through an sbox found in the cro2sql sources and then adjusted by a one-byte counter initialised with a per-block offset. The sbox looks like this:
unsigned char kod[256] = {
0x08, 0x63, 0x81, 0x38, 0xa3, 0x6b, 0x82, 0xa6,
0x18, 0x0d, 0xac, 0xd5, 0xfe, 0xbe, 0x15, 0xf6,
0xa5, 0x36, 0x76, 0xe2, 0x2d, 0x41, 0xb5, 0x12,
0x4b, 0xd8, 0x3c, 0x56, 0x34, 0x46, 0x4f, 0xa4,
0xd0, 0x01, 0x8b, 0x60, 0x0f, 0x70, 0x57, 0x3e,
0x06, 0x67, 0x02, 0x7a, 0xf8, 0x8c, 0x80, 0xe8,
0xc3, 0xfd, 0x0a, 0x3a, 0xa7, 0x73, 0xb0, 0x4d,
0x99, 0xa2, 0xf1, 0xfb, 0x5a, 0xc7, 0xc2, 0x17,
0x96, 0x71, 0xba, 0x2a, 0xa9, 0x9a, 0xf3, 0x87,
0xea, 0x8e, 0x09, 0x9e, 0xb9, 0x47, 0xd4, 0x97,
0xe4, 0xb3, 0xbc, 0x58, 0x53, 0x5f, 0x2e, 0x21,
0xd1, 0x1a, 0xee, 0x2c, 0x64, 0x95, 0xf2, 0xb8,
0xc6, 0x33, 0x8d, 0x2b, 0x1f, 0xf7, 0x25, 0xad,
0xff, 0x7f, 0x39, 0xa8, 0xbf, 0x6a, 0x91, 0x79,
0xed, 0x20, 0x7b, 0xa1, 0xbb, 0x45, 0x69, 0xcd,
0xdc, 0xe7, 0x31, 0xaa, 0xf0, 0x65, 0xd7, 0xa0,
0x32, 0x93, 0xb1, 0x24, 0xd6, 0x5b, 0x9f, 0x27,
0x42, 0x85, 0x07, 0x44, 0x3f, 0xb4, 0x11, 0x68,
0x5e, 0x49, 0x29, 0x13, 0x94, 0xe6, 0x1b, 0xe1,
0x7d, 0xc8, 0x2f, 0xfa, 0x78, 0x1d, 0xe3, 0xde,
0x50, 0x4e, 0x89, 0xb6, 0x30, 0x48, 0x0c, 0x10,
0x05, 0x43, 0xce, 0xd3, 0x61, 0x51, 0x83, 0xda,
0x77, 0x6f, 0x92, 0x9d, 0x74, 0x7c, 0x04, 0x88,
0x86, 0x55, 0xca, 0xf4, 0xc1, 0x62, 0x0e, 0x28,
0xb7, 0x0b, 0xc0, 0xf5, 0xcf, 0x35, 0xc5, 0x4c,
0x16, 0xe0, 0x98, 0x00, 0x9b, 0xd9, 0xae, 0x03,
0xaf, 0xec, 0xc9, 0xdb, 0x6d, 0x3b, 0x26, 0x75,
0x3d, 0xbd, 0xb2, 0x4a, 0x5d, 0x6c, 0x72, 0x40,
0x7e, 0xab, 0x59, 0x52, 0x54, 0x9c, 0xd2, 0xe9,
0xef, 0xdd, 0x37, 0x1e, 0x8f, 0xcb, 0x8a, 0x90,
0xfc, 0x84, 0xe5, 0xf9, 0x14, 0x19, 0xdf, 0x6e,
0x23, 0xc4, 0x66, 0xeb, 0xcc, 0x22, 0x1c, 0x5c,
};
Given the `shift`, the encoded data `a[0]..a[n-1]` and the decoded data `b[0]..b[n-1]`, the transformation works as follows (all arithmetic modulo 256):
decode: b[i] = KOD[a[i]] - (i+shift)
encode: a[i] = INV[b[i] + (i+shift)]
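In Python, assuming the sbox above has been transcribed into a list named `KOD`:

```python
# KOD is the 256-entry sbox above as a Python list; INV is its inverse permutation.
INV = [0] * 256
for i, v in enumerate(KOD):
    INV[v] = i

def kod_decode(shift, data):
    return bytes((KOD[c] - (i + shift)) & 0xFF for i, c in enumerate(data))

def kod_encode(shift, data):
    return bytes(INV[(c + (i + shift)) & 0xFF] for i, c in enumerate(data))
```

Note that `kod_encode(shift, kod_decode(shift, data)) == data` for any shift, so either direction can be verified against the other.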
The original description of an older database format called the per-block counter start offset 'sistN', which seems to imply that it is constant for certain entries. These offsets correspond to a "system number" of meta entries visible in the database software; for encoded records this is their primary key.
I noticed that the first 256 bytes of CroStru.dat look nearly identical (except for the first 16 bytes) to those of CroBank.dat.
The toplevel table-id for CroStru and CroSys is #3, while referenced records have tableid #4.
## CroBank
CroBank.dat contains the actual database entries for multiple tables, as described in the CroStru file. Each chunk is first re-assembled (and potentially decoded, with the per-block offset being the record number in the .tad file).
The record's first byte defines which table it belongs to. The record is encoded in cp1251 (or possibly IBM866), with actual column data separated by 0x1e bytes.
There is an extra concept of subfields within those columns, indicated by a 0x1d byte.
Fields of field types 6 and 9 start with a 0x1b byte, followed by a uint32 giving the size of the actual field data. They may then contain further 0x1e bytes acting as subfield separators.
For field type 6, the field begins with two uint32 values (the first mostly 0x00000001, the second the size of the following strings), followed by three 0x1e-separated strings containing the file name, file extension and system number of the file record that this record refers to.
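A sketch of splitting one re-assembled record along these rules (names are mine; splitting subfields on 0x1d is left out for brevity):

```python
import struct

def split_bank_record(record):
    # Split a re-assembled CroBank record into (tableid, fields).
    # Framed fields (types 6 and 9) are consumed by their declared size,
    # so embedded separator bytes do not split them.
    tableid, data = record[0], record[1:]
    fields, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] == b"\x1b":                  # framed field
            size = struct.unpack_from("<L", data, pos + 1)[0]
            fields.append(data[pos + 5:pos + 5 + size])
            pos += 5 + size
            if data[pos:pos + 1] == b"\x1e":              # trailing separator
                pos += 1
        else:
            end = data.find(b"\x1e", pos)
            if end < 0:
                end = len(data)
            fields.append(data[pos:end])
            pos = end + 1
    return tableid, fields
```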
## structure definitions
Records start numbering at '1'.
Names are stored as: `byte strlen + char value[strlen]`
The first entry contains:
uint8
array {
Name keyname
uint32 index_or_size; // size when bit31 is set.
uint8 data[size]
}
This results in a dictionary with keys like `Bank`, `BankId`, `BankTable`, `Base`nnn, etc.
The `Base000` entry contains the record number for the table definition of the first table.
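A sketch of building that dictionary with a byte reader like the one in this repository (treating entries without bit 31 set as record references is an assumption):

```python
def parse_first_entry(rd):
    # Build the key -> value dictionary from the first CroStru entry.
    entries = {}
    rd.readbyte()                                  # leading uint8, meaning unknown
    while not rd.eof():
        keyname = rd.readname()
        index_or_size = rd.readdword()
        if index_or_size & 0x80000000:             # bit 31 set -> inline data
            entries[keyname] = rd.readbytes(index_or_size & 0x7FFFFFFF)
        else:
            entries[keyname] = index_or_size       # assumed: record reference
    return entries
```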
## table definitions
uint16 unk1
union {
uint8 shortversion; // 1
uint16 version; // >1
}
uint8 somelen; // 5 or 9
struct {
uint8 unk3
uint32 unk4 // not there when 'somelen'==5
uint32 unk5
}
uint32 tableid
Name tablename
Name abbreviation
uint32 unk7
uint32 nrfields
array {
uint16 entrysize -- total nr of bytes in this entry.
uint16 fieldtype // see below
uint32 fieldindex1 // presentation index (i.e. where in the UI it shows)
Name fieldname
uint32 flags
uint8 alwaysone // maybe the 'minvalue'
uint32 fieldindex2 // serialization index (i.e. where in the record in the .dat it appears)
uint32 fieldsize // max fieldsize
uint32 unk4
...
followed by remaining unknown bytes
} fields[nrfields]
uint32 extradatstr // amount of unknown length indexed data strings between field definition blocks
array {
uint16 datalen
uint8[datalen]
} datastrings[extradatstr]
uint32 unk8
uint8 fielddefblock // always 2, probably the number of this block of field definitions
uint32 unk9
uint32 nrextrafields
array {
... as above
} extrafields[nrextrafields]
followed by remaining unknown bytes
...
To obtain field definitions for all the fields of a record in the .dat for that table,
the concatenation of `fields` and `extrafields` must be sorted by `fieldindex2`.
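For example, assuming each parsed field object exposes its `fieldindex2`:

```python
all_fields = sorted(fields + extrafields, key=lambda f: f.fieldindex2)
```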
## field types
The interface offers the following field types for table columns:
* 0 - Системный номер = Primary Key ID
* 1 - Числовое = Numeric
* 2 - Текстовое = Text
* 3 - Словарное = Dictionary
* 4 - Дата = Date
* 5 - Время = Time
* 6 - Файл = File (internal)
* 29 - Внешний файл = File (external)
* 7 - Прямая ссылка = Direct link
* 8 - Обратная ссылка = Back link
* 9 - Прямая-Обратная ссылка = Direct-Reverse link
* 17 - Связь по полю = Link by field
Other unassigned values in the table entry definition are
* Dictionary Base (defaults to 0)
* номер в записи = number in the record
* Длина Поля = Field size
* Flags:
* (0x2000) Множественное = Multiple
* (0x0800) Информативное = Informative
* (0x0040) Некорректируемое = Uncorrectable
* (0x1000) поиск на вводе = input search
* (?) символьное = symbolic
* (?) Лемматизировать = Lemmatize
* (?) поиск по значениям = search by values
* (0x0200) замена непустого значения = replacement of a non-empty value
* (0x0100) замена значения = value replacement
* (0x0004) автозаполнения = autocomplete
* (?) корневая связь = root connection
* (?) допускать дубли = allow duplicates
* (0x0002) обязательное = obligatory
## compressed records
Some records are compressed; the format is like this:
multiple-chunks {
uint16 size; // stored in bigendian format.
uint8 head[2] = { 8, 0 }
uint32 crc32
uint8 compdata[size-6]
}
uint8 tail[3] = { 0, 0, 2 }
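A sketch of walking these chunks (the crc byte order and the compression algorithm are not confirmed; the 0x08 head byte hints at DEFLATE):

```python
import struct

def iter_compressed_chunks(data):
    # Walk the chunk list until the 3-byte tail marker 00 00 02.
    pos = 0
    while data[pos:pos + 3] != b"\x00\x00\x02":
        size = struct.unpack_from(">H", data, pos)[0]      # chunk size, big-endian
        crc = struct.unpack_from("<L", data, pos + 4)[0]   # byte order is an assumption
        payload = data[pos + 8:pos + 2 + size]             # size covers head + crc + payload
        yield crc, payload
        pos += 2 + size
```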
# v4 format
The header version 01.11 indicates a database created with Cronos v4.x.
## .tad
A 4 dword header:
dword -2
dword nr deleted
dword first deleted
dword 0
16 byte records:
qword offset, with flags in upper 8 bits.
dword size
dword unk
flags:
02,03 - deleted record.
04 - compressed { int16be size; int16be flag; int32le crc; byte data[size-6]; } 00 00 02
00 - extended record
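A sketch of decoding one such record (names illustrative):

```python
import struct

def parse_v4_tad_record(buf, pos):
    # 16-byte record: qword offset (flags in the top 8 bits), dword size, dword unknown
    offset, size, unk = struct.unpack_from("<QLL", buf, pos)
    flags = offset >> 56
    offset &= (1 << 56) - 1
    return offset, size, unk, flags   # flags: 02/03 deleted, 04 compressed, 00 extended
```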
## .dat
The .dat file of a 01.11 database has 64bit offsets, like the 01.03 file format.

51
setup.py Normal file

@ -0,0 +1,51 @@
from setuptools import setup
setup(
name = "cronodump",
version = "1.1.0",
entry_points = {
'console_scripts': [
'croconvert=crodump.croconvert:main',
'crodump=crodump.crodump:main',
],
},
packages = ['crodump'],
author = "Willem Hengeveld, Dirk Engling",
author_email = "itsme@xs4all.nl, erdgeist@erdgeist.org",
description = "Tool and library for extracting data from Cronos databases.",
long_description_content_type='text/markdown',
long_description = """
The cronodump utility can parse most of the databases created by the [CronosPro](https://www.cronos.ru/) database software
and dump it to several output formats.
The software is popular among Russian public offices, companies and police agencies.
Example usage:
croconvert --csv <yourdbpath>
will create a .csv dump of all records in your database.
or:
crodump strudump <yourdbpath>
will print details on the internal definitions of the tables present in your database.
For more details see the [README.md](https://github.com/alephdata/cronodump/blob/master/README.md) file.
""",
license = "MIT",
keywords = "cronos dataconversion databaseexport",
url = "https://github.com/alephdata/cronodump/",
classifiers = [
'Environment :: Console',
'Intended Audience :: End Users/Desktop',
'Intended Audience :: Developers',
'License :: OSI Approved :: MIT License',
'Operating System :: OS Independent',
'Programming Language :: Python :: 3.7',
'Topic :: Utilities',
'Topic :: Database',
],
python_requires = '>=3.7',
extras_require={ 'templates': ['Jinja2'] },
)

58
templates/html.j2 Normal file

@ -0,0 +1,58 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Cronos Database Dump</title>
</head>
<body>
{% for table in db.enumerate_tables(files=True) %}
<table>
<caption>{{ table.tablename | e }}</caption>
<thead>
<tr>
{%- for field in table.fields %}
<th>{{ field.name | e }}</th>
{%- endfor %}
<th>Data</th>
</tr>
</thead>
<tbody>
{% for system_number, file in db.enumerate_files(table) %}
<tr>
<td>{{ system_number | e }}</td>
<td><a href="data:application/x-binary;base64,{{ base64.b64encode( file ).decode('utf-8') }}">File content</a></td>
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
{% for table in db.enumerate_tables(files=False) %}
{%- if table.tableimage -%}
<img src="data:image;base64,{{ base64.b64encode( table.tableimage.data ).decode('utf-8') }}"/>
{%- endif -%}
<table>
<caption>{{ table.tablename | e }}</caption>
<thead>
<tr>
{%- for field in table.fields %}
<th>{{ field.name | e }}</th>
{%- endfor %}
</tr>
</thead>
<tbody>
{%- for record in db.enumerate_records( table ) %}
<tr>
{%- for field in record.fields %}
{%- if field.typ == 6 and field.content -%}
<td><a download="{{ field.filename }}.{{ field.extname }}" href="data:application/x-binary;base64,{{ db.get_record( field.filedatarecord, True ) }}">{{ field.filename | e }}.{{ field.extname | e }}</a></td>
{%- else -%}
<td>{{ field.content | e }}</td>
{%- endif -%}
{%- endfor %}
</tr>
{%- endfor %}
</tbody>
</table>
{% endfor %}
</body>
</html>

20
templates/postgres.j2 Normal file

@ -0,0 +1,20 @@
{% for table in db.enumerate_tables(files=False) %}
CREATE TABLE "{{ table.tablename | replace('"', '_') }}" (
{%- for field in table.fields %}
"{{ field.name | replace('"', '_') }}" {{ field.sqltype() -}}
{{- ", " if not loop.last else "" -}}
{%- endfor %}
);
INSERT INTO "{{ table.tablename | replace('"', '_') }}" VALUES
{%- for record in db.enumerate_records( table ) %}
( {%- for field in record.fields -%}
'{{ field.content | replace("'", "''") }}' {{- ", " if not loop.last else "" -}}
{%- endfor -%}
)
{{- ", " if not loop.last else "" -}}
{%- endfor %}
;
{% endfor %}

Binary files not shown (12 files).