Description of the .frm format

dbsake parses some undocumented / poorly documented formats for MySQL which were decoded by inspecting the MySQL source code. Here is a detailed document that attempts to describe the details of such formats for other who may hack on these parts of dbsake.

.frm fileinfo section

The fileinfo section consists of 64 bytes that encode information about the rest of the .frm file and various table level options.

The table here describes the information each byte encodes. Offsets are from the beginning of the file. All values are little-endian integer of the size noted in the Length column.

Offset

Length (bytes)

Description

Hex

Dec

0000

0

2 bytes

“Magic” identifier Always the byte sequence fe 01

0001

1

0002

2

1 byte

.frm version 1

0003

3

1 byte

“legacy_db_type” 2

0004

4

2-bytes

“names_length” - always 3 and not used in recent MySQL. MySQL 3.23 set this to 1

0005

5

0006

6

2-bytes

IO_SIZE; Always 4096 (0010)

0007

7

0008

8

2-bytes

number of “forms” in the .frm Should always be 1, even back to 3.23

0009

9

000a

10

4-bytes

Not really used except in .frm creation Purpose unclear, i guess for aligning sections in the ancient unireg format

000b

11

000c

12

000d

13

000e

14

2-bytes

“tmp_key_length”; if equal to 0xffff then the key length is a 4-byte integer at offset 0x002f

000f

15

0010

16

2-bytes

“rec_length” - this is the size of the byte string where default values are stored See Default Values

0011

17

0012

18

4-bytes

Table MAX_ROWS=N opton

0013

19

0014

20

0015

21

0016

22

4-bytes

Table MIN_ROWS=N option

0017

23

0018

24

0019

25

001a

26

1-byte

Unused - always zero in 3.23 through 5.6

001b

27

1-byte

Always 2 - “// Use long pack-fields”

001c

28

2-bytes

key_info_length - size in bytes of the keyinfo section

001d

29

001e

30

2-bytes

create_info->table_options See HA_OPTION_* values in include/my_base.h

001f

31

0020

32

1-byte

Unused; comment “// No filename anymore”

0021

33

1-byte

5 in 5.0+ comment “// Mark for 5.0 frm file”

0022

34

4-bytes

Table AVG_ROW_LENGTH option

0023

35

0024

36

0025

37

0026

38

1-byte

Table DEFAULT CHARACTER SET option 3

0027

39

1-byte

Unused 4

0028

40

1-byte

Table ROW_FORMAT option

0029

41

1-byte

Unused; formerly Table RAID_TYPE option

002a

42

1-byte

Unused; formerly Table RAID_CHUNKS option

002b

43

4-bytes

Unused; formerly Table RAID_CHUNKSIZE option

002c

44

002d

45

002e

46

002f

47

4-bytes

Size in bytes of the keyinfo section where index metadata is defined

0030

48

0031

49

0032

50

0033

51

4-bytes

MySQL version encoded as a 4-byte integer in little endian format. This is the value MYSQL_VERSION_ID from include/mysql_version.h in the mysql source tree. Example: ‘xb6xc5x00x00’ 0x0000c5b6 => 50614 => MySQL v5.6.14

0034

52

0035

53

0036

54

0037

55

4-bytes

Size in bytes of table “extra info”

  • CONNECTION=<string> (FEDERATED tables)

  • ENGINE=<string>

  • PARTITION BY clause + partitioning flags

  • WITH PARSER names (MySQL 5.1+)

  • Table COMMENT 5

0038

56

0039

57

003a

58

003b

59

2-byte

extra_rec_buf_length

003c

60

003d

61

1-byte

Storage engine if table is partitioned 6

003e

62

2-bytes

Table KEY_BLOCK_SIZE option

003f

63

1

This is defined as FRM_VER+3+ test(create_info->varchar) in 5.0+ Where FRM_VER is defined as 6, so the frm version will be either 9 or 10 depending on if the table has varchar columns

2

Maps to an enum value from “enum legacy_db_type” in sql/handler.h

3

Character set id maps to an id from INFORMATION_SCHEMA.COLLATIONS and encodes both the character set name and the collation

4

In the source code, there is a comment indicating this byte will be used for TRANSACTIONAL and PAGE_CHECKSUM table options in the future

5

The table comment is stored in one of two places in the .frm file If the comment size in bytes is < 255 this is stored in the forminfo Otherwise it will be estored in the extra info section after the fulltext parser names (if any)

6

Numeric id that maps to a enum value from “enum legacy_db_type” in sql/handler.h, similar to legacy_db_type

Key info section

The key info section should always start at offset 0x1000 (4096); this is obtained from the 2-byte integer in fileinfo header at offset 6, but in any version of MySQL in the past decade will be 4096.

The size in bytes of this section is obtained from the key_length - typically this is 4-byte integer at offset 0x002f (47) in the header. Older versions of MySQL only allocated a 2-byte integer for this length, at offset 0x000e (14). This old location will have the value 0xffff if the key info length exceeds the capacity of a 2-byte integer.

The structure of this section consists of an initial header noting the total number of keys, total number of key components and the size of “extra” key data (namely index names and index comments). This is followed by a group for each index defined in the table and then the extra data - names for each index followed by an optional index comment strings.

The header is essentially three integers:

[key_count][key_parts_count][length of extra data]

Where key_count is the number of indexes this metdata describes,

key_parts_count is the number of components across all indexes and the length of extra data indicates how many bytes the index names and comments uses.

key_count and key_parts_count may be either 1 or 2 bytes. If the first byte is > 128 then key_count and key_parts_count use two bytes, otherwise they use one byte each. The extra length is always a 2 byte integer.

The logic in dbsake is:

key_count = keyinfo.uint8()
if key_count < 128:
    key_parts_count = keyinfo.uint8()
    keyinfo.skip(2)
else:
    key_count = (key_count & 0x7f) | (keyinfo.uint8() << 7)
    key_parts_count = keyinfo.uint16()
key_extra_length = keyinfo.uint16()

Each key metadata consists of 8 bytes and each key part consists of 9 bytes. So the total length of the index metadata is calculated by the formula:

key_count * 8 + key_parts_count * 9

And this is the offset, relative to the start of keyinfo section, where the index names and comments are found.

Each index group consists of 8 bytes of key metadata followed by 9 bytes of metadata for each indexed column.

Index metadata (8 bytes)

flags

2 bytes

key flags from include/my_base.h.

length

2 bytes

length of the index

key parts

1 byte

number of columns covered by this index

algorithm

1 byte

Key algorithm - maps to enum value “enum ha_key_alg”

unused

2 bytes

Followed by 1 or more column index metadata:

Column index metadata (9 bytes)

field number

2 bytes

Which column is indexed

offset

2 bytes

Offset into a MySQL datastructure (internal use)

unused

1 byte

key_type

2 bytes

maps to enum ha_base_keytype

length

2 bytes

length of this index component

The names and comments follow this data with names being separated by the byte value 255 (‘\xff’) and the names and comments sections being separated by a null byte. So this essentially looks like this sort of python bytestring:

b'\xffPRIMARY\xffix_column1\xff\x00<index comments>'

Index comments are length-prefixed strings. So there is a 2 byte integer (little-endian) followed by the specified number of bytes for each comment.

Index comments are not terribly common so this will often be empty.

Defaults Section

Immediately after the keyinfo section there is a byte string that details the defaults for each column. So this starts at IO_SIZE + key_length, which can be derived from the .frm header.

The format of this buffer is essentially:

[null_map][encoded column data]

Where the null_map is 1 or more bytes, with a bit per-column that can be nullable. The total number of bytes will be:

(null_column_count + 7 ) // 8

The first bit is always set and column bits start a 1 offset in the null map. If a bit is set for the current column then this indicates the the default is null (ie. DEFAULT NULL).

If a column’s default is not null, then its default data will be recorded at some offset noted in the Column metadata (described elsewhere in this document). The actual data format depends on the column type. This basically breaks down into the following cases:

  • integer-types - little-endian integers of 1, 2, 3, 4 or 8 bytes

  • float/double - little endian IEEE 745 values

  • decimal - either ascii strings (“3.14”) < MySQL 5.0, or a binary

    encoding of 9 decimal digits per 4-byte big-endian integer

  • timestamp - little endian integer representing seconds relative to epoch (< 5.6)

  • timestamp2 - big-endian integer representing seconds relative to epoch (5.6+);

    additionally packed fractional digits, similar to the decimal format

  • date/time - encodes the various components into various bits of a 3 - 8 byte integer

  • char - just a string with length bytes (space padded)

  • varchar - length-prefix string, with the prefix being a little endian integer of

    1 to 2 bytes.

See dbsake/mysqlfrm/mysqltypes.py unpack_type_<name> method for how each datatype is actually decoded.

Extra data section

The “extra” section encodes some basic table properties. These include:

  • CONNECTION=<name> string (used by FEDERATED)

  • ENGINE=<name> strings

  • PARTITION BY string

  • “auto partitioned flag” (used by NDB, at least)

  • WITH PARSER - fulltext parser plugin names

  • Table COMMENT ‘…’ - only if > 254 bytes

Except for the fulltext parser plugin names (which are null terminated), all of these properties are length-prefix strings. This essentially has the format:

[2-byte length][<connection string>]
[2-byte length][<engine name string>]
[4-byte length][<partition by clause>][null byte]
[1-byte is_autopartitioned flag]
[parser name][null_byte] for each fulltext parser plugin used
[2-byte length][<table comment string>]

These strings should be decoded per the table’s default character set.

FormInfo

The .frm form info is a section consisting of 288 bytes with integers noting the length or count of elements in the table.

The start of this section can be found at offset 64 + names_len from the .frm header and the offset is a 4 byte integer. In python this would be found via

>> f = open('/var/lib/mysql/mysql/user.frm', 'rb')
>> f.seek(0x0004) # "names_length" documented in the .frm fileinfo header
>> names_len, = struct.unpack("<H", f.read(2)) # always 3 in modern mysql
>> f.seek(64 + names_len)
>> forminfo_offset, = struct.unpack("<I", f.read(4))

Here is a description of some of the more interesting fields available in the forminfo section. This is not meant to be exhaustive but merely to document the fields necessary for interpreting pertinent column metadata.

All offsets are relative to the start of the forminfo section

column_count

2 byte integer at offset 258

The number of coumns defined on this table

screens_length

2 byte integer at offset 260

How many bytes follow the forminfo section prior to the start of the column metadata

null_columns

2 byte integer at offset 282

How many nullable columns are defined in this table

names_length

2 byte integer at offset 268

Length in bytes (including delimiters) of column names

interval_length

2 byte integer at offset 274

Length in bytes (including delimiters) of the set/enum labels

comments_length

2 byte integer at offset 284

Length in bytes of the column comments

Column Metadata

17 bytes per column

Followed by \xff separated column names

Followed by a null byte

Followed by null terminated interval groups with each interval group consisting of interval names \xff separated.

Followed by a single string of column comments.