Panel data (pooled cross-section and time-series) require special care. Here are some pointers.
Consider a data set composed of observations on each of n cross-sectional units (countries, states, persons or whatever) in each of T periods. Let each observation comprise the values of m variables of interest. The data set then contains mnT values.
The data should be arranged "by observation": each row represents an observation; each column contains the values of a particular variable. The data matrix then has nT rows and m columns. That leaves open the matter of how the rows should be arranged. There are two possibilities.[1]
Rows grouped by unit. Think of the data matrix as composed of n blocks, each having T rows. The first block of T rows contains the observations on cross-sectional unit 1 for each of the periods; the next block contains the observations on unit 2 for all periods; and so on. In effect, the data matrix is a set of time-series data sets, stacked vertically.
Rows grouped by period. Think of the data matrix as composed of T blocks, each having n rows. The first n rows contain the observations for each of the cross-sectional units in period 1; the next block contains the observations for all units in period 2; and so on. The data matrix is a set of cross-sectional data sets, stacked vertically.
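The two arrangements differ only in row order, which a short Python sketch can make concrete. It is an illustration, not part of gretl: the function names are hypothetical, units and periods are numbered from 1, and row positions are counted from 0.

```python
def row_by_unit(unit, period, n, T):
    """0-based row position when rows are grouped by unit (stacked time series)."""
    return (unit - 1) * T + (period - 1)


def row_by_period(unit, period, n, T):
    """0-based row position when rows are grouped by period (stacked cross sections)."""
    return (period - 1) * n + (unit - 1)


def period_to_unit_order(rows, n, T):
    """Reorder a list of rows grouped by period into rows grouped by unit."""
    return [rows[row_by_period(u, t, n, T)]
            for u in range(1, n + 1)
            for t in range(1, T + 1)]


# Toy panel: n = 3 units, T = 2 periods; each "row" is just a (unit, period) tag.
n, T = 3, 2
by_period = [(u, t) for t in range(1, T + 1) for u in range(1, n + 1)]
by_unit = period_to_unit_order(by_period, n, T)
print(by_unit)
```

After reordering, each unit's observations are contiguous and in period order. This works only if the units appear in the same order within every period block.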
You may use whichever arrangement is more convenient. The first is perhaps easier to keep straight. If you use the second then of course you must ensure that the cross-sectional units appear in the same order in each of the period data blocks. Among gretl's menus you will find an item "Restructure panel", which allows you to convert from stacked cross-section form to stacked time series.
In either case you can use the frequency field in the observations line of the data header file (see Chapter 4) to make life a little easier.
Grouped by unit: Set the frequency equal to T. Suppose you have observations on 20 units in each of 5 time periods. Then this observations line is appropriate: 5 1.1 20.5 (read: frequency 5, starting with the observation for unit 1, period 1, and ending with the observation for unit 20, period 5). Then, for instance, you can refer to the observation for unit 2 in period 5 as 2.5, and that for unit 13 in period 1 as 13.1.
Grouped by period: Set the frequency equal to n. In this case if you have observations on 20 units in each of 5 periods, the observations line should be: 20 1.01 5.20 (read: frequency 20, starting with the observation for period 1, unit 01, and ending with the observation for period 5, unit 20). One refers to the observation for unit 2, period 5 as 5.02.
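The labelling scheme in the two cases above can be sketched in Python. This is an illustration only: the function names are hypothetical, and zero-padding the minor index to the number of digits in the frequency is an assumption generalized from the examples above (1.1 and 20.5 for frequency 5, 1.01 and 5.20 for frequency 20).

```python
def label_by_unit(unit, period, T):
    """Label for data grouped by unit: major index = unit, minor index = period."""
    width = len(str(T))  # assumed: pad the minor index to the digits of the frequency T
    return f"{unit}.{period:0{width}d}"


def label_by_period(unit, period, n):
    """Label for data grouped by period: major index = period, minor index = unit."""
    width = len(str(n))  # assumed: pad the minor index to the digits of the frequency n
    return f"{period}.{unit:0{width}d}"


# 20 units observed in each of 5 periods, as in the text.
n, T = 20, 5
print(label_by_unit(2, 5, T), label_by_unit(13, 1, T))  # grouped by unit
print(label_by_period(2, 5, n))                         # grouped by period
```

The same observation (unit 2, period 5) therefore carries a different label depending on which arrangement the frequency setting describes.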
If you decide to construct a panel data set using a spreadsheet program and then import the data into gretl, the program may not at first recognize the special nature of the data. You can fix this by using the command setobs (see Chapter 9) or the GUI menu item "Sample, Set frequency, startobs…".
[1] If you don't intend to make any conceptual or statistical distinction between cross-sectional and temporal variation in the data you can arrange the rows arbitrarily, but this is probably wasteful of information.