1. Estimating the Missing Hydrological Data
Because most gauging stations used in the hydrologic analysis of river basins have
short and fragmentary records, data extension and filling in of missing data should
be performed. Concurrent measurements at two or more gauging stations with
complete records can be used to fill in missing data or to extend short records at
other flow measuring stations.
The extended or completed records can then be used to estimate a new set of
statistical parameters such as the mean, variance, and skewness coefficient. However,
the extension or completion of records does not always give satisfactory results. The
lengthened or completed sequence must give better parameter estimates than the
short or incomplete sequence; whether this requirement is fulfilled depends on the
degree of association between the concurrent records.
The analysis of short and missing records is a complicated process. It ranges from
the very simple case of filling in missing data using a nearby station to the case of
multiple series with missing values, some of which are also short in extent. The
techniques used for transferring information from nearby measuring stations with
complete or longer records to stations with missing values or short records are
based on regression analysis. Simple or multiple linear regression models can be
used for this purpose.
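The simplest case described above, filling gaps at one station from a single nearby station, can be sketched in Python as follows. This is only an illustrative sketch: the function name `fill_missing_by_regression`, the use of NaN to mark missing values, and the flow values are assumptions made for the example, not part of the original text.

```python
import numpy as np

def fill_missing_by_regression(target, reference):
    """Fill NaN gaps in `target` using a simple linear regression on `reference`.

    Both inputs are equally long sequences of concurrent observations;
    missing values are assumed to be marked with NaN.
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Concurrent period: both stations have an observation.
    both = ~np.isnan(target) & ~np.isnan(reference)
    if both.sum() < 3:
        raise ValueError("need at least 3 concurrent observations")

    # Least-squares fit of target = a + b * reference over the concurrent period.
    b, a = np.polyfit(reference[both], target[both], deg=1)

    # Fill only where the target is missing and the reference is available.
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(reference)
    filled[gaps] = a + b * reference[gaps]
    return filled, a, b

# Hypothetical flows (m^3/s): complete reference station, target with two gaps.
reference = np.array([10.0, 12.0, 15.0, 11.0, 20.0, 18.0])
target = np.array([21.0, 25.0, np.nan, 23.0, 41.0, np.nan])

filled, a, b = fill_missing_by_regression(target, reference)
print(filled)
```

The filled values are only as reliable as the association between the two stations over the concurrent period, so the fitted relation should be checked before the completed record is used.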
2. Correlation and Regression
2.1. Correlation analysis
Statistical correlation methods are generally used to measure the degree of
association between random variables like the measured flow at one station and
another station downstream. If such association is between two variables the
correlation is called "simple correlation" while if the association is between one
variable and a set of variables the correlation is called "multiple correlation". On the
other hand, the association between two variables while keeping other variables
constant is called "partial correlation". Another form of association in which the
correlation is defined in relation to time is called autocorrelation. That is, if the
outcomes of a given random variable are considered as a time sequence and the
values of the said variable at a given time are associated with those at another time,
the resulting correlation is called "autocorrelation". In addition, if the hydrologic
data at hand are seasonal, such autocorrelation may be periodic.
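To make the idea of autocorrelation concrete, the following Python sketch computes a lag-k sample autocorrelation of a single series. The estimator used here (deviations from the overall mean, normalised by the total sum of squares) is one common choice and is an assumption of this sketch; the function name `lag_autocorrelation` and the example series are hypothetical.

```python
import numpy as np

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given positive lag."""
    x = np.asarray(series, dtype=float)
    if not 0 < lag < x.size:
        raise ValueError("lag must be between 1 and len(series) - 1")

    # Deviations from the overall mean of the series.
    d = x - x.mean()

    # Products of values `lag` steps apart, normalised by the total sum of squares.
    return float(np.sum(d[:-lag] * d[lag:]) / np.sum(d * d))

# Hypothetical steadily rising series: neighbouring values are positively associated.
r1 = lag_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], lag=1)
print(r1)
```

For seasonal data, evaluating the same function at a lag equal to the season length (for example 12 for monthly flows) gives a first indication of the periodic behaviour mentioned above.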
2.2. Simple Correlation
The simple correlation (coefficient) measures the degree of association between two
variables. If X is a random variable with mean $\mu_x$ and variance $\sigma_x^2$, and Y is a
random variable with mean $\mu_y$ and variance $\sigma_y^2$, the simple correlation
coefficient of X and Y is defined by (10) as:
\[
\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}
          = \frac{E\left[(X - \mu_x)(Y - \mu_y)\right]}{\sigma_x \sigma_y} \qquad (1)
\]
where Cov(X, Y) is the covariance of X and Y. This correlation coefficient $\rho_{xy}$ is a
dimensionless number, which lies between –1 and +1. Positive values of $\rho_{xy}$
indicate that large (small) values of one variable are associated with large (small)
values of the other variable, while negative values of $\rho_{xy}$ indicate that large values
of one variable are associated with small values of the other. On the other hand,
$\rho_{xy} = 0$ means that X and Y are not linearly associated, that is, there is no
correlation between them.
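The sign interpretation of the correlation coefficient can be illustrated with a short Python sketch that applies the definition in equation (1) to sample data, replacing expectations with sample averages. This substitution, the function name `sample_correlation`, and the flow values are assumptions of the sketch; NumPy's `np.corrcoef` is shown only as a cross-check.

```python
import numpy as np

def sample_correlation(x, y):
    """Correlation of two equally long samples, following the form of equation (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")

    # Deviations from the sample means (stand-ins for X - mu_x and Y - mu_y).
    dx = x - x.mean()
    dy = y - y.mean()

    # Covariance over the product of standard deviations; the 1/N factors cancel.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx * dx) * np.sum(dy * dy)))

# Hypothetical flows at an upstream and a downstream station.
upstream = np.array([10.0, 12.0, 15.0, 11.0, 20.0])
downstream = np.array([22.0, 25.0, 33.0, 24.0, 41.0])

r = sample_correlation(upstream, downstream)
# Cross-check against NumPy's built-in correlation matrix.
r_numpy = np.corrcoef(upstream, downstream)[0, 1]
print(r, r_numpy)
```

A value of `r` close to +1 here reflects that large upstream flows coincide with large downstream flows; reversing the order of one series would give a negative value.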
Given samples of size N, $x_1, \ldots, x_N$ and $y_1, \ldots, y_N$, an estimator
$\hat{\rho}_{xy}$ of the correlation coefficient is given by: