CHAPTER
3
Performance of Managed
Futures: Persistence and the
Source of Returns
B. Wade Brorsen and John P. Townsend
Managed futures investments are shown to exhibit a small amount of
performance persistence. Thus, there do appear to be some differences in
the skills of commodity trading advisors. The funds with the highest returns
used long-term trading systems, charged higher fees, and had fewer dollars
under management.
Returns were negatively correlated with the most recent past returns,
but the sum of all correlations was positive. Consistent with work in behav-
ioral finance, when deciding whether to invest or withdraw funds, investors
put the most weight on the most recent returns. The results suggest that the
source of futures fund returns is exploiting inefficiencies.
INTRODUCTION
There is little evidence from past research that the top performing managed
futures funds can be predicted (Schwager 1996). Past literature has prima-
rily used variations of the methods of Elton, Gruber, and Rentzler (EGR).
Yet EGR’s methods have little power to reject the null hypothesis of no pre-
dictability (Grossman 1987). Using methods with sufficient power to reject
a false null hypothesis, this research seeks to determine whether perform-
ance persists for managed futures advisors. The data used are from public
funds, private funds, and commodity trading advisors (CTAs). Regression
analysis is used to determine whether all funds have the same mean returns.
This is done after adjusting for changes in overall returns and differences in
leverage. Monte Carlo methods are used to determine the power of EGR’s
c03_gregoriou.qxd 7/27/04 11:03 AM Page 31
TABLE 3.1 Descriptive Statistics for the Public, Private, and Combined CTA Data
Sets and Continuous Time Returns

Statistic             Public Funds   Private Funds   Combined CTAs
Observations                32,420          23,723          57,018
# Funds                        577             435           1,071
Percentage returns
  Mean                        0.31            0.62            1.28
  SD                          7.68            9.22           10.53
  Minimum                  −232.69         −224.81         −135.48
  Maximum                   229.73          188.93          239.79
  Skewness                   −2.08           −0.49            1.14
  Kurtosis                  133.91           40.70           24.34
Performance of Managed Futures 33
in previous literature. The conventional wisdom as to why CTAs have
higher returns is that they incur lower costs. However, CTA returns may be
higher because of selectivity or reporting biases. Selectivity bias is not a
major concern here, because the comparison is among CTAs, not between
CTAs and some other investment. Faff and Hallahan (2001) argue that sur-
vivorship bias is more likely to cause performance reversals than perform-
ance persistence. The data used show considerable kurtosis (see Table 3.1).
However, this kurtosis may be caused by heteroskedasticity (returns of
some funds are more variable than others).
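The link between heteroskedasticity and kurtosis can be checked with a short calculation. The sketch below assumes an equal-weight mixture of zero-mean normal returns; the function name `mixture_kurtosis` is hypothetical, and the standard deviations 5, 10, 15, and 20 are borrowed from the Monte Carlo design described later in the chapter purely for illustration.

```python
def mixture_kurtosis(sigmas):
    """Kurtosis of an equal-weight mixture of zero-mean normals.

    Assumption for illustration: each fund's returns are normal with its own
    standard deviation, and every fund contributes the same number of
    observations to the pooled sample.
    """
    n = len(sigmas)
    second_moment = sum(s ** 2 for s in sigmas) / n
    # For a normal variable with standard deviation s, E[x^4] = 3 * s^4.
    fourth_moment = sum(3 * s ** 4 for s in sigmas) / n
    return fourth_moment / second_moment ** 2


if __name__ == "__main__":
    print(mixture_kurtosis([2, 2, 2, 2]))     # equal variances: normal kurtosis of 3
    print(mixture_kurtosis([5, 10, 15, 20]))  # unequal variances: roughly 4.72
```

Each component is exactly normal, yet the pooled sample has kurtosis above 3 as soon as the variances differ, which is the mechanism the text points to.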
REGRESSION TEST OF PERFORMANCE PERSISTENCE
To measure performance persistence, a model of the stochastic process that
generates returns is required. The process considered is:
$$r_{it} = \alpha_i + \beta_i \bar{r}_t + \varepsilon_{it}, \quad i = 1, \ldots, n \ \text{and} \ t = 1, \ldots, T, \qquad \varepsilon_{it} \sim N(0, \sigma_i^2) \tag{3.1}$$

where $r_{it}$
that some funds and pools have different mean returns than others. This
finding does contrast with previous research, but is not really surprising
given that funds and pools have different costs. Funds and pools have
different trading systems, and commodities traded vary widely. The test used
in this study measures long-term performance persistence; in contrast, EGR
measures short-term performance persistence.
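Equation 3.1 is a linear regression of each fund's return on the average fund return, so its intercept and slope can be estimated fund by fund. The sketch below uses ordinary least squares for a single fund. It is only an illustration: the function name `fit_fund` and the sample numbers are hypothetical, and it omits the heteroskedasticity correction on which the chapter's actual test relies.

```python
def fit_fund(fund_returns, average_returns):
    """Ordinary least squares estimates of alpha_i and beta_i for one fund.

    fund_returns    -- the fund's monthly returns, r_it
    average_returns -- the average return over all funds in the same months
    """
    n = len(fund_returns)
    x_bar = sum(average_returns) / n
    y_bar = sum(fund_returns) / n
    sxx = sum((x - x_bar) ** 2 for x in average_returns)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(average_returns, fund_returns))
    beta = sxy / sxx                 # slope: sensitivity to the average return
    alpha = y_bar - beta * x_bar     # intercept: fund-specific mean return
    return alpha, beta


if __name__ == "__main__":
    # Made-up data lying exactly on the line r_it = 1 + 0.5 * average return.
    average = [1.0, -2.0, 3.0, 0.5]
    fund = [1.0 + 0.5 * x for x in average]
    print(fit_fund(fund, average))
```

In this reading, a test of performance persistence asks whether the estimated intercepts differ across funds by more than sampling noise would explain.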
34 PERFORMANCE
Only about 2 to 4 percent of the variation in monthly returns across
funds can be explained by differences in individual means. Because the pre-
dictable portion is small, precise methods are needed to find it. Without the
correction for heteroskedasticity, the null hypothesis would not have been
Skewness −0.17 −0.02 0.35
Relative kurtosis 3.84 3.05 2.72
rescaled residuals have a t-distribution so some kurtosis should remain
even if the data were generated from a normal distribution. This demon-
strates that most of the nonnormality shown in Table 3.1 is due to
heteroskedasticity.
MONTE CARLO STUDY
In their method, EGR ranked funds by their mean return or modified
Sharpe ratio in a first period, and then determined whether the funds that
ranked high in the first period also ranked high in the second period. We
use Monte Carlo simulation to determine the power and size of hypothesis
tests with EGR’s method when data follow the stochastic process given in
equation 3.1. Data were generated by specifying values of α, β, and σ. The
simulation used 1,000 replications and 120 simulated funds. The mean
return over all funds, $\bar{r}_t$, is derived from the values of α and β as:
where all sums are from i = 1 to n.
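As a sketch of where such an expression can come from, assume that $\bar{r}_t$ is the simple cross-sectional average of the $r_{it}$ in equation 3.1. Averaging the equation over funds and solving for $\bar{r}_t$ gives the result below. This is a reconstruction under that assumption and may differ in form from the authors' own display.

$$\bar{r}_t = \frac{1}{n}\sum_i r_{it} = \frac{1}{n}\Big(\sum_i \alpha_i + \bar{r}_t \sum_i \beta_i + \sum_i \varepsilon_{it}\Big) \quad\Longrightarrow\quad \bar{r}_t = \frac{\sum_i \alpha_i + \sum_i \varepsilon_{it}}{n - \sum_i \beta_i}$$

The expression is defined only when $\sum_i \beta_i \neq n$, that is, when the average β differs from 1.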
A constant value of α simulates no performance persistence. For the
data sets generated with persistence present, α was generated randomly
based on the mean and variance of β’s in each of the three data sets. To sim-
ulate funds with the same leverage, the β’s were set to a value of 0.5. The
simulation of funds with differing leverage (which provided heteroskedas-
ticity) used β’s with values set to 0.5, 1.0, 1.5, and 2.0.
To match EGR’s assumption of homoskedasticity, data sets were gener-
ated with the standard deviation set at 2. Heteroskedasticity was created by
letting the values of σ be 5, 10, 15, and 20, with one-fourth of the observa-
tions using each value. This allowed us to compare the Spearman correlation
mean returns. The means across all funds in the top-third group and
bottom-third group also were calculated.
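The data generation and ranking test described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code. It assumes that the average return is the cross-sectional mean implied by equation 3.1, that the selection and performance periods are equal halves of the sample, that there are no ties in the ranks, and that the large-sample normal approximation for the Spearman coefficient applies. All function names and default settings are hypothetical.

```python
import random
import statistics


def simulate_returns(alphas, betas, sigmas, n_months, rng):
    """Draw r_it = alpha_i + beta_i * rbar_t + eps_it for every fund and month."""
    n = len(alphas)
    denom = n - sum(betas)  # from averaging equation 3.1 over funds; must be nonzero
    returns = [[] for _ in range(n)]
    for _ in range(n_months):
        eps = [rng.gauss(0.0, s) for s in sigmas]
        rbar = (sum(alphas) + sum(eps)) / denom
        for i in range(n):
            returns[i].append(alphas[i] + betas[i] * rbar + eps[i])
    return returns


def ranks(values):
    """Ranks from 1 (smallest) to n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation using the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def egr_rejection_rate(n_funds=120, n_months=72, n_reps=50, seed=0):
    """Share of replications in which the rank test rejects at the 5 percent
    level when alpha is constant, so that no persistence is present."""
    rng = random.Random(seed)
    alphas = [1.0] * n_funds
    betas = [0.5] * n_funds  # same leverage for every fund
    sigmas = [(5.0, 10.0, 15.0, 20.0)[i % 4] for i in range(n_funds)]
    half = n_months // 2
    rejections = 0
    for _ in range(n_reps):
        r = simulate_returns(alphas, betas, sigmas, n_months, rng)
        first = [statistics.fmean(x[:half]) for x in r]
        second = [statistics.fmean(x[half:]) for x in r]
        z = spearman(first, second) * (n_funds - 1) ** 0.5
        rejections += abs(z) > 1.96
    return rejections / n_reps


if __name__ == "__main__":
    print(egr_rejection_rate())
```

A rejection rate well above 0.05 in such a run would indicate incorrect size. With only 50 replications the estimate is noisy, so the replication count should be raised before drawing conclusions.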
To determine if EGR’s test has correct size, it is used with data where
performance persistence does not exist (see Table 3.4). If the size is correct,
the fail-to-reject probability should be 0.95. When heteroskedasticity is
present (data generation methods 2 and 3), the probability of not rejecting
is less than 0.95. The heteroskedasticity may be more extreme in actual
data, so the problem with real data may be even worse than the excess Type
I error found here.
Next, we determine the power of EGR’s test by applying it to data
where performance persistence really exists (see Table 3.5). The closer the
fail-to-reject probability is to zero, the higher is the power. The Spearman
correlation coefficients show some ability to detect persistence when large
TABLE 3.4 EGR Performance Persistence Results from Monte Carlo Generated
Data Sets: No Persistence Present by Restricting α = 1

                              Data Generation Method
Generated Data Subgroups        1ᵃ        2ᵇ        3ᶜ
Mean returns
  top 1/3                     1.25      1.25      0.70
  middle 1/3                  1.25      1.25      0.72
  bottom 1/3                  1.25      1.22      0.68
  top 3                       1.25      1.15      0.61
  bottom 3                    1.26      1.19      0.68
p-values
  reject-positive z          0.021     0.041     0.041
Generated Data Subgroups        1ᵃ        2ᵇ        3ᶜ        4ᵈ
Mean returns
  top 1/3                     3.21      2.77      2.57      1.48
  middle 1/3                  1.87      2.09      1.85      1.30
  bottom 1/3                  0.80      1.41      1.15      1.14
  top 3                       4.93      3.47      3.26      1.68
  bottom 3                   −1.60      1.14      0.86      1.06
p-values
  reject-positive z          1.000     0.827     0.823     0.149
  reject-negative z          0.000     0.000     0.000     0.003
  fail to reject             0.000     0.173     0.177     0.848
test of 2 means
  reject-positive             1.00     0.268     0.258     0.043
  reject-negative            0.000     0.000     0.000     0.012
  fail to reject             0.000     0.732     0.742     0.945

ᵃ Data generated using α = N(1.099, 4.99); β = 0.5, 1, 1.5, 2; σ = 2.
ᵇ Data generated using α = N(1.099, 4.99); β = 0.5; σ = 5, 10, 15, 20.
ᶜ Data generated using α = N(1.099, 4.99); β = 0.5, 1, 1.5, 2; σ = 5, 10, 15, 20.
ᵈ Data generated using α = N(1.099, 1); β = 0.5, 1, 1.5, 2; σ = 5, 10, 15, 20.
The three-year selection period and three-year trading period show
higher correlations than the four-year selection and one-year trading peri-
ods except for the early years of public funds. There were few funds in these
early years and so their correlations may not be estimated very accurately.
Rankings in the three-year performance period are also less variable than in
the one-year performance period. The higher correlation with the longer
trading period suggests that performance persistence continues for a long time.
This fact suggests that investors may want to be slow to change their allo-
cations among managers.
The next question is: Why do the results differ from past research? Actu-
ally, EGR found similar performance persistence, but dismissed it as being
small and statistically insignificant. Our larger sample leads to more power-
ful tests. McCarthy (1995) did find performance persistence, but his results
are questionable because his sample size was small. McCarthy, Schneeweis,
and Spurgin’s (1997) sample size was likely too small to detect performance
persistence in the mean. Irwin, Krukmeyer, and Zulauf (1992) placed funds
into quintiles. Their approach is difficult to interpret and may have led to
low power. Schwager (1996) found a similar correlation of 0.07 for mean
TABLE 3.6 Summary of Spearman Correlations between Selection
and Performance Periods

Data Set /               Average        Years           Years Positive and
Selection Criterion      Correlation    Positive (%)    Significant (%)
Four and oneᵃ
CTA
  mean returns           0.118            83              25
  α                      0.114            83              25
  α/σ                    0.168           100              42
returns. Schwager, however, found a negative correlation for his return/risk
measure. He ranked funds based on return/risk when returns were positive,
but ranked on returns only when returns were negative. This hybrid meas-
ure may have caused the negative correlation. Therefore, past literature is
indeed consistent with a small amount of performance persistence. Perfor-
mance persistence is found here because of the larger sample size and a slight
improvement in methods. As shown in Table 3.6, several years yielded neg-
ative correlations, and many positive correlations were statistically insignif-
icant. Therefore, results over short time periods will be erratic.
The performance persistence could be due to either differences in trad-
ing skills or differences in costs. There is no strong difference in perform-
ance persistence among CTAs, public funds, and private funds.
PERFORMANCE PERSISTENCE AND CTA CHARACTERISTICS
Because some performance persistence was found, we next try to explain
why it exists. Monthly percentage returns were regressed against CTA char-
acteristics. Only CTA data are used since little data on the characteristics of
public and private funds were available.
Data and Regression Model
Table 3.7 presents the means of the CTA characteristics. The variables listed
were included in the regression along with dummy variables. Dummy vari-
ables were defined for whether a long-term or medium-term trading system
was used. The only variables allowed to change over time were dollars
under management and time in existence.
The data as provided by LaPorte Asset Allocation had missing values
recorded as zero. If commissions, administrative fees, and incentive fees
were all listed as zero, the observations for that CTA were deleted. This
eliminated most but not all of the missing values. If commissions were zero,
the mean of the remaining observations was imputed.
A few times options or interbank percentages were entered only as a
Variable                    Units                   Mean       SD
Commission                  % of equity              5.7      4.7
Administrative fee          % of equity              2.5      1.5
Incentive fee               % of profits            19.9      4.5
Discretion                  %                       27.7     37.9
Non-U.S.                    %                       17.0     26.3
Options                     %                        5.3     15.7
Interbank                   %                       13.9     29.3
Margin                      % of equity invested    21.8     10.9
Time in existence           months                  55.0     45.4
First year                                          87.9      4.9
Dollars under management    ($million)              34.8    131.6

Note: These statistics are calculated using the monthly data and were weighted by
the number of returns in the data set.
important because many of the variables do not vary over time. Ignoring
random effects could cause significance levels to be overstated.
Regression of Mean Returns on CTA Characteristics
Table 3.8 presents the regressions of monthly percentage returns against CTA
characteristics. Short-term horizon traders had lower returns than the long-
term and medium-term traders. The coefficient of 0.30 for medium-term
traders means that monthly percentage returns are 0.30 higher for medium-
term traders than for short-term traders. For comparison, CTA monthly
returns averaged 1.28 percent. All three fee variables had positive coefficients.
Two of them (administrative and incentive fee) were statistically significant.
The fee variables represent the most recent fees. This means that CTAs with
larger historical returns charge higher fees. It may also mean that CTAs
with superior ability are able to charge a higher price. A 20 percent incentive
means that firms with all trading in options have monthly returns 0.4 per-
centage points lower than a CTA that did not trade options.
Both the time in existence and the year trading began had negative coef-
ficients. The negative sign is at least partly due to selectivity bias. Some
CTAs were added to the database after they began trading. CTAs with poor
performance may not have provided data. This could cause CTAs to have
higher returns in their first years of trading. A negative sign on the first-year
variable suggests that the firms entering the database in more recent years
have lower returns. Thus, selectivity bias may be less in more recent years.
CTA returns also may genuinely erode over time. If CTAs do not
change their trading system over time, others may discover the same ineffi-
ciency through their own testing. Also, the way the CTA trades may be imi-
tated if the CTA tells others about his or her system. CTAs are clearly
concerned about this potential problem; most keep their system secret and
have employees sign no-compete agreements.
The dollars under management have a negative coefficient. The coeffi-
cient implies that for each $1 million under management, returns are
0.00104 percentage points lower. This could be due to increased liquidity
costs from larger trade sizes. Returns would go to zero when a CTA had $1
billion under management.
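The break-even figure follows from simple arithmetic. The sketch below assumes the linear relationship is extrapolated from the average CTA monthly return of 1.28 percent reported in Table 3.1. That starting point is an assumption, since the relevant intercept for any particular CTA could differ, and the variable names are hypothetical.

```python
# Linear extrapolation of the dollars-under-management coefficient.
mean_monthly_return = 1.28    # percent per month, CTA mean from Table 3.1
slope_per_million = 0.00104   # percentage points lost per $1 million managed

# Level of funds under management at which the implied return reaches zero.
breakeven_millions = mean_monthly_return / slope_per_million

if __name__ == "__main__":
    print(f"{breakeven_millions:,.0f} million dollars")
```

The result is about $1.2 billion, which matches the order of magnitude quoted in the text.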
Following Goetzmann, Ingersoll, and Ross’s (1997) arguments for
hedge funds, managed futures exist because of inefficiencies in the market
and because the CTA either faces capital constraints or is risk averse. By the
very action of trading, the CTA is acting to remove these inefficiencies.
Goetzmann, Ingersoll, and Ross (1997) argue that incentive fees exist partly
to keep a manager from accepting too much investment. Dollars under
management is a crude measure of excessive investment. Funds that trade
more markets or more systems or trade less intensively presumably could
handle more investment without decreasing returns.
Regression of the Absolute Value of Residuals
Non-U.S.                         −0.013*     −2.39
Options                          −0.011      −1.30
Interbank                        −0.008      −1.02
Margin                            0.092*      7.21
Time in existence                −0.029*    −10.45
First year                       −0.260*     −5.34
Dollars under management         −0.001      −0.78
F-test for commodities traded     1.13
F-test for time                   7.74*
F-test for homoskedasticity      11.96*

Note: The absolute value of residuals is a measure of riskiness.
* significant at the 5 percent level.
for time. Ordinary least squares and random effects for time yielded similar
results. Random or fixed effects for CTAs are not included because a Monte
Carlo study showed that such methods yielded tests with incorrect size.
As shown in Table 3.10 there are cycles in CTA and fund returns. CTAs
tend to do well relative to other CTAs every other year. The sum of the three
coefficients is positive, which confirms the previous results regarding a
small amount of performance persistence. The negative coefficient on
returns during the first lagged year supports Schwager’s arguments that
CTA/fund returns are negatively correlated in the short run.
More risk, as measured by historical standard deviation, leads to higher
returns for CTAs. Since CTAs are profitable, CTAs with higher leverage
should make higher returns and have more risk. In contrast, both public
and private fund returns are negatively related to risk. Thus, risk may dif-
most recent three months were separated, and a dummy variable was added
for positive returns.
The results in Table 3.11 show that investment and disinvestment are a
function of lagged returns. Only returns in the most recent two years were
significantly related. The disinvestment due to negative returns is greater
than the investment that occurs with positive returns for the most recent
two months. This is an indication of some asymmetry. There is no asym-
metry for lags greater than three months.
TABLE 3.11 Regression of Monthly Returns and New Money
against Various Functions of Lagged Returns

Variable                          Monthly Returns    New Moneyᵃ
1 month ago returns                    0.001            0.155*
                                      (0.04)           (5.94)
1 month ago gains                      0.026           −0.107
                                      (1.24)          (−2.83)
2 months ago returns                  −0.083*           0.148*
                                     (−5.95)           (5.72)
2 months ago gains                     0.064*          −0.082
                                      (3.14)          (−2.12)
3 months ago returns                  −0.058*           0.087*
                                      (4.16)           (3.60)
3 months ago gains                    −0.093*           0.001
                                      (4.55)           (0.03)
Average returns 4–12 months           −0.010            0.550*
                                     (−0.48)          (13.04)
Average returns 13–24 months           0.134*           0.198*
                                      (6.12)           (4.61)
Average returns 25–36 months           0.080            0.055
                                      (4.06)           (1.32)
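The asymmetry in the New Money column can be read off the coefficients directly. The sketch below assumes that each "gains" regressor equals the lagged return when that return is positive and zero otherwise, so the response to a gain is the sum of the two coefficients and the response to a loss is the "returns" coefficient alone. That coding is an assumption about the model, and the names used are hypothetical.

```python
# New Money coefficients from Table 3.11, keyed by lag in months:
# (coefficient on lagged returns, coefficient on lagged gains)
NEW_MONEY = {1: (0.155, -0.107), 2: (0.148, -0.082), 3: (0.087, 0.001)}


def response_slopes(coefficients):
    """Return {lag: (slope after a loss, slope after a gain)}.

    Assumes the gains regressor is the lagged return times an indicator
    for a positive return.
    """
    slopes = {}
    for lag, (b_returns, b_gains) in coefficients.items():
        slopes[lag] = (b_returns, b_returns + b_gains)
    return slopes


if __name__ == "__main__":
    for lag, (loss, gain) in response_slopes(NEW_MONEY).items():
        print(f"lag {lag}: loss slope {loss:.3f}, gain slope {gain:.3f}")
```

Under this reading, the one-month response to losses (0.155) is roughly three times the response to gains (about 0.048), and the two-month response is about twice as large (0.148 against about 0.066). At three months the two slopes are nearly equal.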
CONCLUSION
This research finds a small amount of performance persistence in managed
futures. Performance persistence could exist due to differences in either cost
or in manager skill. Our results favor skill as the explanation, because
returns were positively correlated with cost. A regression model was esti-
mated including the average fund return as a regressor. The regression
model indicated some statistically significant performance persistence. The
performance persistence is small relative to the variation in the data (only 2
to 4 percent of the total variation), but large relative to the mean.
The regression method was expected to be the method with the highest
power. Monte Carlo simulations showed that the methods used in past
research often could not reject false null hypotheses and would reject true
null hypotheses too often.
Out-of-sample tests confirmed the regression results. There is some per-
formance persistence, but it is small relative to the noise in the data. A
return/risk measure showed more persistence than either of the return
measures. Although past data can be used to rank funds, precise methods
and long time periods are needed to provide accurate rankings.
CTAs using short-term trading systems had lower returns than CTAs
with longer trading horizons. CTAs with higher historical returns are now
charging higher fees. CTA returns decreased over time and more recent
funds have lower returns. At least part of this trend is likely survivorship
bias. As dollars under management increased, CTA returns decreased. The
finding of fund returns decreasing over time (and as dollars invested
increase) suggests that funds exist to exploit inefficiencies.
The dynamics of returns showed small negative correlations for returns
in the short run, especially for losses. The net effect over three years is pos-
itive, which is consistent with a small amount of performance persistence.