

_P_r_o_j_e_c_t_i_o_n _P_u_r_s_u_i_t _R_e_g_r_e_s_s_i_o_n

     ppr(formula, data=sys.parent(), weights, subset,
         na.action, contrasts=NULL, ww=rep(1,q),
         nterms, max.terms=nterms, optlevel=2,
         sm.method=c("supsmu", "spline", "gcvspline"),
         bass=0, span=0, df=5, gcvpen=1)
     ppr(x, y, weights=rep(1,n), ww=rep(1,q), nterms,
         max.terms=nterms, optlevel=2,
         sm.method=c("supsmu", "spline", "gcvspline"),
         bass=0, span=0, df=5, gcvpen=1)

_A_r_g_u_m_e_n_t_s:

 formula: a regression formula specifying one or more
          response variables and the explanatory variables.

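       x: matrix of explanatory variables.  Rows represent
          observations, and columns represent variables.
          Missing values are not accepted.

       y: matrix of response variables.  Rows represent
          observations, and columns represent variables.
          Missing values are not accepted.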

  nterms: number of terms to include in the final model.

    data: data frame from which variables specified in
          `formula' are preferentially to be taken.

 weights: a vector of weights for each case.

      ww: a vector of weights for each response, so the fit
          criterion is the sum over case `i' and responses
          `j' of `w_i ww_j (y_ij - fit_ij)^2' divided by the
          sum of `w_i'.

  subset: An index vector specifying the cases to be used in
          the training sample.  (NOTE: If given, this argu-
          ment must be named.)

na.action: A function to specify the action to be taken if
          `NA's are found. The default action is for the
          procedure to fail.  An alternative is `na.omit',
          which leads to rejection of cases with missing
          values on any required variable.  (NOTE: If given,
          this argument must be named.)

contrasts: the contrasts to be used when any factor explana-
          tory variables are coded.

max.terms: maximum number of terms to choose from when
          building the model.

optlevel: integer from 0 to 3 which determines the
          thoroughness of an optimization routine in the
          SMART program.  See the Description section.

sm.method: the method used for smoothing the ridge
          functions.  The default is to use Friedman's super
          smoother `supsmu'.  The alternatives are to use
          the smoothing spline code underlying
          `smooth.spline', either with a specified
          (equivalent) degrees of freedom for each ridge
          function, or to allow the smoothness to be chosen
          by GCV.

    bass: super smoother bass tone control used with
          automatic span selection (see `supsmu'); the range
          of values is 0 to 10, with larger values resulting
          in increased smoothing.

    span: super smoother span control (see `supsmu').  The
          default, `0', results in automatic span selection
          by local cross validation. `span' can also take a
          value in `(0, 1]'.

      df: if `sm.method' is `"spline"' specifies the smooth-
          ness of each ridge term via the requested
          equivalent degrees of freedom.

  gcvpen: if `sm.method' is `"gcvspline"' this is the
          penalty used in the GCV selection for each degree
          of freedom used.
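
     The fit criterion described under `ww' can be written
     out directly.  The following is a minimal sketch only:
     `fit_criterion' is a hypothetical helper, not part of
     `ppr', and no claim is made that the `gof' component
     uses the same normalisation.

```r
# Hypothetical helper (not part of ppr): evaluates
#   sum_i sum_j w_i * ww_j * (y_ij - fit_ij)^2 / sum_i w_i
fit_criterion <- function(y, fit, w = rep(1, nrow(y)), ww = rep(1, ncol(y))) {
  y   <- as.matrix(y)            # n cases by q responses
  fit <- as.matrix(fit)
  sq  <- (y - fit)^2             # squared residuals, n by q
  per_case <- drop(sq %*% ww)    # response-weighted sum for each case
  sum(w * per_case) / sum(w)     # case-weighted sum over the total case weight
}

y   <- matrix(c(1, 2, 3, 4), nrow = 2)   # two cases: (1, 3) and (2, 4)
fit <- matrix(0, nrow = 2, ncol = 2)     # a deliberately poor "fit" of all zeros
crit <- fit_criterion(y, fit, w = c(1, 3), ww = c(1, 2))
crit   # (1*19 + 3*36) / 4 = 31.75
```

     With unit `weights' and unit `ww' the helper reduces to
     the residual sum of squares divided by the number of
     cases.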

_D_e_s_c_r_i_p_t_i_o_n:

     The basic method is given by Friedman (1984), and is
     essentially the same code used by `ppreg'.  The answers
     will be very similar on a given machine, but this code
     is extremely sensitive to the compiler used.  The
     differences from `ppreg' are the ability to use spline
     smoothers and the interface, which should be much
     easier to use.

     The algorithm first adds up to `max.terms' ridge terms
     one at a time; it will use fewer if it is unable to
     find a term to add that makes sufficient difference.
     It then removes the least "important" term at each step
     until `nterms' terms are left.  The levels of
     optimization differ in how thoroughly the models are
     refitted during this process.  At level 0 the existing
     ridge terms are not refitted.  At level 1 the
     projection directions are not refitted, but the ridge
     functions and the regression coefficients are.  Levels
     2 and 3 refit all the terms and are equivalent for one
     response; level 3 is more careful to re-balance the
     contributions from each regressor at each step and so
     is a little less likely to converge to a saddle point
     of the sum of squares criterion.
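
     The forward and backward stages and the effect of
     `optlevel' can be explored on the `rock' data.  This is
     a sketch only, assuming `ppr' and `rock' are available
     on the search path (as in current R); `rock1' and
     `gof_by_level' are names introduced for illustration,
     and the numerical values will vary between platforms.

```r
# Rescaled copies of two predictors, as in the Examples section
rock1 <- within(rock, {
  area1 <- area / 10000
  peri1 <- peri / 10000
})

# The forward stage may add up to 5 terms; the backward stage prunes to 2
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Residual sum of squares recorded for the models with 2, ..., 5 terms
fit$gofn[2:fit$ml]

# Refit the same model at each optimization level and collect `gof'
gof_by_level <- sapply(0:3, function(lev) {
  ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
      nterms = 2, max.terms = 5, optlevel = lev)$gof
})
names(gof_by_level) <- paste("optlevel", 0:3)
gof_by_level
```

     Comparing the entries of `gof_by_level' gives a rough
     idea of how much the extra refitting at higher levels
     changes the selected model for a particular data set.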

_V_a_l_u_e:

     A list with the following components, many of which are
     for use by the method functions.

    call: the matched call

       p: the number of explanatory variables (after any
          coding)

       q: the number of response variables

      ml: the argument `max.terms'

     gof: the overall residual (weighted) sum of squares for
          the selected model

    gofn: the overall residual (weighted) sum of squares
          against the number of terms, up to `max.terms'.
          Will be invalid (and zero) for fewer than `nterms'
          terms.

      df: the argument `df'

     edf: if `sm.method' is `"spline"' or `"gcvspline"' the
          equivalent number of degrees of freedom for each
          ridge term used.

  xnames: the names of the explanatory variables

  ynames: the names of the response variables

   alpha: a matrix of the projection directions, with a
          column for each ridge term

    beta: a matrix of the coefficients applied for each
          response to the ridge terms: the rows are the
          responses and the columns the ridge terms

      yb: the weighted means of each response

      ys: the overall scale factor used: internally the
          responses are divided by `ys' to have unit total
          weighted sum of squares.

fitted.values: the fitted values, as a matrix if `q > 1'

residuals: the residuals, as a matrix if `q > 1'

    smod: internal work array, which includes the ridge
          functions evaluated at the training set points.
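
     The components listed above can be inspected directly
     on a fitted object.  The following sketch assumes `ppr'
     and the `rock' data are available on the search path;
     `rock1' and `recon' are names introduced here for
     illustration.

```r
rock1 <- within(rock, {
  area1 <- area / 10000
  peri1 <- peri / 10000
})
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Number of predictors, number of responses, and the `max.terms' argument
c(p = fit$p, q = fit$q, ml = fit$ml)

dim(fit$alpha)   # p rows, one column per retained ridge term
dim(fit$beta)    # q rows, one column per retained ridge term

# Fitted values plus residuals should give back the response
recon <- as.vector(fit$fitted.values + fit$residuals)
all.equal(recon, log(rock1$perm))
```

     With a single response, `fitted.values' and `residuals'
     are vectors rather than matrices, so `as.vector' here
     only strips any names before the comparison.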

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J. H. and Stuetzle, W. (1981) Projection pur-
     suit regression. Journal of the American Statistical
     Association 76, 817-823.

     Friedman, J. H. (1984) SMART User's Guide.  Laboratory
     for Computational Statistics, Stanford University
     Technical Report No. 1.

_S_e_e _A_l_s_o:

     `plot.ppr',  `ppreg',  `supsmu',  `smooth.spline'

_E_x_a_m_p_l_e_s:

     # Note: your numerical values may differ
     data(rock)
     attach(rock)
     area1 <- area/10000; peri1 <- peri/10000
     rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                     data=rock, nterms=2, max.terms=5)
     rock.ppr
     # Call:
     # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
     #     nterms = 2, max.terms = 5)
     #
     # Goodness of fit:
     #  2 terms  3 terms  4 terms  5 terms
     # 8.737806 5.289517 4.745799 4.490378

     summary(rock.ppr)
     # Call:
     # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
     #     nterms = 2, max.terms = 5)
     #
     # Goodness of fit:
     #  2 terms  3 terms  4 terms  5 terms
     # 8.737806 5.289517 4.745799 4.490378
     #
     # Projection direction vectors:
     #       term 1      term 2
     # area1  0.34357179  0.37071027
     # peri1 -0.93781471 -0.61923542
     # shape  0.04961846  0.69218595
     #
     # Coefficients of ridge terms:
     #    term 1    term 2
     # 1.6079271 0.5460971

     par(mfrow=c(1,2), pty="s")
     plot(rock.ppr)
     plot(update(rock.ppr, bass=5))
     plot(update(rock.ppr, sm.method="gcvspline", gcvpen=2))
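
     The second form of the call, with a matrix `x' and a
     response `y', is not exercised above.  A minimal sketch,
     assuming `ppr' and `rock' are available on the search
     path; the object name `rock.xy' is introduced here for
     illustration, and its numerical results are not
     guaranteed to match those of the formula fit.

```r
# Build the explanatory matrix by hand: rows are cases, columns are variables
x <- with(rock, cbind(area1 = area / 10000,
                      peri1 = peri / 10000,
                      shape = shape))
y <- log(rock$perm)   # single response

rock.xy <- ppr(x, y, nterms = 2, max.terms = 5)
rock.xy$xnames   # column names of x
rock.xy$p        # number of explanatory variables
rock.xy$q        # number of responses
```

     This form avoids `attach' and the formula machinery, at
     the cost of doing any coding of factors by hand.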

