# An Efficient Algorithm for Solving Large Fixed Effects OLS Problems with Clustered Standard Error Estimation

Play An Efficient Algorithm for Solving Large Fixed Effects OLS Problems with Clustered Standard Error Estimation

## Description

useR!2017: An Efficient Algorithm for Solving Large...

title: An Efficient Algorithm for Solving Large Fixed Effects OLS Problems with Clustered Standard Error Estimation author: | | Thomas Balmat and Jerome Reiter | | Duke University

Large fixed effects regression problems, involving order 107 observations and 103 effects levels, present special computational challenges but, also, a special performance opportunity because of the large proportion of entries in the expanded design matrix (fixed effect levels translated from single columns into dichotomous indicator columns, one for each level) that are zero. For many problems, the proportion of zero entries is above 0.99995, which would be considered sparse. In this presentation, we demonstrate an efficient method for solving large, sparse fixed effects OLS problems without creation of the expanded design matrix and avoiding computations involving zero-level effects. This leads to minimal memory usage and optimal execution time. A feature, often desired in social science applications, is to estimate parameter standard errors clustered about a key identifier, such as employee ID. For large problems, with ID counts in the millions, this presents a significant computational challenge. We present a sparse matrix indexing algorithm that produces clustered standard error estimates that, for large fixed effects problems, is many times more efficient than standard "sandwich" matrix operations.

**Keywords**: large data least squares, fixed effects estimation, clustered standard error estimation, sparse matrix methods, high performance computingLarge fixed effects regression problems, involving order 107 observations and 103 effects levels, present special computational challenges but, also, a special performance opportunity because of the large proportion of entries in the expanded design matrix (fixed effect levels translated from single columns into dichotomous indicator columns, one for each level) that are zero. For many problems, the proportion of zero entries is above 0.99995, which would be considered sparse. In this presentation, we demonstrate an efficient method for solving large, sparse fixed effects OLS problems without creation of the expanded design matrix and avoiding computations involving zero-level effects. This leads to minimal memory usage and optimal execution time. A feature, often desired in social science applications, is to estimate parameter standard errors clustered about a key identifier, such as employee ID. For large problems, with ID counts in the millions, this presents a significant computational challenge. We present a sparse matrix indexing algorithm that produces clustered standard error estimates that, for large fixed effects problems, is many times more efficient than standard "sandwich" matrix operations.

### Day:

2## Download

### Download this episode

- MP3 (16.9 MB)
- Low Quality MP4 (17.8 MB)
- Mid Quality MP4 (29.7 MB)

## Event Homepage

useR! International R User 2017 Conference## More episodes in this series

### Efficient R Programming II

rss