The Trouble with State-by-State Analyses of H-1B

A number of researchers on H-1B and related issues rely on state-by-state, or city-by-city, comparisons.  Notable in the genre are my UCD colleague Giovanni Peri, HBS’ Bill Kerr (and his coauthors) and Madeline Zavodny.  Zavodny, who wrote on H-1B as a member of the Dallas and Atlanta Feds, is currently a professor at Agnes Scott College.

A couple of weeks ago, R. Davis, a Silicon Valley software developer, contacted me regarding Prof. Zavodny’s  2011 research, sponsored by industry groups.   She found:

The data comparing employment among the fifty states and the District of Columbia show that from 2000 to 2007, an additional 100 foreign-born workers in STEM fields with advanced degrees from US universities is associated with an additional 262 jobs among US natives.

Skeptical of those findings, Davis asked my opinion of the study, and exactly how Zavodny had done the analysis.  I’d been critical of the research in the past, but suggested that Davis write to Zavodny and interact with her.  He did so, and she replied instantly, sending him her data and Stata code.   This showed real class on her part, as most researchers would not send their code and data, unless required by law.

Davis set to work.  Stata is an expensive commercial product, but I steered him to the R language, which is both free and very high-quality.  R is the lingua franca of the statistics community, and I’m quite active in the R world.  Davis has now posted his analysis of the Zavodny research, replicating her numbers but also uncovering a fascinating story underlying the data, which I’ll explain shortly.

But first, I wish to emphasize quite strongly that I’ve never thought highly of the “H-1B creates x jobs” kind of analysis, as it suffers from a huge issue of causality.  Zavodny is careful to use the phrase “is associated with,” which is a lot better than Peri’s repeated claims of finding a “causal” relation in his April 2014 report, but let’s be real, folks:  The industry paid for Zavodny’s work, and is surely representing it as causal as it spreads the study around Capitol Hill.

Association is NOT causality.  Any decent undergraduate who’s worked with data knows that.  In Zavodny’s case, for instance, what would happen if employers were to hire 100 additional U.S. citizens and permanent residents instead of H-1Bs?  (See qualifier coming up.)  Would they not “cause” 262 jobs to be created?  Indeed, given the poorer average quality of the H-1Bs, wouldn’t hiring 100 more Americans produce MORE than 262 new jobs?  (The industry lobbyists, of course, claim there aren’t 100 more Americans available, but research by Salzman, Lowell, Kuehn, Costa, Teitelbaum and so on has pretty much laid those claims to rest, as even some pro-industry economists seem to concede.)   Actually, this was one of my major criticisms of a paper by the Kerrs and Wm. Lincoln.

Worse, region-by-region analyses are notorious for being unreliable and misleading.  For example, there have been numerous studies on capital punishment, both pro and con, that compare states with and without capital punishment in terms of murder rates and so on.  They can’t all be correct.

The other point I wish to make before turning to the Davis analysis concerns Zavodny’s use of the terms foreign-born and native-born.  I’ve objected before to the industry lobbyists’ calculated, labored use of the term foreign-born instead of foreign, and beyond that, such an analysis is highly misleading.  There are many STEM students who are foreign-born but are either naturalized U.S. citizens or permanent residents.  So a lot of STEM workers in her “foreign-born” category are actually Americans, and were never H-1Bs or foreign university students.  Zavodny does not make this clear (and likely is unaware of it), and while she has a separate number for H-1Bs (183 instead of 262), again we all know that on the Hill and in the press, people will take “foreign-born” to mean “H-1B.”  Indeed, this is basically the thrust of Zavodny’s Recommendation 3:

Recommendation 1: Prioritize immigration by workers in STEM fields who hold advanced degrees from US institutions.

Well, then, what about the Davis analysis?  I was floored by his figure titled, “Foreign STEM Workers, 2000-2007.”  Look at the states with big H-1B usage, such as California, New York and New Jersey.  The data are basically flat!  Within states, an increase in the number of foreign-born STEM workers with advanced degrees is NOT associated with a trend of increasing STEM employment for natives.  On the contrary, Davis finds that the foreign-born are replacing the natives, something that even Giovanni has written (which may come as a surprise to those who cite his work).

So, basically we have a situation in which, within groups, the graph of mean Y vs. X is flat, yet after aggregation it appears that increases in X are associated with increases in mean Y — Simpson’s Paradox.   In other words, Davis has uncovered a fundamental flaw in Zavodny’s work, which may well apply upon closer inspection to other region-by-region research on H-1B and related issues.
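
To see how this can happen, here is a minimal sketch in Python with made-up numbers.  Nothing in it comes from Davis’ or Zavodny’s data; the three state labels and every value are hypothetical, chosen so that native employment is flat within each state while larger states have more of both groups.

```python
# Synthetic illustration only: the state labels and numbers are invented,
# not taken from Zavodny's or Davis' data.

def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# x = foreign-born STEM workers (thousands), y = native employment (thousands),
# one observation per year. Within each state, y shows no trend in x.
states = {
    "small":  {"x": [1, 2, 3, 4],     "y": [100, 99, 99, 100]},
    "medium": {"x": [11, 12, 13, 14], "y": [500, 499, 499, 500]},
    "large":  {"x": [21, 22, 23, 24], "y": [900, 899, 899, 900]},
}

# Slope of y on x computed separately inside each state.
within_slopes = {
    name: ols_slope(d["x"], d["y"]) for name, d in states.items()
}

# Slope of y on x after pooling all states into one data set.
pooled_x = [x for d in states.values() for x in d["x"]]
pooled_y = [y for d in states.values() for y in d["y"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

print("within-state slopes:", within_slopes)
print("pooled slope:", pooled_slope)
```

The within-state slopes are zero by construction, yet the pooled slope is strongly positive, simply because the big states are big on both axes.  Whether the real data behave this way is an empirical question; the sketch only shows that the pattern is possible.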

One more point:  While she was in the immigration neighborhood, Zavodny threw in an analysis of the famous “They pay more in taxes than they take in services” claim so popular among advocates of expansive immigration policies:

Highly educated immigrants pay far more in taxes than they receive in benefits. In 2009, the average foreign-born adult with an advanced degree paid over $22,500 in federal, state, and Federal Insurance Contributions Act (FICA, or Social Security and Medicare) taxes, while their families received benefits one-tenth that size through government transfer programs like cash welfare, unemployment benefits, and Medicaid.

To begin with, the whole “net fiscal gain/loss” issue is a can of worms.  There are so many effects, effects of effects and so on, that it really is an impossible question to answer.  I wish Zavodny had not tried to do so.

But now that she has, let’s take a closer look.  First, the obvious problem — she hasn’t factored in the LOST tax revenue resulting from H-1B and related programs.  Cheaper workers pay less in taxes (some actually pay NO income taxes, due to U.S. tax treaties with their home countries); a glut of workers brings down overall wages, again reducing tax revenue; and the displaced American STEM workers are generally making less (after being forced to change fields) than they used to before displacement, and thus making smaller tax contributions as well.

But less obvious is that a large number of immigrant STEM workers consider the ability to sponsor their elderly parents for immigration, and later put them on welfare (cash payments, Medicaid, subsidized senior housing and so on), to be one of the major benefits of naturalizing.  In Silicon Valley, this is absolutely standard among Chinese and Indian immigrants.  I and others have quantified this, such as in my 1996 Senate testimony.

But that’s a side issue.  I recommend that everyone read Davis’ analysis.  In the future, every time you hear about a state-by-state or city-by-city analysis of the wondrous benefits of H-1B, keep that Davis figure in mind.


11 thoughts on “The Trouble with State-by-State Analyses of H-1B”

  1. No, hiring 100 Americans would not produce more than 262 jobs because THERE IS NO CAUSALITY, and hiring 100 H-1Bs did not cause the production of 262 jobs, either.

    How about this: the rising of the sun 365 times “caused” the production of 262 jobs.


    • No causality anytime, anywhere? That would put us statisticians out of business. 🙂

      What I tell students and consulting clients is that one must look at the big picture. How can the H-1Bs be such job creators, for example, if their average quality is lower than that of the Americans? And in that light, if there also is no shortage, what could make H-1Bs so good at creating new jobs? Is it because an employer can hire more of them, for a given price? Are they really just “widgets,” then?

      There are no easy answers to any of these questions. But after putting all the studies and all the qualitative sides together, an analyst will often reach some kind of conclusion involving causality. But Zavodny’s analysis doesn’t really address these questions.


      • I found the wiki page to be a good introduction to the question of correlation versus causation. In any case, I’ve run across something over the years and wondered whether there is a theorem for it. I’ve seen scores of series that generally go up or down over the period of time being studied. If any two of these series are compared via a regression, they will likely prove to have a strong correlation, positive or negative. I have also seen an occasional pair of series which seem to zig and zag in unison or with a slight delay. Visually, the latter case seems much more suggestive of causation. I’ve often wondered if there is an official name or theorem for this phenomenon. Of course, I know that even in this case, the correlation could be caused by one of the other possibilities described at the wiki link.

        Anyhow, I agree that one needs to look at the big picture. With the Zavodny data, I found it helpful to look at the data in different formats to look for patterns or flaws in the analysis. It also seems important to combine that with some sort of theory that makes logical sense. Analyzing the data can help one avoid accepting a flawed theory. But analyzing the theory can likewise help one avoid accepting a flawed data analysis.
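
        The trending-series effect described in this comment is commonly discussed under the name “spurious regression” (or spurious correlation). A rough way to see it, sketched here in Python with invented series rather than anything from the Zavodny data: two series that share nothing but an upward trend correlate strongly in levels, while their year-to-year changes (the “zigs and zags”) barely correlate at all. The helper names and the wiggle patterns are assumptions made for this illustration.

```python
# Synthetic series only; nothing here comes from the Zavodny data.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def first_differences(series):
    """Period-to-period changes, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

years = range(20)
# Two unrelated wiggle patterns: one flips every year, one every two years.
wiggle_a = [1 if t % 2 == 0 else -1 for t in years]
wiggle_b = [1 if t % 4 < 2 else -1 for t in years]
# Both series trend upward, but their short-run movements are unrelated.
series_a = [t + w for t, w in zip(years, wiggle_a)]
series_b = [2 * t + w for t, w in zip(years, wiggle_b)]

levels_corr = pearson(series_a, series_b)
changes_corr = pearson(first_differences(series_a),
                       first_differences(series_b))

print("correlation of levels: ", round(levels_corr, 3))
print("correlation of changes:", round(changes_corr, 3))
```

        Comparing changes instead of levels is only one simple check, and a strong correlation of changes still would not establish causation.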


    • Yes, we need to implement policies that create more suns and/or cause them to rise faster! On that topic, I noticed one other item in the study which seemed like a possible problem but I don’t know if it has a name. The problem can arise when you compare a very small group to a large group. According to the first table in my analysis and Table 2 of the study, a ten percent increase of such foreign-born workers was associated with a minuscule increase of just 0.004 of one percent in the “native employment rate” (via a regression with very low correlation). The fact that there are so many more native-born workers than this small set of foreign-born workers causes that tiny increase to imply that each of these workers creates 2.62 jobs! Using the same logic, if one member of my extended family had immigrated to the U.S. each year from 2000 to 2007, I could conclude that each one of them had created thousands of jobs!
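
      The arithmetic behind this point can be sketched as follows. This is an illustration under stated assumptions, not the study’s calculation: it reads the 0.004 of one percent as a relative increase in the number of employed natives, and the head counts are hypothetical values picked so that the result comes out to 2.62.

```python
# Hypothetical head counts; only the two percentages come from the comment.

def implied_jobs_per_worker(native_employed, foreign_workers,
                            pct_increase_foreign=10.0,
                            pct_increase_native=0.004):
    """Native jobs 'associated with' each added foreign-born worker.

    Assumes both percentages are relative increases in head counts.
    """
    added_foreign = foreign_workers * pct_increase_foreign / 100.0
    added_native = native_employed * pct_increase_native / 100.0
    return added_native / added_foreign

# With natives outnumbering the small group 6,550 to 1, a tiny percentage
# change in native employment turns into 2.62 jobs per added worker.
base_case = implied_jobs_per_worker(native_employed=6_550_000,
                                    foreign_workers=1_000)

# Same percentages, but a group ten times smaller: the implied
# multiplier becomes ten times larger.
smaller_group = implied_jobs_per_worker(native_employed=6_550_000,
                                        foreign_workers=100)

print("jobs per worker, base case:    ", base_case)
print("jobs per worker, smaller group:", smaller_group)
```

      Under these assumptions the multiplier is just the estimated percentage response scaled by the ratio of the two group sizes, so the smaller the comparison group, the larger the implied “jobs created” per member.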


      • Generally I would expect to see an association between increases in native-born and foreign-born employment, in the same way I would expect to see an association between the employment outcomes of tall people and short people, or of fat people and thin people, but it does not imply that one caused the other.


  2. “Prioritize immigration by workers in STEM fields who hold advanced degrees from US institutions.”

    Well, on the plus side, it is a gesture apparently intended to be in the direction of talent, knowledge, creativity, industriousness, productivity, being “best” or “bright”, or ethical… but still way off the mark.

    Knowing the ropes and being tolerant of the peculiar hoops of academia (and seemingly more perverse by the year) has little to do with being a great and ethical software product developer, sys admin, network admin, data-base analyst or designer, web-weaver… which involve different talents, knowledge, etc.

    Did R.Davis ask M.Zavodny for clarification on those zero values and how she handled them?


    • I haven’t followed up with M. Zavodny as I just found this a few days ago and am still studying the data. I wanted to have a clearer set of questions that I cannot resolve before following up. In any case, I believe that I did find the reason for the zero values. I’ve just posted the following just before the third graph at :

      In fact, Zavodny’s execution file contains the following, starting at line 807:

      * replace missings with 0s
      for var pop_edus_coll pop_nedus_coll pop_edus_grad pop_nedus_grad pop_edus_stem_coll pop_nedus_stem_coll pop_edus_stem_grad pop_nedus_stem_grad: replace X=0 if X==.
      for var emp_edus_coll emp_nedus_coll emp_edus_grad emp_nedus_grad emp_edus_stem_coll emp_nedus_stem_coll emp_edus_stem_grad emp_nedus_stem_grad: replace X=0 if X==.

      Hence, the code is explicitly setting missing values to zero. This might have been because Stata could not perform various operations on missing values. In any case, a missing value should be treated as such and dropped, not changed to some convenient value.
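
      A small sketch of why the choice matters, written in Python rather than Stata and using invented counts (this is not Zavodny’s code or data): replacing missing cells with zero pulls a simple average down, whereas dropping them leaves the observed values alone.

```python
# Hypothetical cell counts; None marks a missing observation.
counts = [120, None, 135, 128, None, 140]

# Option 1: treat missing as missing and drop it.
dropped = [v for v in counts if v is not None]
mean_dropped = sum(dropped) / len(dropped)

# Option 2: replace missing with zero, as the quoted Stata lines do.
zero_filled = [0 if v is None else v for v in counts]
mean_zero_filled = sum(zero_filled) / len(zero_filled)

print("mean with missing values dropped:", mean_dropped)
print("mean with missing values zeroed: ", mean_zero_filled)
```

      Whether a given missing cell really represents a zero count is something only the underlying data source can settle; the sketch shows only that the two treatments give different answers.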


  3. This evening’s reports are that we may get up to another one million H-1B visas cobbled into the amnesty / green card unilateral expansion by our President. Mr. Zuckerberg must be very happy.

