Predicting Bad Patents: Employing Machine Learning to Predict Post-Grant Review Outcomes for US Patents

David Winer

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2017-60

May 11, 2017

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-60.pdf

As the number of patents filed with the US Patent and Trademark Office has ballooned over the last two decades, the need for more powerful patent analytics tools has grown. In 2012, a new post-grant review process established by the US Federal Government’s America Invents Act (AIA) took effect, allowing any member of the public to challenge an existing patent before the Patent Trial and Appeal Board (PTAB). Our capstone team developed a tool to predict the outcomes of this post-grant review process. We developed algorithms to predict two major outcomes: whether the Board will accept a case brought by a member of the public and, once a case is accepted, whether the Board will invalidate the challenged patent.

In this report, I focus on the former algorithm: predicting whether a case will be accepted or denied. To make this prediction, we used natural language processing (NLP) techniques to convert each challenged patent document into thousands of numeric features. After combining these text-based features with patent metadata, we used two primary machine learning algorithms, support vector classification and random forests, to classify each document by its acceptance/denial outcome. In this report, I cover both the effort required to wrangle the data and the hyperparameters we tuned for these two algorithms. Our models achieved classification accuracy slightly better than the base rate implied by the class skew in the data (the accuracy of always predicting the more common outcome), although there is still room for improvement. As the post-grant review process matures, there will be further opportunities to gather more case data, refine the tools we built over the past year, and increase confidence in post-grant review analytics.
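The abstract says only that "NLP techniques" turned each patent into thousands of numeric features, that these were combined with metadata, and that results were judged against the base rate. The sketch below illustrates that pipeline under stated assumptions and is not the report's implementation: it assumes TF-IDF weighting as the text representation, uses hypothetical metadata columns (claim count, filing year), and computes the majority-class baseline that a classifier such as the SVC or random forest models named above would need to beat. All function names and toy data are hypothetical, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_features(docs):
    """Turn each document into a row of TF-IDF weights over a shared vocabulary.

    Assumed weighting: raw term count * smoothed IDF, where
    idf(t) = ln((1 + n_docs) / (1 + doc_freq(t))) + 1
    (scikit-learn's default smoothing, without its row normalization).
    """
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def combine_with_metadata(text_rows, metadata_rows):
    """Append numeric metadata columns to the end of each text-feature row."""
    return [text + list(meta) for text, meta in zip(text_rows, metadata_rows)]


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common outcome (the base rate)."""
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: three patent snippets, two metadata columns
    # (claim count, filing year), and acceptance/denial outcomes.
    docs = [
        "A method for wireless data transmission",
        "A wireless antenna apparatus",
        "A pharmaceutical composition comprising a compound",
    ]
    metadata = [[12, 2009], [5, 2011], [20, 2005]]
    labels = ["accepted", "accepted", "denied"]

    vocab, text_rows = tfidf_features(docs)
    X = combine_with_metadata(text_rows, metadata)
    print(len(vocab), "text features +", len(metadata[0]), "metadata features")
    print("base rate to beat:", majority_baseline_accuracy(labels))
```

In a fuller pipeline, the combined matrix `X` would be handed to a classifier library for support vector classification or random forests, with hyperparameters tuned on held-out data (for example by cross-validation, an assumption here). Reporting `accuracy` next to `majority_baseline_accuracy` makes it clear whether a model adds anything beyond the skew in the labels.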

Advisor: Lee Fleming

BibTeX citation:

@mastersthesis{Winer:EECS-2017-60,
    Author= {Winer, David},
    Title= {Predicting Bad Patents: Employing Machine Learning to Predict Post-Grant Review Outcomes for US Patents},
    School= {EECS Department, University of California, Berkeley},
    Year= {2017},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-60.html},
    Number= {UCB/EECS-2017-60},
    Abstract= {As the number of patents filed with the US Patent and Trademark Office has ballooned over the last two decades, the need for more powerful patent analytics tools has grown. In 2012, a new post-grant review process established by the US Federal Government’s America Invents Act (AIA) took effect, allowing any member of the public to challenge an existing patent before the Patent Trial and Appeal Board (PTAB). Our capstone team developed a tool to predict the outcomes of this post-grant review process. We developed algorithms to predict two major outcomes: whether the Board will accept a case brought by a member of the public and, once a case is accepted, whether the Board will invalidate the challenged patent.

In this report, I focus on the former algorithm: predicting whether a case will be accepted or denied. To make this prediction, we used natural language processing (NLP) techniques to convert each challenged patent document into thousands of numeric features. After combining these text-based features with patent metadata, we used two primary machine learning algorithms, support vector classification and random forests, to classify each document by its acceptance/denial outcome. In this report, I cover both the effort required to wrangle the data and the hyperparameters we tuned for these two algorithms. Our models achieved classification accuracy slightly better than the base rate implied by the class skew in the data (the accuracy of always predicting the more common outcome), although there is still room for improvement. As the post-grant review process matures, there will be further opportunities to gather more case data, refine the tools we built over the past year, and increase confidence in post-grant review analytics.},
}

EndNote citation:

%0 Thesis
%A Winer, David 
%T Predicting Bad Patents: Employing Machine Learning to Predict Post-Grant Review Outcomes for US Patents
%I EECS Department, University of California, Berkeley
%D 2017
%8 May 11
%@ UCB/EECS-2017-60
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-60.html
%F Winer:EECS-2017-60