I maintain a few datasets, mostly used in my own papers, to save other researchers effort. I also maintain several software tools that help people be more productive. My most-starred GitHub repo has helped thousands of people in both academia and industry.

CIK to CUSIP Mapping

Provides linking files between CIK and CUSIP identifiers, built from 13G and 13F filings.

USPTO full text database

Provides OCR full-text data for pre-1975 USPTO patents, with better quality and coverage than the text in Google Patents.

Name Matching

Algorithm to match firm names based on string similarity.
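To illustrate the general idea of matching firm names by string similarity, here is a minimal sketch in Python. It is not the algorithm used in the repository: it swaps in the standard library's `difflib.SequenceMatcher` ratio as the similarity measure, and the suffix list, the `normalize`, `similarity`, and `best_match` helpers, and the 0.85 threshold are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized firm names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    Returns None if there are no candidates or the best score is
    below the (assumed) threshold.
    """
    if not candidates:
        return None
    scored = [(c, similarity(name, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


if __name__ == "__main__":
    firms = ["Apple Inc", "Microsoft Corporation", "Micron Technology"]
    print(best_match("Microsoft Corp.", firms))
```

Normalizing before comparing keeps suffixes such as "Inc" or "Corp" from dominating the score. Comparing every pair this way is slow on large lists, so real data would likely need some blocking or indexing step first.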

Replace and Delete (rd)

Extremely fast command-line utility to replace and delete strings in text files.

Fuzzy Process (fuzzprocess)

Deep-learning approach to finding the K nearest matches between two sets of names.