I maintain a few datasets, mostly used in my own papers, to help other researchers save time and effort. I also maintain several software tools that help people be more productive. My most-starred GitHub repo has helped thousands of people in both academia and industry.
CIK to CUSIP Mapping
Provide linking files between CIK and CUSIP using 13G and 13F filings.
USPTO full text database
Provide OCR full-text data for pre-1975 USPTO patents, with substantially better quality and coverage than the corresponding text in Google Patents.
Name Matching
Algorithm to match firm names based on string similarities
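The following is a minimal sketch of string-similarity name matching in general. It is not the algorithm implemented in the repository. It normalizes names and scores candidate pairs with the standard library's `difflib.SequenceMatcher`. The suffix list, the 0.85 threshold, and the function names `normalize`, `similarity`, and `best_match` are hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal-form suffixes dropped before comparison.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
            "company", "ltd", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized firm names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar candidate,
    or None if no candidate reaches the (hypothetical) threshold."""
    scored = [(c, similarity(name, c)) for c in candidates]
    if not scored:
        return None
    cand, score = max(scored, key=lambda pair: pair[1])
    return (cand, score) if score >= threshold else None


if __name__ == "__main__":
    print(best_match("Apple Inc.", ["APPLE INC", "Applied Materials Inc"]))
```

Normalizing first matters because names such as "Apple Inc." and "APPLE INC" differ only in case, punctuation, and legal form. Without normalization, a raw character ratio would penalize those differences.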
Replace and Delete (rd)
Extremely fast command-line utility to replace and delete strings in text files
Fuzzy Process (fuzzprocess)
Deep-learning approach to find the K nearest matches between two sets of names
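The following is a minimal sketch of the top-K matching step only. It is not fuzzprocess's code. In place of a learned deep-learning embedding, it substitutes character-trigram count vectors. It then keeps the K highest cosine similarities for each query name using `heapq.nlargest`. The function names `char_ngrams`, `cosine`, and `top_k_matches` are hypothetical.

```python
import heapq
import math
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, space-padded name.
    (Stand-in for a learned embedding; an assumption of this sketch.)"""
    s = f" {name.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k_matches(queries, targets, k=3):
    """For each query name, return the k most similar target names
    as (target, score) pairs, highest score first."""
    target_vecs = [(t, char_ngrams(t)) for t in targets]
    result = {}
    for q in queries:
        qv = char_ngrams(q)
        scored = ((t, cosine(qv, tv)) for t, tv in target_vecs)
        result[q] = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return result


if __name__ == "__main__":
    matches = top_k_matches(
        ["Apple Inc"],
        ["Apple Incorporated", "Microsoft Corp", "Applied Materials"],
        k=2,
    )
    print(matches)
```

This brute-force version compares every query with every target, so its cost grows with the product of the two set sizes. For large name sets, learned embeddings are typically paired with an approximate nearest-neighbor index instead.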