Pedro Rodriguez


PhD Candidate in
Artificial Intelligence, Machine Learning, and Natural Language Processing

Tips

Over my career I've made many mistakes, occasionally learned from them, and sometimes found useful software, tips, and resources. I don't expect to remember all or even most of these, so I compile everything here for quick and easy reference on the web; hopefully it turns out to be helpful for others as well.

Cheatsheets

Data Formats

  • Unless you have a very good reason and purely numerical data, never use csv; saying a file is in csv format is not enough information to parse it, since the delimiter, quoting, escaping, and header conventions vary from file to file
  • Default to json
  • For large json files that are table-like (the root object is an array whose elements look like rows), consider JSON lines/jsonl. Large JSON objects can be expensive to parse and make it difficult to run parallel jobs (e.g., Apache Spark uses line-delimited rows from text files)
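
As a minimal sketch of the JSON lines idea, the snippet below writes one JSON document per line and reads the records back one at a time, using only the Python standard library. The helper names `write_jsonl` and `read_jsonl` and the sample rows are made up for illustration; they are not from any particular library.

```python
import io
import json


def write_jsonl(rows, fp):
    """Write each row as one JSON document per line (hypothetical helper)."""
    for row in rows:
        # json.dumps escapes newlines inside strings, so every record
        # stays on a single physical line.
        fp.write(json.dumps(row) + "\n")


def read_jsonl(fp):
    """Yield one parsed record per non-empty line (hypothetical helper)."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample rows, invented for this example. Commas and newlines in the
# text need no special handling, unlike in csv.
rows = [
    {"id": 1, "text": "first, with a comma"},
    {"id": 2, "text": "second\nwith a newline"},
]

# An in-memory buffer stands in for a real file opened with open(path).
buffer = io.StringIO()
write_jsonl(rows, buffer)
buffer.seek(0)
loaded = list(read_jsonl(buffer))
```

Because each line is an independent JSON document, a reader can stream the file without loading it all into memory, and a parallel job can split the file on line boundaries.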

Software

Libraries

Python

Tips from Others

Docs

Configs

Wiki

LaTeX

  • What is ~? A non-breaking space: LaTeX will not break the line between alpha and beta in alpha~beta
  • Create PDF version of figures

Debugging