Over my career I've made many mistakes, occasionally learned from them, and sometimes found useful software, tips, and resources. I don't expect to remember all or even most of these, so I compile everything here as a quick and easy reference on the web; hopefully it turns out to be helpful for others as well.
- PDB Cheatsheet from https://github.com/nblock/pdb-cheatsheet
- Pandas Cheatsheet from https://pandas.pydata.org/
- Unless you have a very good reason and purely numerical data, never use CSV; saying a file is "CSV" is insufficient information to parse it (the delimiter, quoting, and escaping rules all vary)
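A quick illustration of why "CSV" underspecifies the format: the same line parses to different fields under different dialect assumptions. A sketch using Python's stdlib `csv` module (the sample line is made up for the example):

```python
import csv
import io

# European-style row: ';' as delimiter, ',' as decimal separator.
# "It's CSV" tells you none of this.
raw = 'name;"last, first";1,5\n'

comma = next(csv.reader(io.StringIO(raw), delimiter=','))
semi = next(csv.reader(io.StringIO(raw), delimiter=';'))

print(comma)  # ['name;"last', ' first";1', '5']
print(semi)   # ['name', 'last, first', '1,5']
```

Same bytes, two incompatible interpretations; without out-of-band knowledge of the dialect, a parser can only guess.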
- Default to JSON
- For large JSON files that are table-like (the root object is an array of row-like objects), consider JSON Lines (jsonl). A single large JSON document is expensive to parse and hard to process in parallel (e.g., Apache Spark reads line-delimited rows from text files)
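As a sketch, reading and writing JSON Lines needs nothing beyond the stdlib: one JSON object per line, so a reader (or a Spark worker) can process the file line by line without parsing the whole document (the filename `rows.jsonl` is made up for the example):

```python
import json

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Write: one compact JSON object per line.
with open("rows.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read: each line parses independently, so this streams and parallelizes.
with open("rows.jsonl") as f:
    loaded = [json.loads(line) for line in f]

assert loaded == rows
```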
- Zotero for organizing research papers
- MLFlow for experiment tracking
- Glances, a better top/htop (be sure to `pip install nvidia-ml-py3` for GPU support)
- Plotnine for figures
- draw.io for diagrams
- Apache Spark for "Big Data"
- I use Arch Linux on machines I own.
- bat: cat replacement
- exa: ls replacement
- linuxbrew: package manager when I don't have sudo
Tips from Others
- What is `~` in LaTeX? A non-breaking space: LaTeX will not break the line between alpha and beta in `alpha~beta`
- Create PDF versions of figures; vector graphics stay sharp at any zoom level and embed cleanly in LaTeX
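A minimal sketch of saving a figure as a vector PDF with Matplotlib (assumes Matplotlib is installed; the headless Agg backend is used so no display is needed, and `figure.pdf` is a made-up filename):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="y = x^2")
ax.legend()

# PDF is a vector format, so the figure scales cleanly in a LaTeX document.
fig.savefig("figure.pdf")
```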
- Anaconda pip installations from source packages can cause g++ errors like "file format not recognized"; rename Anaconda's `ld` so that pip uses the system version: https://github.com/pytorch/pytorch/issues/16683#issuecomment-459982988