Papershelf

A list of research papers, articles & blogs I've enjoyed reading

MapReduce: Simplified Data Processing on Large Clusters
Making it easy to run MapReduce on large clusters of distributed machines

Near-duplicate Question Detection
An algorithm that uses LLMs for near-duplicate question detection

SIEVE is Simpler than LRU
A new, simple & primitive caching strategy to improve cache performance
Supporting Resources: home

Magnet: Push-based Shuffle Service for Large-scale Data Processing
A Spark shuffle-merging mechanism that merges contiguous chunks and stores batches remotely to speed up MapReduce operations
Supporting Resources: spark shuffling

Improving Language Understanding by Generative Pre-Training
The GPT-1 paper