# paper_list

**Repository Path**: wangyaoyuu/paper_list

## Basic Information

- **Project Name**: paper_list
- **Description**: ML for system
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-12-11
- **Last Updated**: 2023-12-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# paper_list

## Introduction
ML for system

### SC23
- Automated Mapping of Task-Based Programs onto Distributed and Heterogeneous Machines
- Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
- Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning
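    - Comment/Scheduling a sequence of jobs is a natural fit for reinforcement learning.
    - Feature/Job dispatch and completion can be simulated without actually running the jobs; simulating the time series is enough to obtain results.
    - Description/A series of sub-jobs that train a neural network, with the requirement that the time gap between consecutive sub-jobs must not be too long.
    - Input/Statistics from the queue suffice as features, assembled into a time-series feature over multiple time steps.
    - Output/Whether to submit the job at time t.
    - Network architecture/Transformer or MoE, which suit time-series data.
    - reward/After a job is submitted, the waiting time (interruption time) or the overlap time between it and the preceding job.
    - train/DQN and policy gradient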
- Graph3PO: A Temporal Graph Data Processing Method for Latency QoS Guarantee in Object Cloud Storage System 
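    - Comment/With the relevant data and a clear understanding of the requirements, this could be developed further, for example by extending from the graph topology to the network's ability to represent data storage, such as attention between hot data.
    - Network architecture/The topology of the system and servers is modeled as a graph. A major factor in latency is the hardware topology, i.e. the latency contributed by the hop count. Three-tier structure: host + HDD + SSD.
    - Output/Each node's output is the predicted waiting time over the next T time.
    - Input/Each node represents a storage component; its features are the number of requests in its queue and their average time, which stands for the latency.
    - Input dimension/Heterogeneous hardware processes requests at different speeds; the processing speed over the time series is reflected in each node's feature dimensions.
    - Scheduling/Using the network's per-node waiting-time predictions, a simple mathematical model for scheduling urgent requests already works well.
    - train/Uses a time queue with periodic prediction and training to adapt to how requests behave in different time periods. L1 loss.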
- Prodigy: Towards Unsupervised Anomaly Detection in Production HPC Systems
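
For the Mirage entry above, the reward is described as the interruption time or overlap time between a submitted job and its predecessor. A minimal sketch of how those two quantities could be computed from simulated timestamps is shown below. The function names, the sign convention, and `overlap_weight` are assumptions made for illustration and are not taken from the paper.

```python
from typing import Tuple


def interruption_and_overlap(prev_end: float, next_start: float) -> Tuple[float, float]:
    """Return (interruption, overlap) between a job and its successor.

    interruption: idle time when the successor starts after the predecessor ends.
    overlap: time both jobs run at once when the successor starts early.
    At most one of the two values is non-zero.
    """
    gap = next_start - prev_end
    interruption = max(gap, 0.0)
    overlap = max(-gap, 0.0)
    return interruption, overlap


def reward(prev_end: float, next_start: float, overlap_weight: float = 1.0) -> float:
    """Hypothetical reward: penalize interruption and (weighted) overlap."""
    interruption, overlap = interruption_and_overlap(prev_end, next_start)
    return -(interruption + overlap_weight * overlap)
```

Because only start and end times are needed, such a reward can be evaluated inside a queue simulator without running any real training job, which matches the note that simulation is sufficient.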


### ASPLOS23
- Heron: Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
- TLP: A Deep Learning-Based Cost Model for Tensor Program Tuning

### HPCA23
- AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks
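    - Comment/A sequential decision problem: make a series of decisions until the address can be guessed correctly.
    - The attacker tries to guess the victim's cache access pattern, estimating the address from signals such as whether a cache access hits.
    - Input/state: the state of each decision step in the sequence, one-hot encoded.
    - Output/action: attacker access X / victim access _address_ / guess (X = _address_)
    - reward/The episode ends once a guess is made, and the reward depends on whether the guess is correct. To keep the number of steps small, every step receives a negative reward.
    - Network architecture/Transformer, well suited to sequential data.
    - train/PPO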
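
The reward scheme in the AutoCAT notes (a small penalty per step, a terminal reward on the guess) can be sketched as below. The constant values, the `("access", addr)` / `("guess", addr)` action encoding, and the function names are hypothetical and chosen only for illustration; the paper's actual values and interfaces may differ.

```python
from typing import List, Tuple

STEP_PENALTY = -0.01   # assumed value: small negative reward for every non-guess step
CORRECT_GUESS = 1.0    # assumed value
WRONG_GUESS = -1.0     # assumed value


def step_reward(action: Tuple[str, int], victim_addr: int) -> Tuple[float, bool]:
    """Return (reward, done) for one attacker action."""
    kind, addr = action
    if kind != "guess":
        # Any cache access costs a little, which pushes the agent toward short attacks.
        return STEP_PENALTY, False
    # A guess always ends the episode; the reward depends on correctness.
    return (CORRECT_GUESS if addr == victim_addr else WRONG_GUESS), True


def episode_return(actions: List[Tuple[str, int]], victim_addr: int) -> float:
    """Sum rewards until the first guess (or until the actions run out)."""
    total = 0.0
    for action in actions:
        r, done = step_reward(action, victim_addr)
        total += r
        if done:
            break
    return total
```

For example, two accesses followed by a correct guess yield a slightly lower return than an immediate correct guess, so shorter successful attack sequences are preferred.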

### MICRO21
- Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
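    - Description/A prefetching policy for hardware caches, implemented in a simulator; the memory load caused by a prefetch has an effect over a period of time.
    - Motivation/The prefetching policy should be system aware, i.e. aware of bandwidth usage; otherwise it can end up hurting performance.
    - reward/In this paper, system aware specifically means bandwidth; the reward takes the current load into account.
    - state/Program features (I did not follow this part): control flow and data flow. To limit the feature dimensionality, candidates are searched exhaustively to find the most effective ones.
    - action/prefetch A+offset(O), where O is the address offset. 4KB page + 64B cacheline.
    - Network architecture/None
    - train/A Q-value table, the simplest implementation; no neural network is used.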
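
The Pythia notes above describe an action of the form A+offset(O) within a 4KB page of 64B cachelines, learned with a plain Q-value table. The sketch below illustrates that idea with a generic tabular Q-learning update, which is not necessarily the exact update rule used in the paper. The offset list, the hyperparameters, the choice to drop prefetches that leave the page, and all names are assumptions.

```python
import random
from collections import defaultdict

PAGE_SIZE = 4096                       # 4KB page, from the notes
LINE_SIZE = 64                         # 64B cacheline, from the notes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE
OFFSETS = [-2, -1, 0, 1, 2, 4]         # hypothetical action set, in cachelines (0 = no prefetch)


def prefetch_address(addr, offset):
    """Line-aligned address A + O, or None if it would leave A's page (assumed policy)."""
    line = addr // LINE_SIZE
    target = line + offset
    if target // LINES_PER_PAGE != line // LINES_PER_PAGE:
        return None
    return target * LINE_SIZE


class QTable:
    """Tabular Q-values indexed by (state, offset); state is any hashable feature."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        # Epsilon-greedy choice over the fixed offset list.
        if random.random() < self.epsilon:
            return random.choice(OFFSETS)
        return max(OFFSETS, key=lambda o: self.q[(state, o)])

    def update(self, state, offset, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, o)] for o in OFFSETS)
        td_target = reward + self.gamma * best_next
        self.q[(state, offset)] += self.alpha * (td_target - self.q[(state, offset)])
```

A bandwidth-aware reward, as described in the notes, would be supplied by the caller through the `reward` argument of `update`.
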
## ... to be continued