Junfeng Yang is Professor of Computer Science, Member of the Data Science Institute, and co-Director of the Software Systems Lab at Columbia University. Yang’s research centers on building reliable, secure, and fast software systems. Today’s software systems are large, complex, and plagued with errors, some of which have caused critical system failures, breaches, and performance degradation. Yang has invented techniques, algorithms, and tools to analyze, test, debug, monitor, and optimize real-world software, including Android, Linux, production systems at Microsoft, machine learning systems, and self-driving platforms, benefiting hundreds of millions of users. His research has resulted in numerous vulnerability patches to real-world systems, practical adoption at the largest technology companies, and press coverage in Scientific American, The Atlantic, The Register, Communications of the ACM, and other news outlets. Yang received a BS in Computer Science from Tsinghua University and an MS and PhD in Computer Science from Stanford University. He won the Sloan Research Fellowship and the Air Force Office of Scientific Research Young Investigator Program Award, both in 2012; the National Science Foundation CAREER award in 2011; the inaugural Rock Star Award of the Association of Chinese Scholars in Computing in 2019; and Best Paper Awards at the USENIX Symposium on Operating Systems Design and Implementation in 2004, the ACM Symposium on Operating Systems Principles in 2017, and the USENIX Annual Technical Conference in 2021.

Talk title: Debugging Performance Issues in Modern Desktop Applications

Abstract: Modern desktop applications involve many asynchronous, concurrent interactions that make performance issues difficult to diagnose. Although prior work has used causal tracing to debug performance issues in distributed systems, we find that these techniques are highly inaccurate for desktop applications. In this talk, I will present Argus, a fast, effective causal tracing tool for debugging performance anomalies in desktop applications. Argus introduces a novel notion of strong and weak edges to explicitly model and annotate trace graph ambiguities, a new beam-search-based diagnosis algorithm to select the most likely causal paths in the presence of ambiguities, and a new way to compare causal paths across normal and abnormal executions. We have implemented Argus across multiple versions of macOS and evaluated it on 12 infamous spinning pinwheel issues in popular macOS applications. Argus diagnosed the root causes of all 12 issues, 10 of which were previously unknown and some of which had been open for several years. This work won a Best Paper award at USENIX ATC 2021. It is joint work with Lingmei Weng (lead PhD student, graduating next academic year), Ryan Peng Huang, and Jason Nieh.


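Guangming Tan (谭光明) is a Research Professor and doctoral supervisor, and Director of the High Performance Computer Research Center at the Institute of Computing Technology, Chinese Academy of Sciences. A recipient of the National Science Fund for Distinguished Young Scholars, he has taken part in developing the Dawning (Sugon) series of high-performance computers, including the Dawning 4000/5000/6000/7000 systems. He has published more than 100 papers, including CCF Class A papers (TC, SC, PPoPP) and papers in Nature-family journals, and has served on the editorial board of IEEE TPDS and on the program committees of international conferences such as SC and PPoPP. His honors include a Second Prize of the National Science and Technology Progress Award, the Lu Jiaxi Young Talent Award, and the national "Good Youth" title (全国向上向善好青年).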


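Abstract: The central question in high-performance computing (HPC) is how to meet the performance demands of applications; compared with general-purpose computing, performance is usually the first-priority metric. Broadly, the factors that affect performance include hardware design (pipelines, vector width, cache size, etc.), algorithmic models (complexity, etc.), implementation (programming language, data structures, library versions, etc.), code generation (compilers), system configuration (choice of operating system, etc.), and the execution environment (affinity settings, resource allocation, system noise, etc.). In real running systems these factors are not independent or orthogonal; they interact to form a very large and complex optimization space. In HPC software stacks designed purely from a software-engineering standpoint, people pursue high productivity through layered, modular design that "crudely" separates these intertwined performance factors. As the performance gains of general-purpose hardware slow, the bottlenecks caused by so-called software "bloat" become prominent. This performance loss is especially acute for HPC, where performance is the first priority. Therefore, after years of systematic development of hardware engineering and software engineering techniques for HPC, this talk advocates research on performance engineering for HPC, with the aim of developing performance engineering techniques systematically and addressing the challenges that the HPC hardware and software stack faces in the post-Moore era.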


Lili Qiu is an Assistant Managing Director at Microsoft Research Asia and a Professor in the Department of Computer Science at UT Austin. She received her MS and PhD degrees in Computer Science from Cornell University in 1999 and 2001, respectively. After graduation, she spent 2001-2004 as a researcher in the System & Networking Group at Microsoft Research Redmond. She joined UT Austin in 2005 and founded a vibrant research group there working on Internet and wireless networks. She is an ACM Fellow and an IEEE Fellow. She has also received an NSF CAREER award, a Google Faculty Research Award, and best paper awards at ACM MobiSys'18 and IEEE ICNP'17. She advised a PhD dissertation that won the SIGMOBILE best dissertation award in 2020.

Talk title: Acoustic Sensing and Applications

Abstract: Video games, Virtual Reality (VR), Augmented Reality (AR), and smart appliances (e.g., smart TVs and drones) all call for a new way for users to interact with and control them. Motivated by this observation, we have developed a series of novel acoustic sensing technologies that either transmit specifically designed signals or use signals naturally arising from the environment. We further develop a few interesting applications on top of our motion tracking technology, such as a follow-me drone and acoustic imaging on mobile phones.


Shan Lu is a Professor in the Department of Computer Science at the University of Chicago. Her research focuses on detecting, diagnosing, and fixing functional and performance bugs in software systems. Shan is an ACM Distinguished Member (2019 class) and an Alfred P. Sloan Research Fellow (2014). Her co-authored papers have won distinguished paper and influential paper awards at ASPLOS, SOSP, OSDI, FAST, ICSE, FSE, CHI, and PLDI. Shan currently serves as the Chair of ACM SIGOPS and the Vice Chair of the ACM SIG Governing Board Executive Committee. She served as the technical program co-chair for ASPLOS 2022, OSDI 2020, APSys 2018, and USENIX ATC 2015.

Talk title: 15 Years of Learning from Mistakes in Building System Software

Abstract: Bugs severely threaten the correctness and efficiency of software. As system software grows in complexity, its bugs also evolve, posing different challenges over the years. In this talk, we look back at our study of concurrency bugs in multi-threaded software, which was done 15 years ago and recently won the ASPLOS Influential Paper Award, as well as various bug studies that we have conducted over the years on distributed systems, industry cloud systems, database systems, machine learning systems, and more. We discuss the lessons that we have learned, as well as the new challenges faced in building systems today.



Talk title: Secure Branch Predictors for Processors

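Abstract: As digitalization becomes ever more widespread, the security of data-center processor chips is increasingly important. In the cloud especially, processors face numerous security risks. Taking the branch predictor, a key performance-enhancing module of the processor, as our entry point, we deconstruct the traditional branch predictor design along three dimensions (update policy, content storage, and index mapping) and propose a series of security-enhancement mechanisms, achieving a secure redesign of the branch predictor.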




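Abstract: The rapid growth of data-intensive workloads is driving the DPU to become the third major source of computing power after the CPU and GPU, and innovation around the DPU has become a new focus in the industry. Drawing on years of technical experience in storage, Huawei has also pursued DPU-based innovations, from hardware and the OS to acceleration for scenarios such as virtualization, big data, and databases, aiming to combine DPUs with storage to deliver multi-fold improvements in data processing and storage efficiency for users.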