Essay

Ten Years of Study and Research

Notes on imitation learning, scientific curiosity, mentorship, and growth, adapted from a previous interview.

随笔

十年研学路：关于模仿学习、科研与成长的几点感悟

整理自“研途风采”访谈，关于模仿学习、科研、导师与成长的几点感悟。

2026-03-20

2014 年，我考入东北电力大学电气工程及自动化学院，正式开启了自己的研学之旅。从本科到研究生再到博士生，这十年的求学之路上，我获得了很多帮助，也收获了不少经验与感悟。借这个机会，我想把其中一些对我真正有用的认识整理出来，希望能给正在开启大学与研究生活的朋友提供一点参考。

01 模仿学习：做聪明的决策

我的研究课题是强化学习，核心在于设计一个智能体，让它通过与环境交互，不断提升决策水平，最终实现最优目标。沿着这个视角去看，我一直觉得，人生也可以被理解为在不断变化的环境中追求属于自己的“奖励函数”。

中小学阶段的教育环境相对固定，奖励函数也比较简单，而且往往由他人预设，比如考试成绩、升学目标等等。但进入大学或研究生活之后，环境会迅速复杂起来，奖励函数也开始变得模糊。这时，一个人必须主动思考并定义自己的人生目标：是继续求学，探索更多可能性，还是找到真正热爱的事情长期坚持下去？这些都需要认真权衡。

与强化学习不同，我们的人生只有一次，不可能通过大量试错才得出最优策略。但我们可以通过“模仿学习”，从他人的经验和分享中吸收智慧，更高效地认识自己。对于仍然感到迷茫、还没有看清自己所处环境和奖励函数的人，我一直推荐去大量收集信息，认真思考，然后果断决策，并且为结果负责。

02 好奇、求实、创新，做科学峡谷的探险家

在科研生活中，保持对未知世界的好奇心、对真实问题的求索精神，以及勇于创新的冒险心态，是一种既快乐又高效的科研之道。

好奇是求实的起点

好奇心驱使我们探究真相，它是科研探索最重要的源动力之一。回想大三时，AlphaGo 战胜柯洁，引发了强化学习在多人电子竞技游戏中的研究热潮。典型的例子就是 OpenAI Five 用强化学习打 Dota。因为我自己喜欢玩英雄联盟，也自然开始关注相关研究，并主动学习机器学习和 PyTorch。这份好奇心后来成为我博士科研的起点。虽然我的博士课题最终没有用强化学习纵横召唤师峡谷，但早期的兴趣和积累，让我进入课题时更快，也让后续研究顺利得多。

求实是创新的基石

科研的基础，始终是对现有问题和相关领域的深入了解。只有通过大量阅读、调研和数据分析，才有可能找到真正有意义的问题。比如在博士课题中，强化学习和配电网环境交互需要频繁进行潮流计算，而强化学习本身的学习效率又相对较低，往往需要与环境交互成千上万次。这件事直接激发了我开发一个可以显著改进潮流计算速度的配电网强化学习环境的想法，并最终把 RL-ADN 开源出来，供其他研究者继续使用。

创新是一场场探险

科研创新很像探险。每一个新想法都需要经历分析、实验和验证。这条路充满了不确定性，也充满了失败，但正是在这些失败里，我们才逐步逼近真实的问题与答案。

博士期间，我经历过很多次强化学习算法不收敛，或者在一个环境里收敛、换一个环境就不收敛，甚至只是改了一个参数就从收敛变成不收敛。那段时间，我几乎每天都登录 WandB 看实验指标，像侦探一样根据曲线和异常去寻找原因。面对无数次失败，我能做的也只是调整心态，把“屡战屡败”改成“屡败屡战”。在这个过程中，我对算法的理解不断加深，对强化学习这门手艺也越来越熟悉。失败并不可怕，它本来就是通往知识边界的必经之路。

03 导师是博士生科研的重要因素

博士生的科研进展和生活质量，与导师有非常强的相关关系。

在读博期间，我不需要做太多和科研无关的事情，因此有较充分的时间去阅读、思考、探索和试错。这一点在项目压力很重的工科环境里并不常见，因为很多时候，研究生的时间安排与导师的利益是高度相关的。

更重要的是科研训练本身。前期频繁的组会和有针对性的科研训练，就像博士生与导师共同构建的一个反馈调节过程，目标是把博士研究生训练成合格的科研人员。这个过程不仅是在塑造某个专业领域里的知识结构，更重要的是形成一种通用的科研思维，以及解决问题的方式和行为习惯。它不仅包括如何设计研究、设计实验、形成论文，也包括如何协商合作、如何沟通、如何讲演以及如何展示自己。和导师共同构建并完成这样的系统训练，对博士生的科研进展和个人成长都极其重要。

04 生活中的我

如前面所说，好奇心一直是我生活中的主导力量。这不仅是我选择读博的重要动力之一，也深刻影响了我的兴趣爱好。

我喜欢阅读社会调查报告和中国近现代史，从中寻找社会发展的逻辑，以及多方利益博弈下重大历史事件的演变方向。从我的角度看，科研工作者、调查记者和历史学者虽然面对的是不同的时间维度，但本质上都在探索真相。科研工作者面向未来的未知，调查记者关注当下的真实，历史学者则努力还原过去的真相。

科研最吸引我的一点在于，它和现实中的直接利益纠葛相对更少，因此往往也有更大的探索空间。我们可以更纯粹地聚焦问题本身，从迷雾和海量数据里提取信息，再通过逻辑和分析去逼近未来技术可能的方向与解决方式。对我来说，这种以追寻真相为目标的过程始终非常迷人。它不仅满足我的好奇心，也带来持续的分析乐趣。无论是科研、阅读历史，还是理解社会调查，这些事情都让我在探索中获得成就感，也让我对世界的运行方式有了更深的理解。

In 2014, I entered the School of Electrical Engineering and Automation at Northeast Electric Power University and formally began my journey of study and research. From undergraduate study to a master’s degree and then doctoral research, the past decade has brought me a great deal of help and many lessons worth keeping. I want to share the ones that proved most useful to me, in the hope that they offer some guidance to people who are just beginning university or research life.

01 The Value of Imitation Learning: Make Smarter Decisions

My research field is reinforcement learning. At its core, reinforcement learning designs an agent that improves its decisions through interaction with an environment until it achieves its objective as well as possible. From that perspective, I often feel that life can also be understood as the pursuit of one’s own reward function in a changing environment.

During primary and secondary education, the environment is relatively fixed, and the reward function is usually simple and externally defined: grades, exams, entrance targets, and so on. But once you enter university or research life, the environment becomes much more complex, and the reward function becomes blurred. At that point, you must actively ask what your own goals are. Do you want to continue studying and explore more possibilities, or do you want to find something you truly love and keep building around it? Those choices deserve real thought.

Unlike a reinforcement learning agent, we live only once. We cannot run massive trial-and-error loops until we discover the optimal policy. But we can still use a form of imitation learning: we can learn from the experiences, reflections, and mistakes of others in order to understand ourselves more efficiently. For people who still feel lost and have not yet figured out their own environment or reward function, my advice is simple: gather as much information as you can, think carefully, decide firmly, and take responsibility for the outcome.

02 Curiosity, Realism, and Innovation: Explore the Canyon of Science

In research, curiosity about the unknown, a commitment to real problems, and a willingness to take risks on new ideas make the work both joyful and effective.

Curiosity Is the Starting Point of Real Inquiry

Curiosity is one of the most important driving forces behind scientific exploration. In my third year as an undergraduate, AlphaGo defeated Ke Jie, and the match set off a wave of research on reinforcement learning for multiplayer esports games. A representative case was OpenAI Five playing Dota 2. Because I enjoyed playing League of Legends myself, I naturally became interested in related work and started learning machine learning and PyTorch on my own. That curiosity later became the starting point of my doctoral research. My PhD topic did not end up using reinforcement learning to conquer Summoner’s Rift, but that early interest and groundwork let me get into the topic faster and made the later research go much more smoothly.

Realism Is the Foundation of Innovation

Research begins with a deep understanding of existing problems and the relevant field. Only through extensive reading, investigation, and careful data analysis can you identify questions that are truly worth working on. In my doctoral work, interaction between the reinforcement learning agent and the distribution-network environment required frequent power-flow calculations, and reinforcement learning is sample-inefficient enough that training often takes many thousands of such interactions. That combination led me directly to the idea of building a distribution-network reinforcement learning environment with much faster power-flow calculation, which eventually became RL-ADN, released as open source for other researchers to use.

Innovation Is a Series of Expeditions

Scientific innovation feels like exploration. Every new idea has to go through analysis, experimentation, and validation. The road is full of uncertainty and failure, but it is exactly through those failures that we move closer to the truth.

During my PhD, I saw many situations where a reinforcement learning algorithm would fail to converge, or converge in one environment but not another, or stop converging after a single parameter change. In that period I logged into WandB almost every day, reading experiment curves like a detective and trying to infer the cause from the metrics and their anomalies. Faced with repeated failures, the only real option was to adjust my mindset: to stop thinking of it as “being defeated again and again” and start thinking of it as “fighting on after every defeat.” Over time, that process deepened my understanding of the algorithms and made me much more comfortable with the craft of reinforcement learning. Failure is nothing to fear. It is a necessary part of the road to the boundary of knowledge.

03 Advisors Are a Major Factor in Doctoral Research

A PhD student’s research progress and quality of life depend heavily on the advisor.

During my doctorate, I did not need to spend much time on work unrelated to research, which left me ample time for reading, thinking, exploration, and trial and error. That is uncommon in project-heavy engineering environments, where a student’s schedule is often closely tied to the advisor’s interests.

What matters even more is the research training itself. Frequent group meetings in the early stage, combined with targeted research training, work like a feedback loop that the advisor and the doctoral student build together. The purpose is to train the student into a qualified researcher. This process is not only about building knowledge in a specific domain. More importantly, it is about forming a general research mindset and durable habits for solving problems. That includes how to design research and experiments and how to turn the work into papers, and also how to negotiate, collaborate, communicate, give talks, and present oneself. Building and completing that systematic training together with an advisor is extremely important for both research progress and personal growth.

04 Life Beyond Research

As I said earlier, curiosity has long been one of the strongest forces in my life. It was one of the reasons I chose to pursue a doctorate, and it also shapes many of my interests outside research.

I enjoy reading social investigation reports and modern Chinese history, looking for the logic of social development and the way major events evolve through competing interests. To me, researchers, investigative journalists, and historians work on different time horizons, but all of them are fundamentally trying to approach the truth. Researchers face the unknown future, investigative journalists focus on the reality of the present, and historians reconstruct the truth of the past.

One of the things that attracts me most to research is that it is, relatively speaking, less entangled with direct conflicts of interest, which often leaves more room for exploration. In research, we can focus more purely on the problem itself, extract information from uncertainty and large volumes of data, and use logic and analysis to get closer to where technology may be heading and how problems might be solved. That process has always fascinated me. It satisfies my curiosity, and it also gives me the lasting pleasure of analysis. Whether through research, history, or social inquiry, I keep finding the same reward: a sense of accomplishment in exploring, and a deeper understanding of how the world actually works.

Source Note

Adapted from the WeChat article 【研途风采】代尔夫特理工大学侯胜任：以好奇为炬，照亮通往真实的路, published by 电力系统自动化 on 2025-03-31.

来源说明

整理自电力系统自动化发布的微信文章【研途风采】代尔夫特理工大学侯胜任：以好奇为炬，照亮通往真实的路（2025-03-31）。