Newsletter #24: Teaching with AI: Understanding the Foundations
In our last two newsletters, we explored how to enhance teaching through prompt engineering and highlighted practical AI tools for higher education.
As AI continues to advance, gaining a clear understanding of its development is becoming essential for educators. A solid grasp of AI’s capabilities—and its current limitations—helps teachers make informed decisions about when and how to use AI in the classroom. For example, while large language models (LLMs) are excellent at generating text and facilitating interactive dialogue, they still fall short in areas such as deep reasoning and factual precision. Recognizing these strengths and weaknesses helps prevent over-reliance and encourages more intentional integration.
By learning about the history of AI and key technologies like machine learning, deep learning, and LLMs, educators can strengthen their digital literacy and feel more confident experimenting with AI-supported teaching practices.
In this issue, we feature 11 landmark research papers that played a pivotal role in the development of AI. These works not only shaped the field but also offer valuable insights into the foundations of the tools we are using today.
We hope this selection sparks your curiosity and supports your journey in understanding and applying AI in your teaching.
1. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence (1955)
Historical Significance: The Birth of AI as a Field of Research
On August 31, 1955, four scientists jointly submitted a proposal to the Rockefeller Foundation, aiming to bring together researchers from various disciplines to explore the possibility of machines simulating human intelligence. This proposal laid the foundation for the 1956 Dartmouth Summer Research Project on Artificial Intelligence, held from June to August, where key issues in the emerging field of AI were discussed intensively.
The proposal chose the term "Artificial Intelligence" to remain neutral—distancing itself from automata theory, cybernetics, and the influence of Norbert Wiener.
The four pioneering scientists behind this historic moment were:
- John McCarthy – Assistant Professor of Mathematics at Dartmouth College at the time, later awarded the Turing Award in 1971.
- Marvin Minsky – a young researcher in Mathematics and Neurology at Harvard University, recipient of the Turing Award in 1969.
- Nathaniel Rochester – Chief architect of the IBM 701, IBM’s first commercial scientific computer.
- Claude Shannon – Mathematician at Bell Labs, widely regarded as the “Father of Information Theory.”
2. ImageNet Classification with Deep Convolutional Neural Networks (2012)
Historical Significance: A Foundational Work in Deep Learning — The Introduction of AlexNet
The year 2012 is widely recognized as the onset of the deep learning resurgence. Geoffrey E. Hinton, one of the founding figures of deep learning, and his students Alex Krizhevsky and Ilya Sutskever introduced AlexNet, a deep convolutional neural network (CNN) that achieved groundbreaking success in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
Compared to earlier models like LeNet, AlexNet was deeper, larger, and employed stacked convolutional layers to extract hierarchical features, resulting in significantly improved recognition accuracy. Its performance marked a major breakthrough in computer vision and established deep learning as a powerful approach in the field.
AlexNet has since become the foundation for many modern CNN architectures and sparked the widespread application of deep learning in image recognition tasks.
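For readers who would like to see what “stacked convolutional layers” look like in practice, the short PyTorch sketch below builds a deliberately tiny network in the same spirit. It is not the real AlexNet, which is far larger and uses several additional tricks; it only illustrates the pattern of convolution, nonlinearity, and pooling feeding a small classifier.

```python
# A minimal convolutional network in the spirit of AlexNet (illustrative only):
# early layers capture edges and textures, deeper layers capture more abstract features.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # (batch, 32, 8, 8) for a 32x32 input image
        x = torch.flatten(x, 1)     # flatten everything except the batch dimension
        return self.classifier(x)   # class scores

model = TinyConvNet()
scores = model(torch.randn(4, 3, 32, 32))  # 4 random 32x32 RGB images
print(scores.shape)                        # torch.Size([4, 10])
```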
3. A Convolutional Neural Network for Modelling Sentences (2014)
Historical Significance: A Pioneering Application of Convolutional Neural Networks (CNNs) to Sentence Modeling in Natural Language Processing (NLP)
This paper introduced a convolutional neural network (CNN) architecture for sentence modeling and proposed the dynamic k-max pooling method, which adaptively selects the most salient features from sentences of varying lengths. This innovation allowed the model to handle variable-length input sequences, making it well-suited for sentence-level NLP tasks.
By extending CNNs—originally developed for image analysis—to the field of natural language processing, the authors not only achieved slight improvements in accuracy but, more importantly, addressed the computational constraints of the time. Unlike recurrent models that often required powerful GPUs, CNNs could be trained efficiently on CPUs, making them a practical solution when GPU resources were scarce.
This work sparked further exploration of CNNs in NLP and paved the way for their use in text classification, machine translation, and semantic matching tasks.
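For the technically curious, the sketch below implements plain (non-dynamic) k-max pooling with PyTorch: for each feature dimension it keeps the k largest activations while preserving their original order in the sentence. The paper’s dynamic variant goes further and lets k depend on sentence length and layer depth; that part is omitted here.

```python
# Illustrative k-max pooling: keep the k largest values along a dimension,
# in their original order, so relative positions of strong features survive.
import torch

def k_max_pooling(x: torch.Tensor, k: int, dim: int = -1) -> torch.Tensor:
    """Select the k largest values along `dim`, preserving original order."""
    top_idx = x.topk(k, dim=dim).indices          # indices of the k largest values
    top_idx = top_idx.sort(dim=dim).values        # sort indices to keep sequence order
    return x.gather(dim, top_idx)

# Example: 1 sentence, 2 feature maps, 6 positions
feats = torch.tensor([[[1.0, 5.0, 2.0, 7.0, 0.0, 3.0],
                       [4.0, 1.0, 6.0, 2.0, 8.0, 0.0]]])
print(k_max_pooling(feats, k=3))
# tensor([[[5., 7., 3.],
#          [4., 6., 8.]]])
```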
4. Two-Stream Convolutional Networks for Action Recognition in Videos (2014)
Historical Significance: The Introduction of Two-Stream CNN — A Breakthrough in Video Action Recognition
This paper proposed the Two-Stream Convolutional Neural Network (CNN) architecture, which processes spatial information (RGB frames) and temporal information (optical flow) separately. This design effectively captures both static and dynamic features in video data.
The Two-Stream CNN introduced a novel approach to video action recognition and significantly improved performance in the field. It had a profound impact on subsequent models that aim to jointly model spatial and temporal dynamics, revolutionizing how video-based behavior recognition tasks are approached.
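The toy PyTorch sketch below shows the two-stream pattern with a simple late fusion by score averaging. The backbones are tiny stand-ins chosen for readability, not the networks used in the paper, and the ten-frame optical-flow stack follows the paper’s setup only loosely.

```python
# Two-stream sketch: one network sees an RGB frame (appearance), the other a
# stack of optical-flow fields (motion); their class scores are averaged.
import torch
import torch.nn as nn

def tiny_backbone(in_channels: int, num_classes: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, num_classes),
    )

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes: int = 101, flow_stack: int = 10):
        super().__init__()
        self.spatial = tiny_backbone(3, num_classes)                 # single RGB frame
        self.temporal = tiny_backbone(2 * flow_stack, num_classes)   # x/y flow for 10 frames

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        return (self.spatial(rgb) + self.temporal(flow)) / 2          # late fusion

net = TwoStreamNet()
scores = net(torch.randn(2, 3, 64, 64), torch.randn(2, 20, 64, 64))
print(scores.shape)  # torch.Size([2, 101])
```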

5. Deep Residual Learning for Image Recognition (2015)
Historical Significance: The Introduction of ResNet — One of the Most Cited Papers in Computer Vision
This paper introduced the Residual Network (ResNet), whose shortcut (skip) connections let stacked layers learn residual functions. This design made very deep networks practical to train by easing the vanishing-gradient and degradation problems, and it significantly improved performance. ResNet achieved a landmark victory in the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), excelling in both image classification and object detection tasks.
ResNet quickly became a foundational architecture for numerous subsequent models. While it originated in computer vision, its influence rapidly expanded to fields such as speech recognition, natural language processing, and beyond—impacting a wide range of applications across engineering and the sciences.
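The heart of ResNet is the residual block, sketched below in PyTorch: the convolutional layers learn a correction to their input, and the input is added back through a shortcut connection so gradients can flow along the identity path. This is a simplified version of the basic block, without downsampling or the deeper bottleneck variant.

```python
# A minimal residual block: output = ReLU(x + F(x)), where F is two conv layers.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(x + residual)   # shortcut: add the input back

block = ResidualBlock(16)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```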
6. Attention Is All You Need (2017)
Historical Significance: One of the Most Influential Papers in Deep Learning — The Introduction of the Transformer Model
This groundbreaking research paper, published by the Google team in 2017, introduced the Transformer model, which is entirely based on the self-attention mechanism and discards traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Unlike RNNs and LSTMs, which process sequences in order, the Transformer allows the model to attend to different parts of the input simultaneously, enabling more efficient and scalable learning.
The introduction of the Transformer fundamentally transformed the field of natural language processing (NLP), becoming the foundation for modern language models such as BERT and GPT. These models have since been widely applied in tasks like text translation, summarization, and generation, setting new benchmarks across the board.
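At the core of the Transformer is scaled dot-product attention, which can be written in a few lines of PyTorch. The sketch below shows only that single operation; multi-head attention, masking, and positional encodings are omitted for brevity.

```python
# Scaled dot-product attention: each position forms a weighted sum over all
# positions, with weights given by query-key similarity.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, d_model)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # similarity of every query with every key
    weights = torch.softmax(scores, dim=-1)             # attention weights sum to 1 per query
    return weights @ v                                   # weighted sum of value vectors

x = torch.randn(1, 5, 8)                                 # 5 tokens, 8-dimensional embeddings
out = scaled_dot_product_attention(x, x, x)              # self-attention: q = k = v
print(out.shape)                                         # torch.Size([1, 5, 8])
```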
7. Improving Language Understanding by Generative Pre-Training (2018)
Historical Significance: The Introduction of GPT (Generative Pre-Training)
This was the first paper in OpenAI’s influential GPT series and a pioneering work that introduced Generative Pre-Training (GPT) as a method to enhance the generalization capabilities of language models. Built on the Transformer architecture, the GPT model leveraged unsupervised learning to pre-train on large-scale text data, allowing it to learn general language patterns.
The model could then be fine-tuned for specific tasks, demonstrating significant progress in language understanding. GPT was able to generate coherent and contextually appropriate text—even for tasks it had not been explicitly trained on—marking a major advancement in the development of general-purpose language models.
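The pre-training objective itself is simple to state in code: predict each next token from the tokens before it. The toy sketch below shows only that loss setup, with an embedding layer standing in for the full Transformer; fine-tuning then adds a small task-specific head on top of the pre-trained model.

```python
# Next-token prediction, the objective behind generative pre-training.
# The embedding layer is a placeholder; a real model applies Transformer
# layers between the embedding and the language-model head.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)         # predicts a distribution over the vocabulary

tokens = torch.randint(0, vocab_size, (1, 10))   # a toy "sentence" of 10 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token

hidden = embed(inputs)                           # (1, 9, d_model)
logits = lm_head(hidden)                         # (1, 9, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```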
8. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
Historical Significance: The Introduction of BERT — Excelling in Context-Dependent Language Understanding
Published by the Google team in 2018, this groundbreaking paper introduced BERT (Bidirectional Encoder Representations from Transformers). The key innovation of BERT lies in its bidirectional pretraining approach. Unlike traditional unidirectional language models, BERT uses Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) as its pretraining objectives, enabling it to consider both the left and right context of each word.
This allows BERT to gain a deeper understanding of word meaning within its full linguistic context. The introduction of BERT marked a major breakthrough in the field of natural language processing, setting new performance benchmarks—particularly in tasks that require strong contextual comprehension.
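The plain-Python sketch below shows how Masked Language Modeling prepares its training signal: a fraction of tokens is hidden behind a [MASK] symbol, and the model must recover them from the context on both sides. The whitespace tokenizer and the masking rate used in the demo are simplifications; BERT uses WordPiece subwords and masks about 15% of tokens.

```python
# Masked Language Modeling data preparation (simplified):
# hide some tokens and record what the model must predict at those positions.
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)   # hide the token ...
            targets.append(tok)         # ... and remember what must be predicted
        else:
            masked.append(tok)
            targets.append(None)        # no prediction needed at this position
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
# A higher rate than BERT's 15% so the effect is visible in a short sentence
masked, targets = mask_tokens(sentence, mask_prob=0.3)
print(masked)
print(targets)
```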
9. Language Models are Unsupervised Multitask Learners (2019)
Historical Significance: An Introduction to GPT-2
This paper served as the introductory publication for GPT-2, released by OpenAI. It showed that a single large language model, trained only to predict the next word on a broad web corpus, could perform tasks such as summarization, translation, and question answering without any task-specific training, demonstrating the remarkable potential of large-scale, unsupervised language models.
10. Language Models are Few-Shot Learners (2020)
Historical Significance: An Introduction to GPT-3
This milestone paper, published by OpenAI, introduced GPT-3, a massive language model with 175 billion parameters. GPT-3 demonstrated exceptional performance across a wide range of natural language processing (NLP) tasks with minimal task-specific training.
A key contribution of the paper was the introduction of the concept of few-shot learning, where the model can generalize and perform tasks effectively using only a few examples—rather than relying on large amounts of labeled data. This work highlighted the vast potential of large-scale pre-trained models in handling diverse and complex language tasks with minimal human supervision.
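Few-shot learning is visible directly in the prompt text. The sketch below builds such a prompt in Python; the final completion call is left as a commented-out placeholder because the client name is hypothetical and no particular API is assumed.

```python
# Few-shot prompting: show the model a handful of worked examples in the
# prompt itself and ask it to continue the pattern for a new input.
examples = [
    ("I loved this course!", "positive"),
    ("The lecture was confusing and too long.", "negative"),
    ("Great readings and a helpful instructor.", "positive"),
]
query = "The assignments felt disconnected from the lectures."

prompt = "Classify the sentiment of each student comment.\n\n"
for text, label in examples:
    prompt += f"Comment: {text}\nSentiment: {label}\n\n"
prompt += f"Comment: {query}\nSentiment:"

print(prompt)
# response = some_llm_client.complete(prompt)   # hypothetical call to a language model
```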
11. Toolformer: Language Models Can Teach Themselves to Use Tools (2023)
Historical Significance: The Introduction of Toolformer
Toolformer is a novel language model designed to autonomously learn how to use external tools—such as search engines, calculators, and APIs—to enhance its reasoning and decision-making abilities. Through a self-supervised learning approach, the model learns when and how to invoke these tools effectively.
Toolformer significantly improves the interactivity and practical utility of language models, enabling them to handle more complex, real-world tasks. It represents a major step forward in integrating language models with external environments, expanding their potential for real-world applications.
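The sketch below gives a much-simplified flavor of the idea: generated text contains inline tool calls that are detected, executed, and spliced back into the text. The Calculator(...) markup and the single-tool registry are illustrative assumptions, not the paper’s exact format or training procedure.

```python
# Simplified tool-use post-processing: find Tool(args) markers in generated
# text, run the corresponding tool, and substitute the result.
import re

def calculator(expression: str) -> str:
    # Deliberately restricted to simple arithmetic for safety
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        return "error"
    return str(eval(expression))

TOOLS = {"Calculator": calculator}

def run_tool_calls(text: str) -> str:
    """Replace every known Tool(args) marker with the tool's output."""
    pattern = re.compile(r"(\w+)\(([^)]*)\)")
    def replace(match):
        name, args = match.group(1), match.group(2)
        return TOOLS[name](args) if name in TOOLS else match.group(0)
    return pattern.sub(replace, text)

generated = "The class has 3 sections of 24 students, so Calculator(3 * 24) students in total."
print(run_tool_calls(generated))
# The class has 3 sections of 24 students, so 72 students in total.
```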
Artificial Intelligence (AI) is a transformative force rapidly reshaping how we live, work—and teach. As its technologies continue to evolve, AI offers powerful opportunities to enhance instruction, streamline administrative work, and personalize student learning. At the same time, it raises important ethical and societal questions around data privacy, algorithmic bias, and equitable access. As educators, it is vital that we thoughtfully integrate AI into our academic practices while engaging in ongoing dialogue about its implications. Faculty have a unique role to play in guiding the responsible use of AI and shaping a future where technology supports inclusive, innovative, and human-centered education.
References
McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dartmouth summer research project on artificial intelligence.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 655–665.
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 27, 568–576.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171–4186.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., ... & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
Dartmouth College. (n.d.). Artificial intelligence (AI) coined at Dartmouth. Dartmouth College. https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth
Wikipedia contributors. (n.d.). Dartmouth workshop. In Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Dartmouth_workshop
Author: Duo (Dolores) Liu
Chief Editor: Yirui (Sandy) Jiang