The basis for machine learning systems

In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in digital images or to infer the topics of texts.

But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.

At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”

“It’s not only the medical domain,” adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei’s thesis advisor. “It’s in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it.”

“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”

Virtual brains

Neural networks are so called because they mimic — approximately — the structure of the brain. They are composed of a large number of processing nodes that, like individual neurons, are capable of only very simple computations but are connected to each other in dense networks.

In a process referred to as “deep learning,” training data is fed to a network’s input nodes, which modify it and feed it to other nodes, which modify it and feed it to still other nodes, and so on. The values stored in the network’s output nodes are then correlated with the classification category that the network is trying to learn — such as the objects in an image, or the topic of an essay.
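As a rough illustration of data flowing from input nodes through successive layers, here is a minimal sketch in Python. The layer sizes, the hand-picked weights, and the choice of a sigmoid function are all assumptions made for the example; real networks learn their weights during training and are far larger.

```python
import math

def forward(inputs, layers):
    """Pass a list of numbers through a stack of fully connected layers.

    Each layer is a (weights, biases) pair, where weights[j] holds the
    incoming weights of node j. Every node does one simple computation:
    a weighted sum followed by a sigmoid squashing function.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 1.0], layers)
```

The single number in `output` plays the role of the value stored in an output node, which training would then push toward the correct category.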

Over the course of the network’s training, the operations performed by the individual nodes are continuously modified to yield consistently good results across the whole set of training examples. By the end of the process, the computer scientists who programmed the network often have no idea what the nodes’ settings are. Even if they do, it can be very hard to translate that low-level information back into an intelligible description of the system’s decision-making process.

In the new paper, Lei, Barzilay, and Jaakkola specifically address neural nets trained on textual data. To enable interpretation of a neural net’s decisions, the CSAIL researchers divide the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: The shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximize both the score of the extracted segments and the accuracy of prediction or classification.
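The scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0/1 mask representation, the function name `rationale_score`, and both weights are hypothetical, and the researchers' actual objective is trained jointly with the predictor rather than computed in isolation.

```python
def rationale_score(mask, length_weight=1.0, coherence_weight=2.0):
    """Score a word selection; higher is better.

    mask[i] is 1 if word i is kept in the extracted segment, else 0.
    The two weights are illustrative assumptions, not values from the paper.
    """
    selected = sum(mask)
    # Each 0->1 or 1->0 switch marks a break between runs of consecutive words.
    breaks = sum(1 for prev, cur in zip(mask, mask[1:]) if prev != cur)
    return -(length_weight * selected + coherence_weight * breaks)

contiguous = [0, 1, 1, 1, 0, 0, 0, 0]  # three adjacent words
scattered = [0, 1, 0, 1, 0, 0, 1, 0]   # three isolated words
longer = [0, 1, 1, 1, 1, 1, 0, 0]      # five adjacent words
```

Under this toy scoring, the contiguous three-word selection beats both the scattered selection of the same size and the longer contiguous one, matching the preference for short, coherent segments.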

One of the data sets on which the researchers tested their system is a group of reviews from a website where users evaluate different beers. The data set includes the raw text of the reviews and the corresponding ratings, using a five-star system, on each of three attributes: aroma, palate, and appearance.

Breakthrough memory management scheme

A year ago, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory unveiled a fundamentally new way of managing memory on computer chips, one that would use circuit space much more efficiently as chips continue to comprise more and more cores, or processing units. In chips with hundreds of cores, the researchers’ scheme could free up somewhere between 15 and 25 percent of on-chip memory, enabling much more efficient computation.

Their scheme, however, assumed a certain type of computational behavior that most modern chips do not, in fact, enforce. Last week, at the International Conference on Parallel Architectures and Compilation Techniques — the same conference where they first reported their scheme — the researchers presented an updated version that’s more consistent with existing chip designs and has a few additional improvements.

The essential challenge posed by multicore chips is that they execute instructions in parallel, while in a traditional computer program, instructions are written in sequence. Computer scientists are constantly working on ways to make parallelization easier for computer programmers.

The initial version of the MIT researchers’ scheme, called Tardis, enforced a standard called sequential consistency. Suppose that different parts of a program contain the sequences of instructions ABC and XYZ. When the program is parallelized, A, B, and C get assigned to core 1; X, Y, and Z to core 2.

Sequential consistency doesn’t enforce any relationship between the relative execution times of instructions assigned to different cores. It doesn’t guarantee that core 2 will complete its first instruction, X, before core 1 moves on to its second, B. It doesn’t even guarantee that core 2 will begin executing its first instruction, X, before core 1 completes its last one, C. All it guarantees is that, on core 1, A will execute before B and B before C; and on core 2, X will execute before Y and Y before Z.
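To make the guarantee concrete, the following Python sketch enumerates every global ordering of the six instructions that keeps each core's own order intact. The function name `interleavings` is an illustrative choice, and the sketch models only the ordering rule, not any real hardware.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield a + b
        return
    # Either core 1 goes next...
    for rest in interleavings(a[1:], b):
        yield a[0] + rest
    # ...or core 2 does.
    for rest in interleavings(a, b[1:]):
        yield b[0] + rest

orders = list(interleavings("ABC", "XYZ"))
```

There are 20 such orderings. Both "ABCXYZ" and "XYZABC" are among them, but "ACBXYZ" is not, because it runs C before B on core 1.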

The first author on the new paper is Xiangyao Yu, a graduate student in electrical engineering and computer science. He is joined by his thesis advisor and co-author on the earlier paper, Srini Devadas, the Edwin Sibley Webster Professor in MIT’s Department of Electrical Engineering and Computer Science, and by Hongzhe Liu of Algonquin Regional High School and Ethan Zou of Lexington High School, who joined the project through MIT’s Program for Research in Mathematics, Engineering and Science (PRIMES).

Planned disorder

But with respect to reading and writing data — the only type of operations that a memory-management scheme like Tardis is concerned with — most modern chips don’t enforce even this relatively modest constraint. A standard chip from Intel might, for instance, assign the sequence of read/write instructions ABC to a core but let it execute in the order ACB.

Relaxing standards of consistency allows chips to run faster. “Let’s say that a core performs a write operation, and the next instruction is a read,” Yu says. “Under sequential consistency, I have to wait for the write to finish. If I don’t find the data in my cache [the small local memory bank in which a core stores frequently used data], I have to go to the central place that manages the ownership of data.”

“This may take a lot of messages on the network,” he continues. “And depending on whether another core is holding the data, you might need to contact that core. But what about the following read? That instruction is sitting there, and it cannot be processed. If you allow this reordering, then while this write is outstanding, I can read the next instruction. And you may have a lot of such instructions, and all of them can be executed.”

Tardis uses chip space more efficiently than existing memory management schemes because it coordinates cores’ memory operations according to “logical time” rather than chronological time. With Tardis, every data item in a shared memory bank has its own time stamp. Each core also has a counter that effectively time stamps the operations it performs. No two cores’ counters need agree, and any given core can keep churning away on data that has since been updated in main memory, provided that the other cores treat its computations as having happened earlier in time.
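The flavor of logical-time coordination can be conveyed with a toy model in Python. Everything specific here is an assumption made for illustration: the class and field names, the fixed `LEASE` length, and the rule that a writer jumps its counter past outstanding read leases. It is a simplified sketch, not the Tardis protocol itself.

```python
LEASE = 10  # hypothetical lease length, in logical ticks


class SharedItem:
    """A data item in shared memory with its own logical time stamps."""

    def __init__(self, value):
        self.value = value
        self.write_ts = 0  # logical time of the last write
        self.read_ts = 0   # latest logical time up to which a read is allowed


class Core:
    def __init__(self):
        self.clock = 0   # this core's private logical counter
        self.cache = {}  # name -> (cached value, lease expiry)

    def read(self, memory, name):
        cached = self.cache.get(name)
        if cached is not None and self.clock <= cached[1]:
            return cached[0]  # still valid in logical time
        item = memory[name]
        self.clock = max(self.clock, item.write_ts)
        item.read_ts = max(item.read_ts, self.clock + LEASE)
        self.cache[name] = (item.value, item.read_ts)
        return item.value

    def write(self, memory, name, value):
        item = memory[name]
        # Jump past every outstanding read lease so those reads count as earlier.
        self.clock = max(self.clock, item.read_ts + 1)
        item.value = value
        item.write_ts = item.read_ts = self.clock
        self.cache[name] = (value, self.clock)


memory = {"x": SharedItem(1)}
reader, writer = Core(), Core()
first = reader.read(memory, "x")   # reader caches the value 1
writer.write(memory, "x", 2)       # main memory now holds 2
stale = reader.read(memory, "x")   # reader still sees 1 from its cache
```

In this run the reader keeps using the old value after the write, which is acceptable because its counter is lower than the writer's: in logical time, its read happened first.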

Venture capitalists gather to discuss

Surviving breast cancer changed the course of Regina Barzilay’s research. The experience showed her, in stark relief, that oncologists and their patients lack tools for data-driven decision making. That includes what treatments to recommend, but also whether a patient’s sample even warrants a cancer diagnosis, she explained at the Nov. 10 Machine Intelligence Summit, organized by MIT and venture capital firm Pillar.

“We do more machine learning when we decide on Amazon which lipstick you would buy,” said Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science at MIT. “But not if you were deciding whether you should get treated for cancer.”

Barzilay now studies how smarter computing can help patients. She wields the powerful predictive approach called machine learning, a technique that allows computers, given enough data and training, to pick out patterns on their own — sometimes even beyond what humans are capable of pinpointing.

Machine learning has long been vaunted in consumer contexts — Apple’s Siri can talk with us because machine learning enables her to understand natural human speech — yet the summit gave a glimpse of the approach’s much broader potential. Its reach could offer not only better Siris (e.g., Amazon’s “Alexa”), but improved health care and government policies.

Machine intelligence is “absolutely going to revolutionize our lives,” said Pillar co-founder Jamie Goldstein ’89. Goldstein and Anantha Chandrakasan, head of the MIT Department of Electrical Engineering and Computer Science (EECS) and the Vannevar Bush Professor of Electrical Engineering and Computer Science, organized the conference to bring together industry leaders, venture capitalists, students, and faculty from the Computer Science and Artificial Intelligence Laboratory (CSAIL), the Institute for Data, Systems, and Society (IDSS), and the Laboratory for Information and Decision Systems (LIDS) to discuss real-world problems and machine learning solutions.

Barzilay is already thinking along those lines. Her group’s work aims to help doctors and patients make more informed medical decisions with machine learning. She has a vision for the future patient in the oncologist’s office: “If you’re taking this treatment, [you’ll see] how your chances are going to be changed.”

Machine senses

Machine learning has already proven powerful. But Antonio Torralba, professor of electrical engineering and computer science, believes that machines can learn faster, and thereby do more. His team’s approach mimics the way humans learn in infancy. “We just start playing with things and seeing how they feel,” Torralba said. To illustrate, he showed the room a video of a baby turning over squeaky bubble wrap in her hands. Importantly, we notice the noises things make when we move them around, he said.

To give machines a similar sensory experience of the world, a student of Torralba’s recorded himself tapping more than a thousand objects with a wooden drumstick. Called “Greatest Hits,” the sound collection captured the drumstick clanging ceramic cups, ruffling bushes, and splashing water. After feasting on these videos, a computer could start predicting the sounds of the world — essentially reflecting a grasp of its physics — all without explicit instruction.

Videos of everyday scenes (sans drumstick) also prove deft teachers. Machines are usually guided to pick out objects by training them on annotated images. That means people would meticulously outline a photograph’s individual objects, such as people, lamps, and bar stools, so that computers could learn to identify them. But Torralba and his team have found that by giving computers video complete with objects’ sounds — such as a street’s ambient noise or people talking — a machine’s neural network could begin to pick out objects without any guidance at all.

Detecting emotions with wireless signals

MIT professor and project lead Dina Katabi envisions the system being used in entertainment, consumer behavior, and health care. Film studios and ad agencies could test viewers’ reactions in real-time, while smart homes could use information about your mood to adjust the heating or suggest that you get some fresh air.

“Our work shows that wireless signals can capture information about human behavior that is not always visible to the naked eye,” says Katabi, who co-wrote a paper on the topic with PhD students Mingmin Zhao and Fadel Adib. “We believe that our results could pave the way for future technologies that could help monitor and diagnose conditions like depression and anxiety.”

EQ-Radio builds on Katabi’s continued efforts to use wireless technology for measuring human behaviors such as breathing and falling. She says that she will incorporate emotion-detection into her spinoff company Emerald, which makes a device that is aimed at detecting and predicting falls among the elderly.

Using wireless signals reflected off people’s bodies, the device measures heartbeats as accurately as an ECG monitor, with a margin of error of approximately 0.3 percent. It then studies the waveforms within each heartbeat to match a person’s behavior to how they previously acted in one of the four emotion-states.

The team will present the work next month at the Association for Computing Machinery’s International Conference on Mobile Computing and Networking (MobiCom).

How it works

Existing emotion-detection methods rely on audiovisual cues or on-body sensors, but there are downsides to both techniques. Facial expressions are famously unreliable, while on-body sensors such as chest bands and ECG monitors are inconvenient to wear and become inaccurate if they change position over time.

EQ-Radio instead sends wireless signals that reflect off of a person’s body and back to the device. Its beat-extraction algorithms break the reflections into individual heartbeats and analyze the small variations in heartbeat intervals to determine their levels of arousal and positive affect.

These measurements are what allow EQ-Radio to detect emotion. For example, a person whose signals correlate to low arousal and negative affect is more likely to be tagged as sad, while someone whose signals correlate to high arousal and positive affect would likely be tagged as excited.
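The mapping from the two measurements to an emotion tag can be sketched as a simple quadrant lookup in Python. The function names, the zero-centered score scale, and the sample beat times are assumptions for illustration. Only the "sad" and "excited" labels come from the description above, so the other two quadrants are left with descriptive placeholders rather than guessed names.

```python
def beat_intervals(beat_times):
    """Return the gaps, in seconds, between consecutive heartbeats."""
    return [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]


def tag_emotion(arousal, affect):
    """Map two scores (assumed to be centered on zero) to a quadrant label."""
    if arousal >= 0 and affect >= 0:
        return "excited"
    if arousal < 0 and affect < 0:
        return "sad"
    # Only two of the four states are named in the text; the rest stay descriptive.
    if arousal >= 0:
        return "high arousal, negative affect"
    return "low arousal, positive affect"


# Hypothetical beat times in seconds, invented for the example.
intervals = beat_intervals([0.0, 0.8, 1.7, 2.5])
```

How the interval variations are turned into arousal and affect scores is the hard part of the real system and is not modeled here.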

The exact correlations vary from person to person, but are consistent enough that EQ-Radio could detect emotions with 70 percent accuracy even when it hadn’t previously measured the target person’s heartbeat.

“Just by knowing how people breathe and how their hearts beat in different emotional states, we can look at a random person’s heartbeat and reliably detect their emotions,” says Zhao.

Learns to recognize sounds

In recent years, computers have gotten remarkably good at recognizing speech and images: Think of the dictation software on most cellphones, or the algorithms that automatically identify people in photos posted to Facebook.

But recognition of natural sounds — such as crowds cheering or waves crashing — has lagged behind. That’s because most automated recognition systems, whether they process audio or visual information, are the result of machine learning, in which computers search for patterns in huge compendia of training data. Usually, the training data has to be first annotated by hand, which is prohibitively expensive for all but the highest-demand applications.

Sound recognition may be catching up, however, thanks to researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). At the Neural Information Processing Systems conference next week, they will present a sound-recognition system that outperforms its predecessors but didn’t require hand-annotated data during training.

Instead, the researchers trained the system on video. First, existing computer vision systems that recognize scenes and objects categorized the images in the video. The new system then found correlations between those visual categories and natural sounds.
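The idea of letting a vision system supply the labels can be illustrated with a deliberately tiny Python sketch. It swaps in a nearest-centroid classifier in place of the researchers' neural network, and the two-number audio features, the frame names, and the stand-in `toy_vision_label` function are all invented for the example.

```python
def train_sound_model(clips, vision_label):
    """Average the audio features of all clips that share a visual category.

    clips is a list of (audio_features, video_frame) pairs. The label for
    each clip comes from the vision system, not from a human annotator.
    """
    sums, counts = {}, {}
    for audio, frame in clips:
        label = vision_label(frame)
        total = sums.setdefault(label, [0.0] * len(audio))
        sums[label] = [s + a for s, a in zip(total, audio)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify_sound(model, audio):
    """Return the category whose average audio features are closest."""
    def distance(label):
        return sum((c - a) ** 2 for c, a in zip(model[label], audio))
    return min(model, key=distance)


def toy_vision_label(frame):
    """Hypothetical stand-in for a scene recognizer: reads the frame's name."""
    return frame.split("_")[0]


clips = [
    ([0.9, 0.1], "beach_001"),
    ([0.8, 0.2], "beach_002"),
    ([0.1, 0.9], "street_001"),
    ([0.2, 0.7], "street_002"),
]
model = train_sound_model(clips, toy_vision_label)
guess = classify_sound(model, [0.85, 0.15])
```

Once trained, `classify_sound` needs only audio, which is the point: the video is used during training and can be dropped afterward.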

“Computer vision has gotten so good that we can transfer it to other domains,” says Carl Vondrick, an MIT graduate student in electrical engineering and computer science and one of the paper’s two first authors. “We’re capitalizing on the natural synchronization between vision and sound. We scale up with tons of unlabeled video to learn to understand sound.”

The researchers tested their system on two standard databases of annotated sound recordings, and it was between 13 and 15 percent more accurate than the best-performing previous system. On a data set with 10 different sound categories, it could categorize sounds with 92 percent accuracy, and on a data set with 50 categories it performed with 74 percent accuracy. On those same data sets, humans are 96 percent and 81 percent accurate, respectively.

“Even humans are ambiguous,” says Yusuf Aytar, the paper’s other first author and a postdoc in the lab of MIT professor of electrical engineering and computer science Antonio Torralba. Torralba is the final co-author on the paper.

“We did an experiment with Carl,” Aytar says. “Carl was looking at the computer monitor, and I couldn’t see it. He would play a recording and I would try to guess what it was. It turns out this is really, really hard. I could tell indoor from outdoor, basic guesses, but when it comes to the details — ‘Is it a restaurant?’ — those details are missing. Even for annotation purposes, the task is really hard.”

Complementary modalities

Because it takes far less power to collect and process audio data than it does to collect and process visual data, the researchers envision that a sound-recognition system could be used to improve the context sensitivity of mobile devices.

When coupled with GPS data, for instance, a sound-recognition system could determine that a cellphone user is in a movie theater and that the movie has started, and the phone could automatically route calls to a prerecorded outgoing message. Similarly, sound recognition could improve the situational awareness of autonomous robots.

“For instance, think of a self-driving car,” Aytar says. “There’s an ambulance coming, and the car doesn’t see it. If it hears it, it can make future predictions for the ambulance — which path it’s going to take — just purely based on sound.”

Childhood communication disorders

For children with speech and language disorders, early-childhood intervention can make a great difference in their later academic and social success. But many such children — one study estimates 60 percent — go undiagnosed until kindergarten or even later.

Researchers at the Computer Science and Artificial Intelligence Laboratory at MIT and Massachusetts General Hospital’s Institute of Health Professions hope to change that, with a computer system that can automatically screen young children for speech and language disorders and, potentially, even provide specific diagnoses.

This week, at the Interspeech conference on speech processing, the researchers reported on an initial set of experiments with their system, which yielded promising results. “We’re nowhere near finished with this work,” says John Guttag, the Dugald C. Jackson Professor in Electrical Engineering and senior author on the new paper. “This is sort of a preliminary study. But I think it’s a pretty convincing feasibility study.”

The system analyzes audio recordings of children’s performances on a standardized storytelling test, in which they are presented with a series of images and an accompanying narrative, and then asked to retell the story in their own words.

“The really exciting idea here is to be able to do screening in a fully automated way using very simplistic tools,” Guttag says. “You could imagine the storytelling task being totally done with a tablet or a phone. I think this opens up the possibility of low-cost screening for large numbers of children, and I think that if we could do that, it would be a great boon to society.”

Subtle signals

The researchers evaluated the system’s performance using a standard measure called area under the curve, which describes the tradeoff between exhaustively identifying members of a population who have a particular disorder, and limiting false positives. (Modifying the system to limit false positives generally results in limiting true positives, too.) In the medical literature, a diagnostic test with an area under the curve of about 0.7 is generally considered accurate enough to be useful; on three distinct clinically useful tasks, the researchers’ system ranged between 0.74 and 0.86.
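One common way to compute area under the curve is to ask how often a randomly chosen positive case receives a higher score than a randomly chosen negative one. The Python sketch below does exactly that; the scores and labels are made-up toy values, not data from the study.

```python
def area_under_curve(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy screening scores: label 1 means the disorder is present.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = area_under_curve(scores, labels)
```

Here eight of the nine positive-negative pairs are ordered correctly, giving about 0.89. A score of 0.5 would mean the test ranks cases no better than chance, and 1.0 would mean a perfect ranking.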

To build the new system, Guttag and Jen Gong, a graduate student in electrical engineering and computer science and first author on the new paper, used machine learning, in which a computer searches large sets of training data for patterns that correspond to particular classifications — in this case, diagnoses of speech and language disorders.

The training data had been amassed by Jordan Green and Tiffany Hogan, researchers at the MGH Institute of Health Professions, who were interested in developing more objective methods for assessing results of the storytelling test. “Better diagnostic tools are needed to help clinicians with their assessments,” says Green, himself a speech-language pathologist. “Assessing children’s speech is particularly challenging because of high levels of variation even among typically developing children. You get five clinicians in the room and you might get five different answers.”

Unlike speech impediments that result from anatomical characteristics such as cleft palates, speech disorders and language disorders both have neurological bases. But, Green explains, they affect different neural pathways: Speech disorders affect the motor pathways, while language disorders affect the cognitive and linguistic pathways.

A nightmare on Ames Street

“People are afraid of artificial intelligence, from autonomous cars making unethical decisions in accidents, to robots taking our jobs and causing mass unemployment, to runaway superintelligent machines obliterating humanity. Engineering pioneer and inventor Elon Musk famously said that as we develop AI, we are ‘summoning the demon.’

Halloween is a time when people celebrate the things that terrify them. So it seems like a perfect occasion for an MIT project that explores society’s fear of AI. And what better way to do this than have an actual AI literally scare us in an immediate, visceral sense? Postdoc Pinar Yanardag, visiting scientist Manuel Cebrian, and I used a recently published, open-source deep neural network algorithm to learn features of a haunted house and apply these features to a picture of the Media Lab.

We also launched the Nightmare Machine website, where people can vote on which AI-generated horror images they find scary; these were generated using the same algorithm, combined with another recent algorithm for generating faces. So far, we’ve collected over 300,000 individual votes, and the results are clear: the AI demon is here, and it can terrify us. Happy Halloween!”

Lets nonexperts optimize programs

Dynamic programming is a technique that can yield relatively efficient solutions to computational problems in economics, genomic analysis, and other fields. But adapting it to computer chips with multiple “cores,” or processing units, requires a level of programming expertise that few economists and biologists have.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stony Brook University aim to change that, with a new system that allows users to describe what they want their programs to do in very general terms. It then automatically produces versions of those programs that are optimized to run on multicore chips. It also guarantees that the new versions will yield exactly the same results that the single-core versions would, albeit much faster.

In experiments, the researchers used the system to “parallelize” several algorithms that used dynamic programming, splitting them up so that they would run on multicore chips. The resulting programs were between three and 11 times as fast as those produced by earlier techniques for automatic parallelization, and they were generally as efficient as those that were hand-parallelized by computer scientists.

The researchers presented their new system last week at the Association for Computing Machinery’s conference on Systems, Programming, Languages and Applications: Software for Humanity.

Dynamic programming offers exponential speedups on a certain class of problems because it stores and reuses the results of computations, rather than recomputing them every time they’re required.

“But you need more memory, because you store the results of intermediate computations,” says Shachar Itzhaky, first author on the new paper and a postdoc in the group of Armando Solar-Lezama, an associate professor of electrical engineering and computer science at MIT. “When you come to implement it, you realize that you don’t get as much speedup as you thought you would, because the memory is slow. When you store and fetch, of course, it’s still faster than redoing the computation, but it’s not as fast as it could have been.”
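The store-and-reuse idea, and the memory it costs, can be seen in a standard textbook example. The Python sketch below computes Fibonacci numbers twice, once by recomputing everything and once with a cache of intermediate results; the call counters are added purely for illustration.

```python
from functools import lru_cache

calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    """Recompute every subproblem each time it is needed."""
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)  # the cache is the extra memory being traded for time
def fib_memo(n):
    """Compute each subproblem once, then reuse the stored result."""
    calls["memo"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


result_naive = fib_naive(20)
result_memo = fib_memo(20)
```

Both versions return 6,765, but the naive one makes 21,891 calls while the cached one makes 21. The price is the table of stored results, which is where the memory traffic Itzhaky describes comes from.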

Outsourcing complexity

Computer scientists avoid this problem by reordering computations so that those requiring a particular stored value are executed in sequence, minimizing the number of times that the value has to be recalled from memory. That’s relatively easy to do with a single-core computer, but with multicore computers, when multiple cores are sharing data stored at multiple locations, memory management becomes much more complex. A hand-optimized, parallel version of a dynamic-programming algorithm is typically 10 times as long as the single-core version, and the individual lines of code are more complex, to boot.

The CSAIL researchers’ new system — dubbed Bellmania, after Richard Bellman, the applied mathematician who pioneered dynamic programming — adopts a parallelization strategy called recursive divide-and-conquer. Suppose that the task of a parallel algorithm is to perform a sequence of computations on a grid of numbers, known as a matrix. Its first task might be to divide the grid into four parts, each to be processed separately.

But then it might divide each of those four parts into four parts, and each of those into another four parts, and so on. Because this approach — recursion — involves breaking a problem into smaller subproblems, it naturally lends itself to parallelization.
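The recursive splitting can be sketched in Python with a deliberately simple task: adding up the numbers in a grid. This is not code produced by Bellmania, and summation is not a dynamic-programming problem; the function name `quadrant_sum` and the sample grid are assumptions chosen only to show the four-way division.

```python
def quadrant_sum(grid, top, left, height, width):
    """Sum a rectangular region by splitting it into up to four quadrants."""
    if height == 0 or width == 0:
        return 0
    if height == 1 and width == 1:
        return grid[top][left]
    h_mid, w_mid = (height + 1) // 2, (width + 1) // 2
    # The four calls touch disjoint regions, so they could run on separate cores.
    return (
        quadrant_sum(grid, top, left, h_mid, w_mid)
        + quadrant_sum(grid, top, left + w_mid, h_mid, width - w_mid)
        + quadrant_sum(grid, top + h_mid, left, height - h_mid, w_mid)
        + quadrant_sum(grid, top + h_mid, left + w_mid, height - h_mid, width - w_mid)
    )


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
total = quadrant_sum(grid, 0, 0, 4, 4)
```

The sketch runs the four sub-calls one after another. In a parallel version each would be handed to a different core, and the hard part, which the sketch sidesteps, is handling subproblems that depend on one another's results.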

Joining Itzhaky on the new paper are Solar-Lezama; Charles Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science; Rohit Singh and Kuat Yessenov, who were both MIT graduate students in electrical engineering and computer science when the work was done; Yongquan Lu, an MIT undergraduate who participated in the project through MIT’s Undergraduate Research Opportunities Program; and Rezaul Chowdhury, an assistant professor of computer science at Stony Brook, who was formerly a research affiliate in Leiserson’s group.

System helps turn plain text into data for statistical analysis

Of the vast wealth of information unlocked by the Internet, most is plain text. The data necessary to answer myriad questions — about, say, the correlations between the industrial use of certain chemicals and incidents of disease, or between patterns of news coverage and voter-poll results — may all be online. But extracting it from plain text and organizing it for quantitative analysis may be prohibitively time consuming.

Information extraction, or automatically classifying data items stored as plain text, is thus a major topic of artificial-intelligence research. Last week, at the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head.

Most machine-learning systems work by combing through training examples and looking for patterns that correspond to classifications provided by human annotators. For instance, humans might label parts of speech in a set of texts, and the machine-learning system will try to identify patterns that resolve ambiguities — for instance, when “her” is a direct object and when it’s an adjective.

Typically, computer scientists will try to feed their machine-learning systems as much training data as possible. That generally increases the chances that a system will be able to handle difficult problems.

In their new paper, by contrast, the MIT researchers train their system on scanty data — because in the scenario they’re investigating, that’s usually all that’s available. But then they find the limited information an easy problem to solve.

“In information extraction, traditionally, in natural-language processing, you are given an article and you need to do whatever it takes to extract correctly from this article,” says Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and senior author on the new paper. “That’s very different from what you or I would do. When you’re reading an article that you can’t understand, you’re going to go on the web and find one that you can understand.”

Confidence boost

Essentially, the researchers’ new system does the same thing. A machine-learning system will generally assign each of its classifications a confidence score, which is a measure of the statistical likelihood that the classification is correct, given the patterns discerned in the training data. With the researchers’ new system, if the confidence score is too low, the system automatically generates a web search query designed to pull up texts likely to contain the data it’s trying to extract.

It then attempts to extract the relevant data from one of the new texts and reconciles the results with those of its initial extraction. If the confidence score remains too low, it moves on to the next text pulled up by the search string, and so on.
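The loop described above can be sketched in Python as follows. The threshold value, the function names, and the rule for reconciling answers (keep whichever is more confident) are assumptions for illustration, and the toy extractor simply looks up canned answers. The researchers' system learns these decisions rather than following a fixed rule.

```python
def extract_with_fallback(article, search_results, extract, threshold=0.8):
    """Extract from the article, consulting further texts while confidence is low."""
    value, confidence = extract(article)
    for text in search_results:
        if confidence >= threshold:
            break  # confident enough, stop consulting new texts
        new_value, new_confidence = extract(text)
        # Reconcile: in this sketch, simply keep the more confident answer.
        if new_confidence > confidence:
            value, confidence = new_value, new_confidence
    return value, confidence


# Hypothetical canned outputs standing in for a real extraction model.
TOY_OUTPUT = {
    "original": ("unknown", 0.3),
    "result 1": ("value A", 0.6),
    "result 2": ("value B", 0.9),
    "result 3": ("value C", 0.95),
}
consulted = []


def toy_extract(text):
    consulted.append(text)
    return TOY_OUTPUT[text]


answer = extract_with_fallback(
    "original", ["result 1", "result 2", "result 3"], toy_extract
)
```

In this run the loop stops after the second search result, once the confidence clears the threshold, so the third result is never read.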

Charlie Andrews-Jubelt encourages students

Charlie Andrews-Jubelt loves to climb. A rock climber since childhood, he finds that the sport can profoundly connect people, even those who may not seem to have much in common.

“On a fundamental level, we are trying for something very basic and human, which is to ascend a rock,” the MIT senior says.

At its heart, climbing is also about looking out for our fellow humans.

“You save each other’s lives every time you catch your partner on the other end of a rope, and you go through this highly personal experience with them. When you step up to a climb that you are not sure that you can do, you may fail in front of them or succeed with their encouragement,” he says.

For Andrews-Jubelt, this “we’re in this together” mindset extends well beyond the climbing wall. During his time at MIT, the mathematics with computer science major has taken on multiple leadership roles to help empower his peers and foster a supportive community on campus.

Motivated by empathy

When Andrews-Jubelt first came to MIT, he had an injury that made it impossible for him to climb. He remembers feeling frustrated and confined, like someone who used to walk and was being asked to crawl again.

In retrospect, he says, this experience pushed him to become involved in activities he never would have had time for had he been training and competing as a climber. He started volunteering with Violence Prevention and Response (VPR) in MIT’s Division of Student Life, and the group Students Advocating for Education and Respectful Relationships (SAFER). He also became the CEO of Lean on Me, a text-based, anonymous, suicide-prevention peer-support network.

SAFER was an entirely student-run group that ran workshops on preventing sexual assault, and it has now been incorporated into a broader effort known as Pleasure (for Peers Leading Education About Sexuality and Speaking Up for Relationship Empowerment).

“I grew up in a household with just my mom and my sister, and I saw that they faced a great deal more sexual harassment and discrimination just as a matter of course, in their everyday lives, just by virtue of being female-bodied,” Andrews-Jubelt says. When he found that sexual assault is common on college campuses, he knew he wanted to do something about it: “I felt that I had the responsibility to, as someone who has a lot of gender privilege.”

“It meant a lot to me to be able to make a difference, even at a grassroots level,” Andrews-Jubelt says of SAFER, whose objectives were to “share ideas that help people feel empowered, and help people prevent gender-based violence from happening. Or react when they see it happening.”

Andrews-Jubelt is also part of the Pleasure student advisory board assembled by Vienna Rothberg, a peer education and prevention specialist at VPR, which helped develop new student programming. Pleasure focuses on issues “upstream” of SAFER, “bringing cultural change to promote an environment of respect in which violence is rare,” says Andrews-Jubelt.

Pleasure facilitates a clinic for sexually transmitted infections so that students can get tested in the same way they might get a flu shot. Every dorm at MIT has a student who has been trained on topics from sexual health to identity politics, and who provides fun, related educational materials and answers questions from other students.