Tweet Sentiment Analysis using Stanford CoreNLP
We're living in an era where data has become the most valuable resource! Nearly every app on the market now tries to understand its users: their behaviours, preferences, reactions and words! How many times, just after mentioning a watch ⌚ in a private conversation with a friend on Messenger, has your Facebook feed started popping up ads for watches from different vendors?! It happens EVERY single time!
Understanding this kind of data, classifying it and representing it is the challenge that Natural Language Processing (NLP) tries to solve.
In this article, I describe how I built a small application to perform sentiment analysis on tweets, using the Stanford CoreNLP library, Twitter4J, Spring Boot and ReactJS! The code is available on GitHub.
Application
For everything related to machine learning, Java is generally not a popular choice. However, given the language's popularity, there are libraries and frameworks for pretty much everything!
The application uses the Stanford CoreNLP library's Java API to analyse tweets fetched with the Twitter4J library. The backend server is developed with Spring Boot, and the frontend is built with ReactJS.
As its main functionality, the application takes a keyword and either analyses and classifies the live Twitter stream, or runs a search and analyses the returned tweets. The default behaviour is streaming mode, but we can easily switch to search mode with the click of a button!
Stanford CoreNLP
Stanford CoreNLP is a Java natural language analysis library that provides statistical NLP, deep learning NLP, and rule-based NLP tools for major computational linguistics problems, and it can be incorporated into applications with human language technology needs.
Stanford CoreNLP integrates many NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system and the sentiment analysis tools, and it provides model files for multiple languages.
The snippet below shows the analyse(String tweet) method from the SentimentAnalyzerService class, which runs sentiment analysis on a single tweet and scores it from 0 to 4, depending on whether the analysis comes back as Very Negative, Negative, Neutral, Positive or Very Positive respectively.
public int analyse(String tweet) {
    // Build an annotation pipeline with the annotators required for sentiment analysis
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Run the pipeline on the tweet text
    Annotation annotation = pipeline.process(tweet);

    // Return the predicted sentiment class (0-4) of the first sentence
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
        Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
        return RNNCoreAnnotations.getPredictedClass(tree);
    }
    return 0;
}
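Since analyse returns a plain integer, any consumer needs to map that score back to a human-readable label. As a minimal, hypothetical sketch (this helper is not part of the project; the label order simply mirrors the 0 to 4 scale described above), it could look like this:
// Hypothetical helper: maps the 0-4 score returned by analyse(...) to a sentiment label.
public String toLabel(int score) {
    switch (score) {
        case 0: return "Very Negative";
        case 1: return "Negative";
        case 2: return "Neutral";
        case 3: return "Positive";
        case 4: return "Very Positive";
        default: return "Unknown";
    }
}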
Fetching Tweets
I made use of the popular open-source Java library Twitter4J to fetch tweets. It provides a convenient API for accessing the Twitter API.
The TwitterService class contains the main methods interacting with the Twitter API to search for tweets based on keywords. fetchTweets builds a Query to search for tweets containing a specific keyword. It has a second parameter, count, which specifies the number of tweets to return per page, up to a maximum of 100. I also filter the search results to make sure no retweets or replies are returned.
public Flux<TwitterStatus> fetchTweets(String keyword, int count) throws TwitterException {
    Twitter twitter = this.config.twitter(this.config.twitterFactory());

    // Search for tweets containing the keyword, excluding retweets and replies
    Query query = new Query(keyword.concat(" -filter:retweets -filter:replies"));
    query.setCount(count);
    query.setLocale("en");
    query.setLang("en");

    // Clean up each result and emit it as an element of the Flux
    return Flux.fromStream(twitter.search(query).getTweets().stream())
               .map(status -> this.cleanTweets(status));
}
streamTweets collects live tweets matching a specific keyword:
public Flux<TwitterStatus> streamTweets(String keyword) {
    TwitterStream stream = config.twitterStream();

    // Only track English tweets matching the keyword
    FilterQuery tweetFilterQuery = new FilterQuery();
    tweetFilterQuery.track(new String[]{keyword});
    tweetFilterQuery.language(new String[]{"en"});

    // Bridge the Twitter4J listener callbacks into a reactive Flux
    return Flux.create(sink -> {
        stream.onStatus(status -> sink.next(this.cleanTweets(status)));
        stream.onException(sink::error);
        stream.filter(tweetFilterQuery);
        sink.onCancel(stream::shutdown);
    });
}
Both methods fetch only tweets in English and return a Reactor Flux, capable of emitting a stream of zero or more items and then optionally either completing or erroring.
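To make the Flux behaviour concrete, here is a minimal sketch of how a caller could consume fetchTweets. The getSentimentType and getText accessors on TwitterStatus are assumptions based on the setters used further below, not code from the project:
// Minimal sketch: print each cleaned tweet and its sentiment score as it is emitted.
twitterService.fetchTweets("java", 50)
        .subscribe(status -> System.out.println(status.getSentimentType() + " -> " + status.getText()));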
You may have noticed the call to cleanTweets before the tweets are passed to the analyzer service. This method performs some cleanup on the tweet text, removing unneeded elements such as links and usernames and stripping the # from hashtags.
private TwitterStatus cleanTweets(Status status) {
    TwitterStatus twitterStatus = new TwitterStatus(status.getCreatedAt(), status.getId(), status.getText(),
            null, status.getUser().getName(), status.getUser().getScreenName(), status.getUser().getProfileImageURL());

    // Clean up tweets
    String text = status.getText().trim()
            // remove links
            .replaceAll("http.*?[\\S]+", "")
            // remove usernames
            .replaceAll("@[\\S]+", "")
            // replace hashtags by just words
            .replaceAll("#", "")
            // correct all multiple white spaces to a single white space
            .replaceAll("[\\s]+", " ");

    twitterStatus.setText(text);
    twitterStatus.setSentimentType(analyzerService.analyse(text));
    return twitterStatus;
}
Showing the analyzed data
Now that our backend service is ready, the final step is to consume our resources. Both endpoints implement SSE (Server-Sent Events), an HTTP standard that allows a web application to handle a unidirectional event stream and receive updates whenever the server emits data.
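The controller layer isn't shown in this article, but for context here is a minimal sketch of what an SSE endpoint exposing streamTweets could look like in Spring WebFlux. The class name and exact mapping are assumptions (the frontend below calls a stream/{keyword} path), so check the repository for the real controller:
// Hypothetical controller sketch: streams analysed tweets to the browser as Server-Sent Events.
@RestController
public class TweetController {

    private final TwitterService twitterService;

    public TweetController(TwitterService twitterService) {
        this.twitterService = twitterService;
    }

    // Each TwitterStatus emitted by the Flux is sent to the client as soon as it arrives
    @GetMapping(value = "/stream/{keyword}", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<TwitterStatus> stream(@PathVariable String keyword) {
        return twitterService.streamTweets(keyword);
    }
}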
I used ReactJS with TypeScript to build the Web UI components and consume the exposed REST endpoints. The main component is TweetList, which handles the calls and shares data with the other components.
Once loaded, the component opens an event stream with the server by calling the /stream endpoint, looking for all tweets containing the Java keyword and saving them into an array. This effect runs, and is cleaned up, only once.
React.useEffect(() => {
  // Open an SSE connection for the tracked keyword
  const eventSource = new EventSource(
    state.API_URL + "stream/" + state.hashtag
  );
  eventSource.onmessage = (event: any) => {
    const tweet = JSON.parse(event.data);
    const tweets = [...state.tweets, tweet];
    setState({ ...state, tweets: tweets });
  };
  eventSource.onerror = (event: any) => eventSource.close();
  setState({ ...state, eventSource: eventSource });
  // Close the connection when the component unmounts
  return () => eventSource.close();
}, []);
It keeps adding tweets to the array whenever a message is received from the server. This second effect runs whenever tweets, eventSource or hashtag change.
React.useEffect(() => {
  if (state.eventSource) {
    // Re-attach the message handler so it always sees the latest state
    state.eventSource.onmessage = (event: any) => {
      const tweet = JSON.parse(event.data);
      const tweets = [...state.tweets, tweet];
      setState({ ...state, tweets: tweets });
    };
  }
}, [state.tweets, state.eventSource, state.hashtag]);
Finally, the render function looks like this:
return (
  <Row>
    <Col xs={12} md={8}>
      <Col md={10}>
        <h2>
          Tracked Keyword:
          <Badge variant="secondary">{state.hashtag}</Badge>
        </h2>
      </Col>
      <Col md={2}>
        <Spinner animation="grow" variant="primary" />
      </Col>
      <form
        onSubmit={e => {
          e.preventDefault();
        }}
      >
        <div className="input-group mb-3">
          <input
            type="text"
            name="hashtag"
            value={state.hashtag}
            onChange={e => setState({ ...state, hashtag: e.target.value })}
            className="form-control"
            placeholder={state.hashtag}
            aria-label={state.hashtag}
            aria-describedby="basic-addon2"
          />
          <div className="input-group-append">
            <Button
              variant="outline-primary"
              type="submit"
              onClick={() => {
                setState({
                  ...state,
                  eventSource: newSearch(true, state, setState),
                  tweets: []
                });
              }}
            >
              Stream
            </Button>
            <Button
              variant="primary"
              type="submit"
              onClick={() => {
                setState({
                  ...state,
                  eventSource: newSearch(false, state, setState),
                  tweets: []
                });
              }}
            >
              Search
            </Button>
          </div>
        </div>
      </form>
      <div id="tweets">
        {tweets
          .filter(tweet => tweet !== undefined)
          .reverse()
          .slice(0, 49)
          .map((tweet: Tweet) => (
            <Alert
              key={tweet.id}
              variant={sentiment[tweet.sentimentType] as "success"}
            >
              <Alert.Heading>
                <img src={tweet.profileImageUrl} />
                <a
                  href={"https://twitter.com/" + tweet.screenName}
                  className="text-muted"
                >
                  {tweet.userName}
                </a>
              </Alert.Heading>
              {tweet.originalText}
              <hr />
              <p className="mb-0">
                <Moment fromNow>{tweet.createdAt}</Moment>
              </p>
            </Alert>
          ))}
      </div>
    </Col>
    <Col xs={4} md={4}>
      <Desc tweets={tweets.length} />
      <Doughnut tweets={tweets} />
      <Color />
    </Col>
  </Row>
);
Running the app
Before running the app, make sure to update the application.yaml file with the required authentication keys that will allow you to authenticate correctly when calling the Twitter API to retrieve tweets. You will probably need to create a Twitter developer account and register an application to obtain those keys.
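For reference, here is a minimal sketch of how such credentials end up being used to build the Twitter4J client. In the actual application they are wired through the Spring configuration class and application.yaml, so the placeholder strings below are purely illustrative:
// Illustrative only: in the application these values are read from application.yaml
// by the Spring configuration class rather than hard-coded.
Twitter buildTwitterClient() {
    ConfigurationBuilder cb = new ConfigurationBuilder()
            .setOAuthConsumerKey("<consumer-key>")
            .setOAuthConsumerSecret("<consumer-secret>")
            .setOAuthAccessToken("<access-token>")
            .setOAuthAccessTokenSecret("<access-token-secret>");
    return new TwitterFactory(cb.build()).getInstance();
}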
Afterwards, start the backend server using mvn spring-boot:run and the frontend using npm start.
That's it, folks! If you have any remarks or suggestions, leave them in the comments below or file a GitHub issue.