Tweets Sentiment Analysis using Stanford CoreNLP

We're living in an era where data has become the most valuable resource! Nearly every app on the market now tries to understand its users: their behaviours, preferences, reactions and words! How many times, just after mentioning a watch ⌚ in a private conversation with a friend on Messenger, has your Facebook feed started showing ads for watches from different vendors?! It happens EVERY single time!

Understanding this kind of data, classifying it and representing it is the challenge that Natural Language Processing (NLP) tries to solve.
In this article, I describe how I built a small application that performs sentiment analysis on tweets using the Stanford CoreNLP library, Twitter4J, Spring Boot and ReactJS. The code is available on GitHub.

Application

Java is generally not a popular choice for anything related to machine learning. However, given how popular the language is, there are libraries and frameworks for pretty much everything!
The application uses the Java API of the Stanford CoreNLP library to analyse tweets fetched with the Twitter4J library. The backend server is built with Spring Boot, and the frontend with ReactJS.
Given a keyword, the application can either analyse the live Twitter stream and classify tweets as they arrive, or perform a search and analyse the returned tweets afterwards. Streaming mode is the default, but you can switch to search mode with a single click!

Stanford CoreNLP

Stanford CoreNLP is a Java natural language analysis library that provides statistical, deep learning and rule-based NLP tools for the major computational linguistics problems, and it can be incorporated into any application with human language technology needs.

Stanford CoreNLP integrates many NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system and the sentiment analysis tools, and it provides model files for analysing multiple languages.

The snippet below shows the analyse(String tweet) method of the SentimentAnalyzerService class. It runs sentiment analysis on a single tweet and scores it from 0 to 4, depending on whether the analysis comes back as Very Negative, Negative, Neutral, Positive or Very Positive. Only the first sentence of the tweet is scored.

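// Loading the CoreNLP models is slow, so the pipeline is built once and reused for every tweet
private final StanfordCoreNLP pipeline;

public SentimentAnalyzerService() {
    Properties props = new Properties();
    // The sentiment annotator scores parse trees, so the parser must run before it
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
    this.pipeline = new StanfordCoreNLP(props);
}

// Returns 0 (Very Negative) to 4 (Very Positive) for the first sentence of the tweet
public int analyse(String tweet) {
    Annotation annotation = pipeline.process(tweet);
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
        Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
        return RNNCoreAnnotations.getPredictedClass(tree);
    }
    // No sentence found (e.g. the tweet contained only links): treat it as Neutral
    return 2;
}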
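The integer returned by analyse is an index into CoreNLP's five sentiment classes. A small enum can make that mapping explicit wherever the score is consumed, for example before a tweet is sent to the frontend. The sketch below is a hypothetical helper (SentimentLabels and fromScore are illustrative names, not part of the original project or of CoreNLP), assuming the 0 to 4 order described above.

```java
public class SentimentLabels {

    // Hypothetical enum (not in the original code): constants are declared in
    // the same 0..4 order as the classes returned by analyse(...)
    public enum Sentiment {
        VERY_NEGATIVE, NEGATIVE, NEUTRAL, POSITIVE, VERY_POSITIVE;

        // Converts a class index into its label, rejecting out-of-range values
        public static Sentiment fromScore(int score) {
            Sentiment[] all = values();
            if (score < 0 || score >= all.length) {
                throw new IllegalArgumentException("Sentiment score out of range: " + score);
            }
            return all[score];
        }
    }

    public static void main(String[] args) {
        // Print the label for every possible score
        for (int score = 0; score <= 4; score++) {
            System.out.println(score + " -> " + Sentiment.fromScore(score));
        }
    }
}
```

Relying on the declaration order of the constants keeps the conversion to a single array lookup, so the enum must list the labels in exactly the order CoreNLP uses.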

Fetching Tweets

I used the popular open-source Java library Twitter4J to fetch tweets. It provides a convenient API for accessing the Twitter API.
The TwitterService class contains the main methods that interact with the Twitter API to search for and stream tweets by keyword:

  • fetchTweets builds a Query that searches for tweets containing a specific keyword. Its second parameter, count, sets the number of tweets returned per page, up to a maximum of 100. I also filter the search results to make sure no retweets or replies are returned.
public Flux<TwitterStatus> fetchTweets(String keyword, int count) throws TwitterException {
    Twitter twitter = this.config.twitter(this.config.twitterFactory());
    // Exclude retweets and replies using Twitter search operators
    Query query = new Query(keyword.concat(" -filter:retweets -filter:replies"));
    // The search API returns at most 100 tweets per page
    query.setCount(Math.min(count, 100));
    query.setLocale("en");
    query.setLang("en");
    // search() is a blocking call; the returned page is emitted as a finite Flux
    return Flux.fromIterable(twitter.search(query).getTweets()).map(this::cleanTweets);
}
  • streamTweets subscribes to the live Twitter stream and emits tweets matching a specific keyword as they arrive
public Flux<TwitterStatus> streamTweets(String keyword) {
    TwitterStream stream = config.twitterStream();
    FilterQuery tweetFilterQuery = new FilterQuery();
    tweetFilterQuery.track(new String[]{keyword});
    tweetFilterQuery.language(new String[]{"en"});
    return Flux.create(sink -> {
        // Push each matching status, cleaned and scored, to the subscriber
        stream.onStatus(status -> sink.next(this.cleanTweets(status)));
        stream.onException(sink::error);
        stream.filter(tweetFilterQuery);
        // Close the Twitter connection when the subscriber cancels or the Flux terminates
        sink.onDispose(stream::shutdown);
    });
}

Both methods fetch only English tweets and return a Reactor Flux, a publisher that emits zero or more items and then either completes or terminates with an error.
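To make the search operators used by fetchTweets explicit, the sketch below isolates the query string and the page-size limit in two hypothetical helpers (buildQuery and clampCount are illustrative names, not Twitter4J APIs). It depends only on the JDK, so the query can be checked without calling Twitter; the resulting string would be passed to new Query(...) and the clamped value to query.setCount(...).

```java
public class SearchQueryBuilder {

    // The standard search API returns at most 100 tweets per page
    static final int MAX_COUNT = 100;

    // Hypothetical helper: appends the same operators fetchTweets uses,
    // so retweets and replies are excluded from the results
    static String buildQuery(String keyword) {
        return keyword.trim() + " -filter:retweets -filter:replies";
    }

    // Hypothetical helper: keeps the requested page size within 1..MAX_COUNT
    static int clampCount(int count) {
        return Math.max(1, Math.min(count, MAX_COUNT));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(" java "));
        System.out.println(clampCount(500));
    }
}
```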

You may have noticed the call to cleanTweets before the tweets are passed to the analyzer service. This method cleans up the tweet text, removing unneeded elements such as links and usernames and stripping the # from hashtags, and then runs the sentiment analysis on the cleaned text.

private TwitterStatus cleanTweets(Status status) {
    TwitterStatus twitterStatus = new TwitterStatus(status.getCreatedAt(), status.getId(), status.getText(), null,
            status.getUser().getName(), status.getUser().getScreenName(), status.getUser().getProfileImageURL());
    // Clean up the tweet text
    String text = status.getText()
            // remove http/https links
            .replaceAll("https?://\\S+", "")
            // remove usernames
            .replaceAll("@\\S+", "")
            // replace hashtags by just words
            .replaceAll("#", "")
            // collapse multiple white spaces into a single one
            .replaceAll("\\s+", " ")
            // trim last, so removed links or usernames leave no leading/trailing space
            .trim();
    twitterStatus.setText(text);
    // Score the cleaned text (0 = Very Negative ... 4 = Very Positive)
    twitterStatus.setSentimentType(analyzerService.analyse(text));
    return twitterStatus;
}
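The cleanup steps can be tried in isolation with the JDK alone. The sketch below is a hypothetical standalone version (TweetCleaner and clean are illustrative names, and the link pattern is assumed to cover only http/https URLs). It precompiles the patterns once, since String.replaceAll compiles a new Pattern on every call, and trims only after the removals so deleted tokens leave no stray spaces.

```java
import java.util.regex.Pattern;

public class TweetCleaner {

    // Compiled once and reused for every tweet
    private static final Pattern LINKS = Pattern.compile("https?://\\S+");
    private static final Pattern MENTIONS = Pattern.compile("@\\S+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String clean(String text) {
        String withoutLinks = LINKS.matcher(text).replaceAll("");
        String withoutMentions = MENTIONS.matcher(withoutLinks).replaceAll("");
        // Keep the hashtag word, drop only the '#' sign
        String withoutHashes = withoutMentions.replace("#", "");
        // Collapse runs of whitespace, then trim the ends
        return WHITESPACE.matcher(withoutHashes).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String tweet = "@devnull Loving #Java 17! https://t.co/abc123  ";
        // prints "[Loving Java 17!]"
        System.out.println("[" + clean(tweet) + "]");
    }
}
```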

Showing the analyzed data

Now that the backend service is ready, the final step is to consume its resources. Both endpoints implement SSE (Server-Sent Events), an HTTP standard that lets a web application handle a unidirectional event stream and receive updates whenever the server emits data.
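On the wire, an SSE stream is plain text: each event is one or more data: lines followed by a blank line, and the browser's EventSource joins those lines with newlines before handing them to onmessage as event.data. Spring WebFlux produces this framing itself when an endpoint returns a Flux with the text/event-stream media type; the hypothetical toFrame helper below only illustrates the format and is not part of the project.

```java
public class SseFrame {

    // Formats one Server-Sent Events message: every payload line becomes a
    // "data:" field and a blank line terminates the event. The single space
    // after the colon is stripped by the browser, so payload text is preserved.
    static String toFrame(String payload) {
        StringBuilder frame = new StringBuilder();
        // SSE treats CRLF, CR and LF as line breaks; -1 keeps trailing empty lines
        for (String line : payload.split("\r\n|\r|\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    public static void main(String[] args) {
        // A tweet serialised as JSON fits on one line, so it becomes a single data: field
        System.out.print(toFrame("{\"id\":1,\"sentimentType\":3}"));
    }
}
```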

I used ReactJS with TypeScript to build the web UI components and consume the exposed REST endpoints. The main component is TweetList, which handles the calls and shares data with the other components.

Once mounted, the component opens an event stream to the server by calling the /stream endpoint for the tracked keyword (Java by default) and saves every received tweet into an array. Because its dependency array is empty, the effect runs once on mount and is cleaned up once on unmount.

React.useEffect(() => {
  const eventSource = new EventSource(
    state.API_URL + "stream/" + state.hashtag
  );
  eventSource.onmessage = (event: MessageEvent) => {
    const tweet = JSON.parse(event.data);
    const tweets = [...state.tweets, tweet];
    setState({ ...state, tweets: tweets });
  };
  eventSource.onerror = () => eventSource.close();
  setState({ ...state, eventSource: eventSource });
  // Wrap close() so it is called on the EventSource; returning the bare method loses `this`
  return () => eventSource.close();
}, []);

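The second effect keeps adding tweets to the array whenever a message is received from the server. The handler registered on mount captures the initial state, so on its own it would always build the list from the initial, empty array; re-registering onmessage whenever the tweets, the eventSource or the hashtag change makes sure the handler always sees the current state.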

  React.useEffect(() => {
    if (state.eventSource) {
      state.eventSource.onmessage = (event: any) => {
        const tweet = JSON.parse(event.data);
        let tweets = [...state.tweets, tweet];
        setState({ ...state, tweets: tweets });
      };
    }
  }, [state.tweets, state.eventSource, state.hashtag]);

Finally, the render function looks like this:

return (
    <Row>
      <Col xs={12} md={8}>
        <Col md={10}>
          <h2>
            Tracked Keyword:
            <Badge variant="secondary">{state.hashtag}</Badge>
          </h2>
        </Col>
        <Col md={2}>
          <Spinner animation="grow" variant="primary" />
        </Col>
        <form
          onSubmit={e => {
            e.preventDefault();
          }}
        >
          <div className="input-group mb-3">
            <input
              type="text"
              name="hashtag"
              value={state.hashtag}
              onChange={e => setState({ ...state, hashtag: e.target.value })}
              className="form-control"
              placeholder={state.hashtag}
              aria-label={state.hashtag}
              aria-describedby="basic-addon2"
            />
            <div className="input-group-append">
              <Button
                variant="outline-primary"
                type="submit"
                onClick={() => {
                  setState({
                    ...state,
                    eventSource: newSearch(true, state, setState),
                    tweets: []
                  });
                }}
              >
                Stream
              </Button>
              <Button
                variant="primary"
                type="submit"
                onClick={() => {
                  setState({
                    ...state,
                    eventSource: newSearch(false, state, setState),
                    tweets: []
                  });
                }}
              >
                Search
              </Button>
            </div>
          </div>
        </form>
        <div id="tweets">
          {tweets
            .filter(tweet => tweet !== undefined)
            .reverse()
            .slice(0, 49)
            .map((tweet: Tweet) => (
              <Alert
                key={tweet.id}
                variant={sentiment[tweet.sentimentType] as "success"}
              >
                <Alert.Heading>
                  <img src={tweet.profileImageUrl} alt={tweet.userName} />
                  <a
                    href={"https://twitter.com/" + tweet.screenName}
                    className="text-muted"
                  >
                    {tweet.userName}
                  </a>
                </Alert.Heading>
                {tweet.originalText}
                <hr />
                <p className="mb-0">
                  <Moment fromNow>{tweet.createdAt}</Moment>
                </p>
              </Alert>
            ))}
        </div>
      </Col>
      <Col xs={4} md={4}>
        <Desc tweets={tweets.length} />
        <Doughnut tweets={tweets} />
        <Color />
      </Col>
    </Row>
  );

Running the app

Before running the app, make sure to update the application.yaml file with the authentication keys required to call the Twitter API and retrieve tweets. To get them, you will probably need to create a Twitter developer account and register an application.
Then start the backend server with mvn spring-boot:run and the frontend with npm start.

That's it, folks! If you have any remarks or suggestions, leave a comment below or file a GitHub issue.

