## Spoken Language Understanding / Argument Tagging

**There are 10 points in total for this homework. Send the completed notebook to [profilmodul1920@cis.lmu.de](mailto:profilmodul1920@cis.lmu.de). The deadline is Tuesday, December 10, 23:59. Please submit a completed version of this file in Python 3 in teams of 2 or 3 students.**

**Please rename the file to argument_tagging_last_names.ipynb**

This homework is an adaptation of this Theano tutorial: http://deeplearning.net/tutorial/rnnslu.html

In this homework, you will train a Keras model for the Spoken Language Understanding task, which consists of assigning a label to each word in a given sentence. It's a sequence labelling task.

An old and small benchmark for this task is the ATIS (Airline Travel Information System) dataset collected by DARPA. Here is an example sentence (or utterance) in the Inside-Outside-Beginning (IOB) representation:


| Input (words) | show | flights | from | Boston | to | New | York | today |
|---|---|---|---|---|---|---|---|---|
| Output (labels) | O | O | O | B-dept | O | B-arr | I-arr | B-date |
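
To make the IOB scheme concrete, here is a minimal sketch that groups the labelled words from the example above into slots. The helper name `iob_to_slots` is made up for this illustration and is not part of the homework code.

```python
def iob_to_slots(words, labels):
    """Group IOB-labelled words into (slot_name, text) pairs."""
    slots = []
    current_name, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new slot; close any open one first
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = label[2:], [word]
        elif label.startswith("I-") and current_name == label[2:]:
            # An I- tag continues the slot that is currently open
            current_words.append(word)
        else:
            # "O" (or an I- tag that does not match the open slot) closes it
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


words = "show flights from Boston to New York today".split()
labels = ["O", "O", "O", "B-dept", "O", "B-arr", "I-arr", "B-date"]
slots = iob_to_slots(words, labels)
print(slots)
```

The B- prefix marks the first word of a slot and the I- prefix marks a continuation, which is why "New York" ends up as a single arrival slot.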

The official ATIS split contains 4,978/893 sentences for a total of 56,590/9,198 words (average sentence length is 15) in the train/test set. The number of classes (different slots) is 128 including the O label (Outside).

Unseen words in the test set are handled by marking every word that occurs only once in the training set as "&lt;UNK>" and using this token to represent the unseen words in the test set. Sequences of digits are converted to repetitions of the string DIGIT, i.e. 1984 is converted to DIGITDIGITDIGITDIGIT.
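
The following sketch illustrates this scheme on a toy corpus. It only illustrates the idea described above and is not how the dataset was actually produced; the function names and the toy sentences are assumptions made for the example.

```python
import re
from collections import Counter


def normalize_digits(token):
    # Replace every digit character with the string "DIGIT"
    return re.sub(r"\d", "DIGIT", token)


def build_vocab(train_sents):
    # Keep only the words that occur more than once in the training data
    counts = Counter(tok for sent in train_sents for tok in sent)
    return {tok for tok, count in counts.items() if count > 1}


def replace_rare(sent, vocab, unk="<UNK>"):
    # Every word outside the vocabulary is mapped to the unknown token
    return [tok if tok in vocab else unk for tok in sent]


toy_train = [["flights", "to", "boston", "in", "1984"],
             ["flights", "to", "denver"]]
toy_test = [["flights", "to", "paris"]]

toy_train = [[normalize_digits(tok) for tok in sent] for sent in toy_train]
vocab = build_vocab(toy_train)
train_out = [replace_rare(sent, vocab) for sent in toy_train]
test_out = [replace_rare(sent, vocab) for sent in toy_test]
print(train_out)
print(test_out)
```

Because singleton training words are replaced as well, the model sees the unknown token during training and can learn a representation for it.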

These are the classes and functions we will be needing:

In [26]:
import numpy as np
import math
import json
from keras import Sequential
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Bidirectional, Dropout
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.regularizers import l2
from keras.callbacks import EarlyStopping
np.random.seed(1)

First, you need to download the data from the course homepage (atis.json) into your current directory. Then you can load it into the notebook:

In [6]:
with open("atis.json", "r") as f:
    data = json.load(f)

# Extract data from file
word_to_id = data["vocab"]
label_to_id = data["label_dict"]
test_sents = data["test_sents"]
test_labels = data["test_labels"]

# we also want to create a dev set by splitting the training data
train_dev_sents = data["train_sents"] # list of lists
train_dev_labels = data["train_labels"] # list of lists
num_train = math.floor(0.8 * len(train_dev_sents))

train_sents = train_dev_sents[:num_train]
train_labels = train_dev_labels[:num_train]

dev_sents = train_dev_sents[num_train:]
dev_labels = train_dev_labels[num_train:]

**TODO: How many sentences are in the train, dev and test set? (0.5 p.)**

In [3]:
# TODO: Your answer here

Next, let's define some constants that we'll use later:

In [3]:
UNK_TOKEN = "<UNK>"
PAD_TOKEN = "<PAD>"
VOCAB_SIZE = len(word_to_id)
NUM_LABELS = len(label_to_id)
EMBEDDING_SIZE = 50
HIDDEN_SIZE = 50
MAX_LENGTH = 20

**TODO: What is the token id of the "&lt;PAD>" token, and what is the label id of "O"? (0.5 p.)**


In [1]:
# TODO: Your answer here

**TODO: Reorganize the data (train, dev, test) so that the token "&lt;PAD>" has id 0 and the label 'O' has id 0 (and all words are still represented). Make sure to also change word_to_id and label_to_id (1 p.)**
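
One possible way to think about this step is as a swap of two ids, sketched here on a toy vocabulary. The helper `swap_to_zero` and the toy data are hypothetical, and the sketch assumes that ids are the consecutive integers starting at 0; other approaches work just as well.

```python
def swap_to_zero(mapping, key):
    """Return (new_mapping, old_to_new) such that `key` gets id 0.

    The entry that previously had id 0 takes over the old id of `key`,
    so every id stays in use and all other entries are unchanged.
    """
    old_id = mapping[key]
    old_to_new = {i: i for i in mapping.values()}
    old_to_new[old_id], old_to_new[0] = 0, old_id
    new_mapping = {k: old_to_new[i] for k, i in mapping.items()}
    return new_mapping, old_to_new


toy_vocab = {"flights": 0, "to": 1, "<PAD>": 2}
toy_sents = [[0, 1], [1, 0]]

new_vocab, old_to_new = swap_to_zero(toy_vocab, "<PAD>")
new_sents = [[old_to_new[i] for i in sent] for sent in toy_sents]
print(new_vocab)
print(new_sents)
```

The important point is that the same id translation is applied to the dictionary and to every sentence, so that words and ids stay consistent.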

In [5]:
# TODO: Your code here

**TODO: Print the string representation of the words (not ids) in the first test sentence. (0.5 p.)**

In [6]:
# You may find it helpful to create dictionaries from ids to strings:
id_to_word = {} # TODO
id_to_label = {} # TODO

# TODO: Your code here

Now, let's bring the data into the format needed by Keras

**TODO: Create an input matrix of size num_training_sentences x MAX_LENGTH. Do the same for the dev and test sets. (1 p.)**

**Hint:** Use the Keras function pad_sequences to trim/expand each sentence to exactly the desired length: https://keras.io/preprocessing/sequence/
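
To show what trimming and expanding to a fixed length means, here is a small NumPy-only sketch. It is a toy stand-in rather than the Keras function: it pads and truncates on the right, whereas pad_sequences by default pads and truncates at the front of the sequence (this can be changed with its `padding` and `truncating` arguments).

```python
import numpy as np


def pad_or_truncate(sequences, length, pad_id=0):
    """Return an int matrix of shape (len(sequences), length)."""
    out = np.full((len(sequences), length), pad_id, dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[:length]               # keep at most `length` ids
        out[row, :len(trimmed)] = trimmed    # remaining cells keep pad_id
    return out


toy = [[5, 3], [7, 1, 4, 9, 2]]
padded = pad_or_truncate(toy, length=4)
print(padded)
```

Using 0 as the padding id is what makes it possible to mask the padded positions later on.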

In [25]:
def do_padding(sequences, length = MAX_LENGTH):
    return None # TODO

train_sents_padded = do_padding(train_sents)
dev_sents_padded = do_padding(dev_sents) 
test_sents_padded = do_padding(test_sents)

Let's do the same for the labels.

**TODO: Create numpy matrices to encode the labels. In addition to your do_padding function, you need to transform the label ids into "one-hot encodings" (vectors that are 1 for the specified label, and 0 otherwise). The resulting matrices should have shape num_sentences x MAX_LENGTH x NUM_LABELS. (1 p.)**

**Hint:** Use the Keras function to_categorical. https://keras.io/utils/#to_categorical
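
As an illustration of the target shape, the following NumPy-only sketch builds one-hot encodings for a toy label matrix. The helper `one_hot` and the toy sizes are assumptions for this example.

```python
import numpy as np


def one_hot(label_ids, num_labels):
    """Map an int array of shape (...) to a one-hot array of shape (..., num_labels)."""
    # Row i of the identity matrix is the one-hot vector for label i
    return np.eye(num_labels, dtype=np.float32)[np.asarray(label_ids)]


toy_labels = np.array([[0, 2, 1],
                       [1, 0, 0]])       # 2 sentences, 3 positions each
encoded = one_hot(toy_labels, num_labels=3)
print(encoded.shape)
```

Each position now carries a vector with exactly one 1, so the last axis has length equal to the number of labels.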

In [8]:
train_labels_padded = None # TODO
dev_labels_padded = None # TODO
test_labels_padded = None # TODO

We are ready to define the GRU model. It consists of the following components:
* An embedding layer that learns word vectors that are passed on to the GRU as an input.
* The GRU layer. We use a bidirectional GRU to consider information from left and right.
* A final layer that predicts a label for each position in the sentence from the GRU hidden states.
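
The three components can be traced shape by shape with a NumPy-only sketch. This is not the Keras model: a plain tanh recurrence stands in for the GRU, the weights are random, masking and dropout are ignored, and all sizes are toy values rather than the homework constants. The two directions are concatenated here, which is also the default merge mode of the Keras Bidirectional wrapper.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_length, vocab_size = 2, 5, 10          # toy sizes
embedding_size, hidden_size, num_labels = 4, 3, 6

token_ids = rng.integers(0, vocab_size, size=(batch, max_length))

# 1) Embedding: a lookup table with one row per word id
embedding_table = rng.normal(size=(vocab_size, embedding_size))
embedded = embedding_table[token_ids]             # (batch, max_length, embedding_size)


# 2) Recurrent layer: a simple tanh recurrence as a stand-in for the GRU
def run_rnn(inputs, w_in, w_rec):
    state = np.zeros((inputs.shape[0], w_rec.shape[0]))
    states = []
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t] @ w_in + state @ w_rec)
        states.append(state)
    return np.stack(states, axis=1)               # (batch, time, hidden_size)


w_in_f = rng.normal(size=(embedding_size, hidden_size))
w_rec_f = rng.normal(size=(hidden_size, hidden_size))
w_in_b = rng.normal(size=(embedding_size, hidden_size))
w_rec_b = rng.normal(size=(hidden_size, hidden_size))

forward = run_rnn(embedded, w_in_f, w_rec_f)
# Read the sentence right-to-left, then flip back so positions line up again
backward = run_rnn(embedded[:, ::-1], w_in_b, w_rec_b)[:, ::-1]
hidden = np.concatenate([forward, backward], axis=-1)   # (batch, max_length, 2 * hidden_size)

# 3) Output layer: the same dense + softmax applied at every position
w_out = rng.normal(size=(2 * hidden_size, num_labels))
logits = hidden @ w_out
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)        # (batch, max_length, num_labels)

print(embedded.shape, hidden.shape, probs.shape)
```

The output has one probability distribution over labels per position, which matches the shape of the one-hot label matrices.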

**TODO: Create the embedding layer for the vocabulary. It learns lookup vectors (size: EMBEDDING_SIZE) for all words in the vocabulary. Make sure to enable masking of the &lt;PAD> words! (1 p.)**

In [9]:
model = Sequential()
model.add(None) # TODO

**TODO: Apply 50% dropout to the embeddings by adding a Dropout layer (0.5 p.)**

In [11]:
model.add(None) # TODO

**TODO: Add a bidirectional GRU layer with hidden units of size HIDDEN_SIZE. The GRU should return the sequence of hidden states. Apply L2 regularization with a weight of 0.001 to the GRU kernel and bias. (1 p.)**

In [10]:
model.add(None) # TODO

**TODO: Output a prediction over the possible labels at each time step, i.e. apply a Dense layer with softmax activation at each time step. (1 p.)**

**Hint:** Use the TimeDistributed wrapper: https://keras.io/layers/wrappers/

In [11]:
model.add(None) # TODO

We compile the model with the 'adam' optimizer and 'categorical_crossentropy' as the loss (this corresponds to the negative log-likelihood). We also monitor 'accuracy' as the metric of interest.
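
The correspondence between categorical cross-entropy and negative log-likelihood can be checked numerically. The sketch below is a NumPy illustration on made-up probabilities, not the Keras implementation; with one-hot targets, only the log-probability of the gold label survives the sum.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over positions of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


# Toy predictions for two positions over three labels
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
gold = np.array([0, 1])
y_true = np.eye(3)[gold]                  # one-hot targets

loss = categorical_crossentropy(y_true, y_pred)
# Negative log-likelihood: minus the log-probability of each gold label
nll = float(np.mean(-np.log(y_pred[np.arange(len(gold)), gold])))
print(loss, nll)
```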


In [12]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

**TODO: Implement a callback that stops training when the development set loss ("val_loss") has not decreased for 2 epochs. (0.5 p.)**

**Hint:** https://keras.io/callbacks/#earlystopping
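
The idea behind such a patience rule can be sketched in plain Python. This illustrates the concept on a made-up list of validation losses and is not the Keras callback itself; the function name `epochs_run` is hypothetical.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs are run before the patience rule stops training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best loss and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


history = [0.90, 0.60, 0.55, 0.56, 0.57, 0.50]
stopped_after = epochs_run(history, patience=2)
print(stopped_after)
```

In this toy history, training stops before the last epoch is reached, even though that epoch would have improved the loss. This is the usual trade-off when choosing a small patience value.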

In [13]:
earlystop = None # TODO

In [2]:
model.fit(train_sents_padded, train_labels_padded,
          batch_size=8, epochs=100, callbacks=[earlystop],
          validation_data=(dev_sents_padded, dev_labels_padded))

**TODO: Now, evaluate the model on the test data. (0.5 p.)**

In [None]:
# TODO: Your code here

**TODO: Predict the label sequence for the first sentence in the test data. Print both the sentence and the predicted labels (as word and label strings, not ids). (1 p.)**

**Hint:** You can use model.predict_classes(...) to obtain the predicted label ids.
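
Turning per-position label probabilities into label strings amounts to an argmax over the last axis followed by a dictionary lookup. The sketch below uses made-up dictionaries and probabilities instead of the trained model. In Keras versions where predict_classes is not available, taking np.argmax over the output of model.predict is the usual replacement.

```python
import numpy as np

# Toy dictionaries and data, not the ATIS vocabulary
toy_id_to_word = {0: "<PAD>", 1: "flights", 2: "to", 3: "boston"}
toy_id_to_label = {0: "O", 1: "B-arr"}

sentence = np.array([1, 2, 3, 0, 0])      # right-padded for this toy example
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4],
                  [0.6, 0.4]])            # one distribution per position

predicted_ids = np.argmax(probs, axis=-1)
pairs = [(toy_id_to_word[int(w)], toy_id_to_label[int(l)])
         for w, l in zip(sentence, predicted_ids)
         if w != 0]                       # skip padded positions
print(pairs)
```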

In [15]:
# TODO: Your code here

**Optional:** 

Feel free to experiment with different settings (e.g. without zero-masking, dropout, or regularization; with a different dropout rate, L2 weight, optimizer, or metrics).