Supported by: [logos: TTIJ, TTIC, AIP, AIRC]
Additional cooperation from: [logos: tokyotech2, logo_A]

Third International Workshop on Symbolic-Neural Learning (SNL-2019)

July 11-12, 2019
Miraikan hall, Odaiba Miraikan 7F (Tokyo, Japan)

Poster Session:

July 12 (Friday), 13:20-14:20
  • P-1: Domain Discrepancy Measure for Complex Models in Unsupervised Domain Adaptation

    Jongyeong Lee, Nontawat Charoenphakdee, Seiichi Kuroki, Masashi Sugiyama
    (The University of Tokyo/RIKEN AIP)

    Appropriately evaluating the discrepancy between domains is essential for the success of unsupervised domain adaptation. In this paper, we first point out that existing discrepancy measures are less informative when complex models such as deep neural networks are used, in addition to the facts that they can be computationally highly demanding and their range of applications is limited only to binary classification. We then propose a novel domain discrepancy measure, called the paired hypotheses discrepancy (PHD), to overcome these shortcomings. PHD is computationally efficient and applicable to multi-class classification. Through regret bound analysis, we theoretically show that PHD is effective even for complex models. Finally, we demonstrate the practical usefulness of PHD through experiments.

  • P-2: On Symmetric Losses for Learning from Corrupted Labels

    Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
    (The University of Tokyo/RIKEN AIP)

    This work aims to provide a better understanding of a symmetric loss. First, we emphasize that using a symmetric loss is advantageous in the balanced error rate (BER) minimization and area under the receiver operating characteristic curve (AUC) maximization from corrupted labels. Second, we prove general theoretical properties of symmetric losses, including a classification-calibration condition, excess risk bound, conditional risk minimizer, and AUC-consistency condition. Third, since all nonnegative symmetric losses are non-convex, we propose a convex barrier hinge loss that benefits significantly from the symmetric condition, although it is not symmetric everywhere. Finally, we conduct experiments to validate the relevance of the symmetric condition.
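
    As a small illustration of the symmetric condition mentioned above, the sketch below checks numerically whether loss(z) + loss(-z) stays constant over a range of margins. The sigmoid loss is used as a standard example of a symmetric loss and the hinge loss as a non-symmetric one; the helper name symmetry_gap and the chosen margins are assumptions made for this example, and the barrier hinge loss proposed in the poster is not implemented here.

```python
import numpy as np

def sigmoid_loss(z):
    """Sigmoid loss 1 / (1 + exp(z)); loss(z) + loss(-z) equals 1 for every z."""
    return 1.0 / (1.0 + np.exp(z))

def hinge_loss(z):
    """Hinge loss max(0, 1 - z); convex but not symmetric."""
    return np.maximum(0.0, 1.0 - z)

def symmetry_gap(loss, margins):
    """Spread of loss(z) + loss(-z) over the margins (hypothetical helper).

    A gap of (numerically) zero means the loss is symmetric on these margins.
    """
    total = loss(margins) + loss(-margins)
    return float(total.max() - total.min())

margins = np.linspace(-3.0, 3.0, 13)  # illustrative margins, not from the poster
print("sigmoid gap:", symmetry_gap(sigmoid_loss, margins))
print("hinge gap:  ", symmetry_gap(hinge_loss, margins))
```

    The hinge loss sums to a constant only for margins in [-1, 1], which is why its gap is nonzero on the wider range used here.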

  • P-3: Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

    Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama
    (The University of Tokyo/RIKEN AIP)

    Pairwise similarities and dissimilarities between data points might be easier to obtain than fully labeled data in real-world classification problems, e.g., in privacy-aware situations. To handle such pairwise information, an empirical risk minimization approach has been proposed, giving an unbiased estimator of the classification risk that can be computed only from pairwise similarities and unlabeled data. However, this approach has not been able to handle pairwise dissimilarities so far. On the other hand, semi-supervised clustering methods can use both similarities and dissimilarities. Nevertheless, they typically require strong geometrical assumptions on the data distribution, such as the manifold assumption, which may degrade performance. In this paper, we derive an unbiased risk estimator that can handle similarities, dissimilarities, and unlabeled data together. We theoretically establish estimation error bounds and experimentally demonstrate the practical usefulness of our empirical risk minimization method.

  • P-4: Scalable Learning of Logic Programs with Neural Network

    Yin Jun Phua, Katsumi Inoue
    (The Graduate University for Advanced Studies/NII)

    Real-world data are often noisy and fuzzy. Most traditional logical machine learning methods require the data to be discretized or pre-processed before they can produce useful output. This shortcoming often limits their application to real-world data. On the other hand, neural networks are generally known to be robust against noisy data. However, a fully trained neural network does not provide easily understandable rules that can be used to understand the underlying model. In a previous work, we proposed a Differentiable Learning from Interpretation Transition (DLFIT) algorithm that can simultaneously output logic programs fully explaining the state transitions and learn from data containing noise and errors. However, it was not scalable and was limited to systems with 5 variables or fewer. In this work, we extend our method to handle systems with more variables, and we show that our method is still robust to noisy data.

  • P-5: Using External DB Knowledge in Neural DDI Extraction

    Masaki Asada, Makoto Miwa, Yutaka Sasaki
    (Toyota Technological Institute)

    A drug-drug interaction (DDI) is an unintended effect of using two or more drugs, such as a side effect caused by the combination of the drugs. Studies on automatic DDI extraction from biomedical texts are expected to help medical experts. Deep neural network based methods have recently been employed for DDI extraction; however, these models suffer from a lack of labeled corpora. We improve DDI extraction with a model that leverages unsupervised pre-training on a large multi-domain scientific corpus and combines it with drug information registered in an external drug database. We encode textual drug pairs with SciBERT (Beltagy et al., 2019) and their drug database information with Graph Convolutional Networks (GCNs), and then we combine the outputs of these two networks. In the experiments, we evaluated our model on Task 9.2 of the DDIExtraction-2013 shared task.

  • P-6: Mixed Rules-based and Deep learning models for Imbalanced data chatbot

    Wang Zhongsheng
    (BizReach, Inc.)

    In our past work, we tried to design a closed-domain self-learning chatbot, which faced an imbalanced data problem as soon as it was put to work. Because of its specific domain, over 80% of the questions are concentrated on a few topics, and the answers to similar topics fall into different classes. On the other hand, the remaining 20% of the questions have only a few samples per class, which makes it hard for the neural network to classify them. Without any preprocessing, there are over 8 thousand classes in total. In this work, we propose a mix of rule-based and deep learning models to solve this imbalanced data problem. First, based on the content of a user's question, the system chooses which model will handle it. The rule-based model is in charge of hot-topic questions, and the neural network answers the remaining questions. After the hot-topic questions are stripped out, just over 800 classes remain, which makes it much easier for the neural network to classify those questions. Second, the rule-based model consists of a series of ElasticSearch functions. In addition, the chatbot engine checks the model results before the chatbot gives a final answer. If the chatbot cannot get a useful answer, the system continues to decompose the question until it gets a suitable response. The chatbot runs its learning function every day so that it can learn new questions by itself. Finally, the current dataset includes a total of 10 thousand Q&A pairs; I believe the chatbot will become more intelligent if we can get enough data. This system effectively helps us solve the imbalanced data problem and improve Q&A accuracy. From this work we learn that a neural network cannot solve some real-world data problems, especially small datasets with imbalanced data. Under such conditions, combining it with other parsing and logic methods helps solve more problems.

  • P-7: Reflection-based Word Attribute Transfer

    Yoichi Ishibashi, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura

    We propose a word attribute transfer method based on the concept of reflection in a word embedding space. PMI-based word embeddings (e.g., word2vec) represent some analogical relations. These relations can be used for word attribute transfer. However, such attribute transfer requires explicit knowledge of whether the input word is male or female. This kind of knowledge cannot be developed for all possible attributes. In this work, we propose a method for word attribute transfer without such explicit knowledge. We incorporate reflection-based mapping into word attribute transfer based on the assumption that a word with an inverted attribute can be obtained by negation of the attribute. We define attribute negation as reflection in a word embedding space. The reflection operation has a property similar to negation because an identity mapping is obtained when it is applied twice. Experimental results show that the proposed method can transform the gender of words without explicit gender information about the input words.
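
    The property that applying a reflection twice yields the identity can be checked with a short sketch. The code below uses a plain geometric reflection across a hyperplane with random parameters; the names reflect, mirror_normal, and mirror_offset are assumptions made for this example, and how the mirror is actually obtained in the proposed method is not covered here.

```python
import numpy as np

def reflect(v, normal, offset):
    """Reflect v across the hyperplane through `offset` with normal vector `normal`."""
    coeff = 2.0 * np.dot(v - offset, normal) / np.dot(normal, normal)
    return v - coeff * normal

# Random stand-ins for a word vector and mirror parameters (illustrative only).
rng = np.random.default_rng(0)
word_vec = rng.normal(size=8)
mirror_normal = rng.normal(size=8)
mirror_offset = rng.normal(size=8)

once = reflect(word_vec, mirror_normal, mirror_offset)
twice = reflect(once, mirror_normal, mirror_offset)

# Reflecting twice returns the original vector, mirroring double negation.
print(np.allclose(twice, word_vec))
```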

  • P-8: Entity Tracking for Data-to-Text Generation

    Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, Hiroya Takamura
    (NAIST, AIST AIRC, Tokyo Institute of Technology, Ochanomizu University, The University of Tokyo)

    We propose a data-to-text generation model with two modules, one for entity tracking and the other for text generation. Our entity tracking module selects and keeps track of salient entities and memorizes which records have been mentioned. Our generation module generates a summary conditioned on the state of the entity tracking module. This process is considered to simulate human-like writing, which gradually selects information by determining intermediate variables while writing the summary. In addition, we explore the effectiveness of writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

  • P-9: Neural Combinatory Constituency Parsing

    Zhousi Chen, Mamoru Komachi
    (Tokyo Metropolitan University)

    Transition-based and chart-based parsers are the two main methods for constituency parsing. Their computation unit is not a word but either an action or a span ranked by scores. However, one natural way for a human to parse a sentence is to read the words and form grammatical groups, with the words still playing a central role in the process. In this research, we propose a novel triangular neural network to encapsulate this process. The triangle consists of n-1 layers, where the bottom layer of size n takes the input word embeddings and forwards them to the upper layer of size n-1. The orientation function is local and implemented as a binary classification, namely left or right. When two orientations agree, their embeddings become a joint embedding for a phrase in the upper layer. Recursively, the sole embedding for the sentence stands at the (n-1)-th, topmost layer. By labeling such embeddings and taking the orientations into consideration, a tree can be built with a complexity of O(n^2), well between the O(n) of a greedy transition-based parser with a lower F1 score and the O(n^3) of a chart parser yielding a higher F1 score. Our model achieves an F1 score of 89.99 with greedy decoding on the PTB dataset.

  • P-10: Decomposing Sentence Vectors Twice Gives Better and More Interpretable Representations

    Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui
    (Tohoku University/RIKEN AIP)

    A drawback of using sentence encoders to model textual meaning similarity is low interpretability. It is difficult to obtain a detailed explanation of how and which words in the two sentences are associated. In this paper, we develop a method providing both high performance and interpretability. We first decompose high-quality sentence vectors into word vectors. We then decompose word vectors into their norm and direction to obtain better discrete distributions needed for the word mover's distance (WMD), which models optimal transport of semantic features between two sets of words. We conduct experiments on several semantic textual similarity (STS) benchmark datasets and demonstrate that the proposed method outperforms the current state-of-the-art methods by a wide margin in the setting of fully unsupervised learning. We also visualize the interpretability of the proposed method by showing the word-level alignment results obtained via the fully unsupervised optimal transport plan. Additionally, we give an intuitive and theoretical link between certain unsupervised sentence encoders and our proposed method.
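
    One way to picture the norm/direction decomposition is sketched below. It assumes, for illustration only, that the normalized norms serve as the probability mass of each word and that the cosine distance between directions serves as the transport cost; the abstract does not spell out this assignment, the function names are hypothetical, and solving the transport problem itself is left out.

```python
import numpy as np

def norm_direction(word_vectors):
    """Split each row vector into its norm and its unit-length direction.

    Assumes every word vector is nonzero.
    """
    norms = np.linalg.norm(word_vectors, axis=1)
    directions = word_vectors / norms[:, None]
    return norms, directions

def transport_inputs(vecs_a, vecs_b):
    """Build masses and a cost matrix for an optimal transport solver.

    Assumption for this sketch: mass is proportional to the norm,
    cost is the cosine distance between directions.
    """
    norms_a, dirs_a = norm_direction(vecs_a)
    norms_b, dirs_b = norm_direction(vecs_b)
    mass_a = norms_a / norms_a.sum()
    mass_b = norms_b / norms_b.sum()
    cost = 1.0 - dirs_a @ dirs_b.T
    return mass_a, mass_b, cost

# Toy 2-D "word vectors" for two sentences (made up for the example).
vecs_a = np.array([[3.0, 0.0], [0.0, 1.0]])
vecs_b = np.array([[2.0, 0.0], [0.0, 2.0]])
mass_a, mass_b, cost = transport_inputs(vecs_a, vecs_b)
print(mass_a, mass_b)
print(cost)
```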

  • P-11: An analysis of an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing

    Yo Ehara
    (Shizuoka Institute of Science and Technology/AIST AIRC)

    This poster introduces a freely available dataset for analyzing the English vocabulary of English-as-a-second-language (ESL) learners, together with an analysis of it. While ESL vocabulary tests have been extensively studied, few of the results have been made public because 1) most of the tests are used to grade test takers, i.e., placement tests; thus, they are treated as private information that should not be leaked, and 2) the primary focus of most language educators is how to measure their students' ESL vocabulary, rather than the test results of other test takers. However, to build and evaluate systems to support language learners, especially from a cognitive perspective, there exists a strong need for a publicly available dataset that records the learners' vocabulary. Our dataset meets this need: it contains the results of the vocabulary size test, a well-studied English vocabulary test, taken by one hundred test takers hired via crowdsourcing. Regarding second language vocabulary, several datasets were previously built for complex word identification shared tasks; however, they contain only the number of learners who identified words or phrases in sentences as complex. Since second language learners' abilities vary greatly, a cognitive analysis needs precise information on which learner answered which question correctly or incorrectly, which our dataset provides. Unlike in high-stakes testing, the test takers in our dataset had little motivation to cheat on the tests to obtain high scores. A brief test-theory-based analysis of the dataset showed an excellent test reliability of 0.91 (Cronbach's alpha). Analysis using item response theory also indicates that the test is reliable and successfully measures the vocabulary ability of language learners. We also measured how accurately the learners' responses can be predicted using machine-learning methods. (Most of the contents of this presentation were previously presented at LREC 2018.)
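
    For readers unfamiliar with the reliability figure quoted above, the sketch below computes Cronbach's alpha with the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores) on a learner-by-item response matrix. The toy matrix is made up for the example and is not taken from the dataset.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a (test takers x items) matrix of item scores."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    # Sample variance of each item and of each taker's total score.
    item_variances = responses.var(axis=0, ddof=1)
    total_variance = responses.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Toy data: 4 test takers, 3 items, 1 = correct, 0 = incorrect.
# All items agree perfectly, so alpha is at its maximum.
toy_responses = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
print(alpha)
```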

  • P-12: Masking Unnecessary Information in Dependency Trees for Neural Relation Classification

    Tomoki Tsujimura, Makoto Miwa, Yutaka Sasaki
    (Toyota Technological Institute)

    In relation classification, the mention of the relation often lies on the shortest dependency path between the target entities, and omitting tokens outside the shortest path from the input of the relation classification model improves generalization ability. However, this is a heuristic rule, and it is inflexible with respect to unexpected relations, such as relations that require information outside the path and relations that are not directly mentioned. We propose a novel masking mechanism for neural relation classification that learns to mask unnecessary nodes in dependency trees in an end-to-end manner. Our masking mechanism works as a hidden layer that drops unnecessary hidden vectors at the token level using discrete masks during both training and test time. The following layers process only the remaining unmasked tokens and aggregate them with an attention mechanism to represent relations. We show that the relation classification model with our method achieves results comparable to those of the model using the shortest path heuristic. We also investigate the differences in the remaining tokens between the shortest path and the learned masks.

  • P-13: Neural Exhaustive Nested Named Entity Recognition with BERT

    Hai-Long Trieu, Mohammad Golam Sohrab, Makoto Miwa

    We present our recently published neural exhaustive nested named entity recognition (NER) model, which detects nested entities in text, and its ongoing extensions. The model detects any overlapping regions or word sequences by enumerating all possible regions and classifying them. We built the model on top of two sentence representations: LSTMs and the powerful pre-trained BERT model. Experimental results on several data sets show that our model can obtain promising results and significantly outperform the existing flat and nested NER models. In our poster, we will discuss several different settings and possible future directions.
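
    The enumeration step can be sketched as follows. The code lists every contiguous region of a token sequence up to a maximum length; the function name enumerate_regions and the max_len cap are assumptions made for this example, and the classifier that labels each region is omitted.

```python
def enumerate_regions(tokens, max_len):
    """List every contiguous token region of length 1..max_len.

    Each entry is (start, end, region_tokens) with `end` exclusive.
    Overlapping and nested regions are all included.
    """
    regions = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            regions.append((start, end, tokens[start:end]))
    return regions

tokens = ["human", "T", "cell", "leukemia"]  # illustrative sentence fragment
regions = enumerate_regions(tokens, max_len=4)
for start, end, span in regions:
    print(start, end, " ".join(span))
```

    With n tokens and no length cap this yields n(n+1)/2 candidate regions, so nested mentions such as a short region inside a longer one are both presented to the classifier.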

  • P-14: Towards Temporal Knowledge Graph Embeddings with Arbitrary Time Precision

    Julien Leblay, Melisachew Wudage Chekol, Xin Liu
    (AIST AIRC (Japan) & University of Mannheim (Germany))

    Knowledge Graphs (KGs) are commonly used to model knowledge, with labeled nodes representing entities or concepts and directed, labeled edges representing relationships among them. In other words, a single fact is modeled as two nodes connected by an edge. Knowledge Graph Embedding (KGE) consists in learning a vector representation for such graphs, to perform approximate fact inference or machine learning tasks such as node clustering and classification. However, knowledge changes over time, and conventional embedding methods fail to capture fact validity over time. This work belongs to a string of recent research aiming to solve this problem, namely by adding a time dimension to the conventional KGE problem. Unlike most existing work in the area, which considers a single time granularity (usually the year), we consider multiple time granularities to better reflect the structure of existing temporal knowledge. We also consider facts whose validity spans multiple, possibly disconnected time periods. We propose a neural network-based architecture and methodology to tackle this issue in a scalable way. We will present the early results collected so far for this research.

  • P-15: T2KG: An End-to-End Knowledge Graph Construction Framework

    Natthawut Kertkeidkachorn, Ryutaro Ichise

    Knowledge Graphs play an important role in many AI applications as prior knowledge. In recent years, many Knowledge Graphs have been built, such as DBpedia, Freebase, and YAGO. Nevertheless, massive amounts of knowledge are being produced every day. Consequently, Knowledge Graphs become more obsolete over time. It is therefore necessary to populate Knowledge Graphs with new knowledge in order to keep them usable. In this study, we present T2KG, our end-to-end system for populating knowledge graphs from natural language text. We also demonstrate use cases, achievements, challenges, and lessons learned from using the system in practice.

  • P-16: An Impression Prediction System of Oral Presentation Using LSTM and Attention Mechanism

    Shengzhou Yi, Xueting Wang, Toshihiko Yamasaki
    (The University of Tokyo)

    We propose a presentation support system that provides impression-related feedback to presentation speakers. Our system is a multimodal neural network including two Long Short-Term Memory (LSTM) networks that learn linguistic features and acoustic features, respectively. An attention neural network is used to combine the different feature representations with appropriate weights. We collected more than 2,400 presentation videos with official captions and user ratings from TED Talks. Our system can recognize 14 types of audience impressions with an average accuracy of 85.0%. Compared to existing systems, our system not only makes noticeable improvements to the accuracy of predicting audience impressions, but also significantly speeds up the training process.

  • P-17: Listen and Tell: Acoustic Scene Caption Generation using Deep Learning

    Michio Iwatsuki, Yui Sudou, Katsutoshi Itoyoma, Kenji Nishida, Kazuhiro Nakadai
    (Tokyo Institute of Technology)

    This paper proposes a deep learning model called Listen and Tell that automatically generates natural language captions explaining an acoustic scene including environmental sound signals. The model takes an acoustic signal as input and outputs the corresponding natural language caption, which describes the acoustic scene, such as the type and timing of each sound source included in the scene. The model consists of encoder and decoder blocks. In the encoder block, the acoustic signal in the input scene is first converted into a temporal sequence of fixed-size Mel-spectrograms by the short-time Fourier transform. Second, each spectrogram is converted into a feature vector using a Convolutional Neural Network (CNN). Finally, the temporal sequence of feature vectors is fed into a Recurrent Neural Network (RNN), and a compressed vector is output as an intermediate representation of the scene. In a normal encoder-decoder model for image captioning, a CNN is simply adopted as the encoder. To deal with acoustic signals, which are originally 1D temporal information of variable length, an RNN and the fixed-size spectrogram representation are newly introduced into the encoder. The decoder block converts the compressed vector into a natural language caption using another RNN. To validate the proposed model, we created a dataset containing pairs of an acoustic signal and the corresponding caption. Each acoustic signal was generated by concatenating a few sound sources, and the corresponding caption describes the types of the sound sources and their order. The proposed model was then trained on the dataset. Experimental results showed that the proposed model achieved a type-order success rate of 71.6%, where success means that the types of sound sources and their order are completely matched.

  • P-18: news2meme: A cross-modal retrieval framework based on word subspace for automatic content generation from news

    Erica K. Shimomoto, Lincon S. Souza, Bernardo B. Gatto, Kazuhiro Fukui
    (University of Tsukuba)

    Internet users engage in content creation by using various media formats. One of the most popular forms is the internet meme, which often depicts the general opinion about events with an image and a catchphrase. In this work, we propose news2meme, a framework for automatically generating memes from a news article, where we aim to match words and images efficiently. We approach this as two multimedia retrieval problems with the same input news text: 1) an image retrieval task where the output is a meme image; 2) a text retrieval task where the output is a catchphrase. These two outputs are combined to generate the meme for the news article. The main challenge is to perform cross-modal multimedia retrieval (getting an image from a text). To solve this, we introduce the concept of word subspace, which can represent the intrinsic variability of features in a set of word vectors, helping to contextualize the meme within the news. We represent texts and catchphrases as sets of word vectors through the word2vec representation. To handle images similarly, we extract sets of tags from the images using a deep neural network. These tags are then translated to word vectors in the same vector space through word2vec. Finally, we represent the sets of word vectors as word subspaces. Through word subspace comparison, we can directly compare images and texts, making retrieval across media formats possible. A preliminary experiment was performed to evaluate our framework.
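
    A rough sketch of comparing two sets of word vectors through subspaces is given below. It assumes, for illustration only, that a word subspace is spanned by the leading singular directions of the (uncentered) word-vector matrix and that similarity is the mean squared cosine of the canonical angles between two subspaces; the abstract does not fix these details, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def word_subspace(word_vectors, dim):
    """Orthonormal basis (as columns) for the top `dim` directions of a word-vector set."""
    # Rows are word vectors; the right singular vectors span their row space.
    _, _, vt = np.linalg.svd(word_vectors, full_matrices=False)
    return vt[:dim].T

def subspace_similarity(basis_a, basis_b):
    """Mean squared cosine of the canonical angles between two subspaces."""
    # Singular values of A^T B are the cosines of the canonical angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Toy 4-D "word vectors": news words and image tags (made up for the example).
news_words = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0, 0.0],
                       [1.0, 1.0, 0.0, 0.0]])
image_tags = np.array([[0.0, 0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 3.0]])

news_basis = word_subspace(news_words, dim=2)
tags_basis = word_subspace(image_tags, dim=2)
print(subspace_similarity(news_basis, news_basis))
print(subspace_similarity(news_basis, tags_basis))
```

    In this toy case the two sets occupy orthogonal directions, so their similarity is at its minimum, while a set compared with itself scores the maximum.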

  • P-19: Sequence Generation by Sequential Time-Point GAN

    Hayato Futase, Tomoki Tsujimura, Makoto Miwa, Yutaka Sasaki
    (Toyota Technological Institute)

    In recent computer vision research, generative adversarial nets (GANs) have had great success in image generation. Recent GANs often generate a series of images with Recurrent Neural Networks (RNNs), which can express sequential information. However, such GANs cannot generate an image at a particular time. We propose a new GAN model, namely the Sequential Time-Point GAN (STP-GAN), which produces the continuous changes of images using pairs of images. The model uses fully convolutional neural networks to represent images, shares latent vectors to represent the series of images, and generates an image at a specific time using a label that represents time. The model is composed of three networks: an encoder, a decoder, and a discriminator. The encoder estimates a latent vector from two images of the same process. The decoder generates images at a specific time step from the estimated latent vector. The discriminator discriminates the generated image sequence from the actual image sequence. Our experiments using sequential data on forge processing show that STP-GAN significantly reduces the volume variance compared to a conditional GAN that does not consider sequential changes. This shows that STP-GAN has the potential to be good at predicting next and intermediate images.

  • P-20: Character Spotting in Japanese Historical Documents by Deep Learning

    Anh Duc Le
    (Center for Open Data in the Humanities, ROIS)

    With the development of open data in the humanities, an enormous number of historical documents have become available electronically on the Internet for humanities researchers. Such a large number of documents can be accessed efficiently if researchers can search for and extract the necessary text. The traditional approach to this task is to build an index for the documents manually. Since the manual approach is expensive, automating the indexing process will reduce costs. In this research, we aim to propose a new method to spot keywords effectively in modern Japanese magazines. For Japanese documents, which have a complex layout and a large vocabulary, state-of-the-art methods for word spotting such as PHOC descriptors, PHOCNet, and HWNet are inappropriate, since they require that documents have been segmented into words. Moreover, the accuracy of OCR for modern Japanese magazines is still low, so it is very challenging to build an index from OCR results. To overcome the above challenges, we propose a new method for keyword spotting, which predicts the locations of an input keyword in document images without any pre-processing. The network is inspired by Single Shot Detection in object detection and the Connectionist Text Proposal Network in text detection. The network has three parts: feature extraction, keyword embedding, and bounding box regression. For feature extraction, we employ VGG16 to extract features from an input image. Then, we employ a Bidirectional Long Short-Term Memory (BLSTM) network to explore meaningful context information. For keyword embedding, we convert a keyword character to an embedded vector. The embedded vector is concatenated with each extracted feature. Finally, the bounding box is predicted from the concatenated vector. This research is a work in progress. The initial experiments showed that our proposed network is able to detect a small set of 50 characters. We plan to extend the character set to 3,000 to 5,000 characters.

  • P-21: PonNet: Object Placeability Classifier for Domestic Service Robots

    Angelica Nakayama, Aly Magassouba, Komei Sugiura, Hisashi Kawai

    Placing objects is a fundamental task for domestic service robots (DSRs). Most conventional methods detect free areas but do not predict the physical placeability (likelihood of success). Predicting placeability is challenging because it depends on the physical properties of the robot hardware, destination, obstacles, and the target object. To address this, we developed a CNN with an attention mechanism to predict placeability on different concrete areas of the image, based on RGB-D images. We extend the method with GAN-based data augmentation. Experimental results show that our approach significantly improved accuracy compared with baseline methods.

  • P-22: Which parts of the brain circuit are responsible for attentional mechanisms in machine learning?

    Hiroshi Yamakawa
    (The University of Tokyo/The Whole Brain Architecture Initiative)

    The effectiveness of the attention mechanism has recently become evident in natural language processing by machine learning, as in Transformer for machine translation and BERT and GPT-2 for language understanding. The attention mechanism here is a function that stores keys and values as in a dictionary and fetches the values whose keys are near the query. This mechanism is also often used in visual question answering tasks, such as the CLEVR dataset, which require verbal answers about physical appearance and relations in images. These verbal abilities possessed by human beings should be implemented in neural circuits that mainly include the neocortical circuit. On the other hand, at present, the computational understanding of neocortical neural circuits is mainly based on Predictive Coding. Therefore, many proposed models focus on the state, the prediction, and the prediction error. However, little has been done on computational models of the local circuit of the neocortex with respect to these attention mechanisms. In this poster presentation, we will investigate the possibility that such a dictionary-like mechanism is realized by the pathway connecting the neocortex through the thalamus core. Finally, we propose hypotheses of an attention mechanism grounded in the neocortical local circuit. If this work advances the understanding of these mechanisms in the brain, it may not only be useful for the construction of brain-inspired AI but also give some hints for natural language processing models.
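
    The dictionary-like view of attention described above can be sketched in a few lines. The code below performs a soft lookup: it scores every key against the query, turns the scores into weights with a softmax, and returns the weighted mix of values. The scaling by the square root of the dimension follows the scaled dot-product form used in Transformer; the function name soft_lookup and the toy keys and values are assumptions made for this example, and nothing here models the neural circuit hypothesis itself.

```python
import numpy as np

def soft_lookup(query, keys, values):
    """Fetch a blend of `values` weighted by how close each key is to the query."""
    scores = keys @ query / np.sqrt(query.shape[0])
    # Softmax with the maximum subtracted for numerical stability.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, weights

# Toy dictionary with three entries (made up for the example).
keys = 10.0 * np.eye(3)
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
query = keys[0]  # a query that matches the first key

output, weights = soft_lookup(query, keys, values)
print(weights)
print(output)
```

    Because the query matches the first key far better than the others, nearly all the weight falls on the first entry and the output is close to its value, which is the soft counterpart of an exact dictionary lookup.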

  • P-23: A neural network that processes time series data in which element data is linked to symbols

    Seisuke Yanagawa

    For animals, even in the early stages of evolution, processing and learning time series data is essential. Eating behavior can be said to be the most basic form of time series data processing. In addition to having the ability to process time series data, we link words such as "food", "grab", and "chew" to elements of time series data. These may be voices, gestures, or figures (the most complex example is a string), collectively referred to as symbols. By using time series data consisting of symbols, we became able to tell distant companions what happened. This paper presents a neural network in which units for processing time series data are connected hierarchically. The neural network recognizes the context structure inherent in the input and performs hierarchical processing. The area in which the most activated unit exists moves as the processing progresses. In addition to the area involved in the input and output, the area that recognizes and generates symbols linked to the input and output data is activated. That is, there is a region reflecting the behavior of the symbol corresponding to the region reflecting the behavior of the thing, and both regions work together while processing time series data. This process does not exist in animals without language. The simultaneous activation of units contained in different regions causes the creation of new bonds between regions, and the strengthening of bonds by learning gives rise to behavioral isomorphism. The behavior of mirror neurons can be explained by this behavioral isomorphism. The neural network in this paper is described mathematically, derived from findings of neuroscience based on the Hebb rule.