TexGraph - An improved text representation for GNNs

If you're a CXO, founder, or investor, follow me on LinkedIn and Twitter, or join my newsletter on my website here. I share simplified takes on the latest AI research and tactical advice on building AI products.

Please note this is my original research work. Do not use without authorization or citation.


Currently popular models learn from sequential representations of text. However, these are inherently unsuitable for GNNs, and they do not leverage the semantic connections between different words.

Here I propose a stem-based text representation to enhance the performance of GNNs. Each stem is added as a root node in the graph, and its derived words are added as connected extension nodes.
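The stem/derived-word structure above can be sketched in a few lines. This is a minimal illustration, not the post's implementation: the tiny suffix-stripping stemmer and its suffix list are stand-in assumptions (a real pipeline would use a proper stemmer such as Porter's), and the graph is kept as a plain adjacency map from each stem (root node) to its derived words (extension nodes).

```python
import re
from collections import defaultdict

# Assumed toy suffix list; a real system would use a full stemmer.
SUFFIXES = ("ations", "ation", "ing", "ion", "ed", "ly", "es", "s")

def stem(word: str) -> str:
    """Crude suffix-stripping stemmer (illustrative assumption only)."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def build_stem_graph(text: str) -> dict:
    """Map each stem (root node) to the set of derived words
    (extension nodes) found in the text."""
    graph = defaultdict(set)
    for word in re.findall(r"[a-z]+", text.lower()):
        graph[stem(word)].add(word)
    return dict(graph)

g = build_stem_graph("learning learns learned; connection connects")
# 'learn' becomes one root node with 'learning', 'learns', 'learned'
# attached, so whatever a GNN learns about the stem is shared by all
# of its derived words.
```

A node-feature matrix for a GNN would then be built over these stem and word nodes, with edges given by the stem-to-derivation links.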

This representation has the benefit that learning a stem accelerates learning for all of its derived words, as well as for related stems and their derivations, thereby improving both training time and accuracy compared to sequential representations.

Also, this representation is well suited for graph learning models.


Every word originates from a stem, and many stems are derived from other stems. Thus we can visualise an entire language as a graph of connected stems and their derivations. That is also how the human brain learns and processes language.

Here I propose an approach, TexGraph, which tries to emulate this model. A given body of text is converted into a graph of stems and their derived words. Stems related to each other are connected directly, with the relation represented in the edges of the graph. Related stems are placed close together, whereas unrelated ones are placed far apart.
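The post does not fix a specific relatedness measure for connecting stems, so the sketch below assumes a simple one: stems that co-occur in the same sentence get a weighted edge, with the weight counting co-occurrences. Higher weight stands in for "placed close together"; stems that never co-occur get no edge at all. The stemmer is the same toy assumption as before.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed toy suffix list; a real system would use a full stemmer.
SUFFIXES = ("ations", "ation", "ing", "ion", "ed", "ly", "es", "s")

def stem(word: str) -> str:
    """Crude suffix-stripping stemmer (illustrative assumption only)."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def stem_edges(text: str) -> dict:
    """Weighted edges between stems that co-occur within a sentence.
    A higher weight marks stems as more closely related; absent pairs
    stay unconnected (i.e. 'distant' in the graph)."""
    weights = defaultdict(int)
    for sentence in re.split(r"[.!?]", text.lower()):
        stems = sorted({stem(w) for w in re.findall(r"[a-z]+", sentence)})
        for a, b in combinations(stems, 2):
            weights[(a, b)] += 1
    return dict(weights)

edges = stem_edges("Graphs connect stems. Graphs connect words.")
# ('connect', 'graph') co-occurs in both sentences, so its edge
# weight is 2; pairs seen once get weight 1.
```

These weighted stem-to-stem edges, combined with the stem-to-derivation links from the earlier sketch, give the full TexGraph structure that a graph learning model would consume.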



bottom of page