General Operation

Full-text search was added in version 0.6.12. Full-text search in Cassandraemon is very simple: you only have to mark properties with attributes, and the objects can then be indexed and searched.

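using System;
using System.Collections.Generic;
using System.Linq;
using Cassandraemon;           // CassandraContext (namespace assumed)
using Cassandraemon.FullText;  // DocumentId, DocumentField, NormalAnalyzer

class Product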
{
	[DocumentId]
	public int ID { get; set; }
	
	public string Name { get; set; }

	[DocumentField]
	public string Description { get; set; }
}


using(var context = new CassandraContext("localhost", 9160, "KeySpace1"))
{
	/****************/
	/* Create Index */
	/****************/
	var product = new Product { ID = 1, Name = "Product1", Description = "This is very useful product" };
				
	context.InsertFullTextIndexOnSubmit(product, new NormalAnalyzer(), "FullTextIndex");
	context.SubmitChanges();
				
	/***************/
	/* Query Index */
	/***************/
	var query = from x in context.SuperColumnList
		    where x.ColumnFamily == "FullTextIndex" &&
			  x.Key.Match(new NormalAnalyzer(), "Product") &&
			  x.SuperColumn.In("Description")
		    select x.ToFlatNameList<int>();
				
	foreach(List<int> idList in query)
	{
		foreach(var id in idList)
		{
			Console.WriteLine("ProductID = " + id);
		}
	}
}

Setting

First, add a ColumnFamily definition for the full-text index. You can choose any ColumnFamily name. Set ColumnType to 'Super', CompareWith to 'UTF8Type', and CompareSubcolumnsWith to 'BytesType'.

	<ColumnFamily Name="FullTextIndex"
		ColumnType="Super"
		CompareWith="UTF8Type"
		CompareSubcolumnsWith="BytesType"
		Comment="This is ColumnFamily for fulltext index" />

Attribute

Cassandraemon provides two attribute classes.

Cassandraemon.FullText.DocumentIdAttribute

Apply DocumentIdAttribute to the property that identifies the document. Each class can have only one property with DocumentIdAttribute. The value of that property becomes the document id, and a full-text search returns a list of these document ids.

Cassandraemon.FullText.DocumentFieldAttribute

Apply DocumentFieldAttribute to each property whose text should be indexed. A class can have any number of properties with DocumentFieldAttribute.
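Plain reflection is enough to show how the two attributes divide up an object. This is a minimal sketch and not Cassandraemon's implementation. The two attribute classes below are hypothetical local stand-ins for the library's attributes, so the block compiles without the library. DocumentReader is an illustrative name. The sketch enforces the one-DocumentId-per-class rule and collects every DocumentField property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for Cassandraemon.FullText.DocumentIdAttribute and
// DocumentFieldAttribute, declared here only so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class DocumentIdAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class DocumentFieldAttribute : Attribute { }

public class Product
{
	[DocumentId]
	public int ID { get; set; }

	public string Name { get; set; }

	[DocumentField]
	public string Description { get; set; }
}

public static class DocumentReader
{
	// Returns the value of the single [DocumentId] property (the document id).
	public static object GetDocumentId(object document)
	{
		var idProps = document.GetType().GetProperties()
			.Where(p => p.IsDefined(typeof(DocumentIdAttribute), true))
			.ToList();
		if (idProps.Count != 1)
			throw new InvalidOperationException("Exactly one [DocumentId] property is required.");
		return idProps[0].GetValue(document, null);
	}

	// Maps each [DocumentField] property name to its text; properties
	// without the attribute (such as Name here) are not indexed.
	public static Dictionary<string, string> GetDocumentFields(object document)
	{
		return document.GetType().GetProperties()
			.Where(p => p.IsDefined(typeof(DocumentFieldAttribute), true))
			.ToDictionary(p => p.Name, p => Convert.ToString(p.GetValue(document, null)));
	}
}
```

In this sketch, the Product class from the first example yields document id 1 and a single field, Description.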

Analyzer

An analyzer class defines how text is split into words. Cassandraemon ships with two analyzers.

Cassandraemon.FullText.NormalAnalyzer

NormalAnalyzer splits a sentence into English words and lower-cases them.

This is useful product.  ->  [ this, is, useful, product ]
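The splitting shown above can be approximated as follows. This is a sketch built on assumptions. A word is taken to be a run of ASCII letters, digits, or apostrophes, and words are lower-cased as in the example. The exact rules NormalAnalyzer applies are not documented here, and WordSplitter is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
	// Assumed word definition: runs of ASCII letters, digits and apostrophes.
	// Spaces and punctuation such as the trailing '.' act as separators.
	private static readonly Regex WordPattern = new Regex("[A-Za-z0-9']+");

	// Lower-cases each word so that "Product" and "product" share one index key.
	public static List<string> Split(string sentence)
	{
		return WordPattern.Matches(sentence)
			.Cast<Match>()
			.Select(m => m.Value.ToLowerInvariant())
			.ToList();
	}
}
```

WordSplitter.Split("This is useful product.") gives [ this, is, useful, product ], matching the example above.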

Cassandraemon.FullText.NGramAnalyzer

NGramAnalyzer splits a sentence into n-grams, which are substrings of a fixed number of characters.

With the default constructor, the maximum gram size is 2 and the minimum gram size is 1. { new NGramAnalyzer() }
This is useful product.  ->  [ th, hi, is, us, se, ef, fu, ul, pr, ro, od, du, uc, ct, t, h, i, s, u, e, f, l, p, r, o, d, c ]

You can also pass the maximum and minimum gram sizes to the constructor. { new NGramAnalyzer(3, 2) }
This is useful product.  ->  [ thi, his, use, sef, efu, ful, pro, rod, odu, duc, uct, th, hi, is, us, se, ef, fu, ul, pr, ro, od, du, uc, ct ]
! In Cassandra 0.6, key strings are trimmed, so phrase search cannot be done with NGramAnalyzer. If you need phrase search, use version 0.7.
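The sketch below reproduces the two NGramAnalyzer examples. It was written to match the listed output and is not taken from Cassandraemon. It splits the sentence into lower-cased words and emits the n-grams of each word, from the maximum size down to the minimum. Only the first occurrence of each gram is kept. NGramSketch and its word pattern are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NGramSketch
{
	// Assumed word definition, as for NormalAnalyzer-style splitting.
	private static readonly Regex WordPattern = new Regex("[A-Za-z0-9']+");

	// Emits n-grams from maxSize down to minSize, word by word, keeping only the
	// first occurrence of each gram (the ordering used in the examples above).
	public static List<string> Split(string sentence, int maxSize = 2, int minSize = 1)
	{
		if (minSize < 1 || maxSize < minSize)
			throw new ArgumentException("Require 1 <= minSize <= maxSize.");

		var words = new List<string>();
		foreach (Match m in WordPattern.Matches(sentence))
			words.Add(m.Value.ToLowerInvariant());

		var seen = new HashSet<string>();
		var grams = new List<string>();
		for (int n = maxSize; n >= minSize; n--)
		{
			foreach (var word in words)
			{
				// A word shorter than n yields no n-gram of that size.
				for (int i = 0; i + n <= word.Length; i++)
				{
					var gram = word.Substring(i, n);
					if (seen.Add(gram))
						grams.Add(gram);
				}
			}
		}
		return grams;
	}
}
```

For example, the word "is" produces no 3-grams, which is why no 3-gram of it appears in the NGramAnalyzer(3, 2) output.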

Create Index

Call CassandraContext.InsertFullTextIndexOnSubmit and then SubmitChanges to create the full-text index. InsertFullTextIndexOnSubmit takes three arguments. The first is the object whose properties carry DocumentIdAttribute and DocumentFieldAttribute. The second is the analyzer that defines how words are split. The third is the name of the ColumnFamily.

class Product
{
	[DocumentId]
	public int ID { get; set; }
	
	[DocumentField]
	public string Name { get; set; }

	[DocumentField]
	public string Description { get; set; }
}


using(var context = new CassandraContext("localhost", 9160, "KeySpace1"))
{
	/****************/
	/* Create Index */
	/****************/
	var product1 = new Product { ID = 1, Name = "First Product", Description = "This Product is useful product" };
	var product2 = new Product { ID = 2, Name = "Second Product", Description = "This Product is simple product" };
				
	context.InsertFullTextIndexOnSubmit(product1, new NormalAnalyzer(), "FullTextIndex");
	context.InsertFullTextIndexOnSubmit(product2, new NormalAnalyzer(), "FullTextIndex");
	context.SubmitChanges();
}

Running the code above stores the full-text index as follows. Each split word is stored as a row key. The name of each property with DocumentFieldAttribute is stored as a SuperColumn name. The value of the property with DocumentIdAttribute is stored as the column name. The positions at which the word appears (zero-based word offsets) are stored as the column value, as a List<int>.

FullTextIndex : // ColumnFamily
{
	first : // Key
	{
		Name : // SuperColumn
		{
			1 : [ 0 ]
		}
	}
	is : // Key
	{
		Description : // SuperColumn
		{
			1 : [ 2 ],
			2 : [ 2 ]
		}
	}
	product : // Key
	{
		Name : // SuperColumn
		{
			1 : [ 1 ],
			2 : [ 1 ]
		},
		Description : // SuperColumn
		{
			1 : [ 1, 4 ],
			2 : [ 1, 4 ]
		}
	}
	second : // Key
	{
		Name : // SuperColumn
		{
			2 : [ 0 ]
		}
	}
	simple : // Key
	{
		Description : // SuperColumn
		{
			2 : [ 3 ]
		}
	}
	this : // Key
	{
		Description : // SuperColumn
		{
			1 : [ 0 ],
			2 : [ 0 ]
		}
	}
	useful : // Key
	{
		Description : // SuperColumn
		{
			1 : [ 3 ]
		}
	}
}
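Nested dictionaries can model the layout above in memory. This makes the roles of row key, SuperColumn, column name, and column value concrete. The sketch is illustrative only: FullTextIndexModel is a hypothetical class that writes nothing to Cassandra. It assumes NormalAnalyzer-style splitting with zero-based word positions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// In-memory model of the layout above (not Cassandraemon code):
// key (word) -> SuperColumn (field name) -> column name (document id) -> column value (positions)
public class FullTextIndexModel
{
	// Assumed NormalAnalyzer-like splitting: runs of letters/digits, lower-cased.
	private static readonly Regex WordPattern = new Regex("[A-Za-z0-9']+");

	public SortedDictionary<string, Dictionary<string, SortedDictionary<int, List<int>>>> Rows { get; } =
		new SortedDictionary<string, Dictionary<string, SortedDictionary<int, List<int>>>>(StringComparer.Ordinal);

	// Indexes one [DocumentField] property of one document, recording zero-based word positions.
	public void Add(int documentId, string fieldName, string text)
	{
		var words = WordPattern.Matches(text).Cast<Match>()
			.Select(m => m.Value.ToLowerInvariant())
			.ToList();

		for (int pos = 0; pos < words.Count; pos++)
		{
			if (!Rows.TryGetValue(words[pos], out var superColumns))
				Rows[words[pos]] = superColumns = new Dictionary<string, SortedDictionary<int, List<int>>>();
			if (!superColumns.TryGetValue(fieldName, out var columns))
				superColumns[fieldName] = columns = new SortedDictionary<int, List<int>>();
			if (!columns.TryGetValue(documentId, out var positions))
				columns[documentId] = positions = new List<int>();
			positions.Add(pos);
		}
	}
}
```

Add the two products from the example above, calling Add once per product and field. The result has seven row keys. Key "product", SuperColumn "Description", column 1 holds positions [ 1, 4 ].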

Query Index

To query the index, write a LINQ query like the one below. The query must be written against SuperColumnList. The key part of this query is the Match method, which takes two arguments. The first is the analyzer that was used when the index was created, and the second is the search query. The search query supports AND (one two), exclusion (one -two), and phrase ("one two") searches, although phrase search is not available in version 0.6. To search only the Description property, narrow the SuperColumn with SuperColumn.In("Description"). To search both the Name and Description properties, specify SuperColumn.In("Name", "Description").

using(var context = new CassandraContext("localhost", 9160, "KeySpace1"))
{
	var query = from x in context.SuperColumnList
		    where x.ColumnFamily == "FullTextIndex" &&
			  x.Key.Match(new NormalAnalyzer(), "Product -useful") &&
			  x.SuperColumn.In("Description")
		    select x.ToFlatNameList<int>();
				
	foreach(List<int> idList in query)
	{
		foreach(var id in idList)
		{
			Console.WriteLine("ProductID = " + id);
		}
	}
}
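The AND and minus operators can be modeled as set operations over the index. The document ids of every plain term are intersected, and the ids of every term prefixed with '-' are removed. This is a hypothetical sketch, and QuerySketch is not part of Cassandraemon. It ignores phrase search. It also merges the selected SuperColumns into one set, which may differ from how Cassandraemon groups the lists it returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySketch
{
	// postings: word -> field (SuperColumn) -> document ids (column names)
	public static SortedSet<int> Evaluate(
		Dictionary<string, Dictionary<string, HashSet<int>>> postings,
		string query,
		params string[] fields)
	{
		var terms = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
		var required = terms.Where(t => t[0] != '-').Select(t => t.ToLowerInvariant());
		var excluded = terms.Where(t => t[0] == '-' && t.Length > 1)
			.Select(t => t.Substring(1).ToLowerInvariant());

		// AND: intersect the ids of all plain terms.
		SortedSet<int> result = null;
		foreach (var term in required)
		{
			var ids = Lookup(postings, term, fields);
			if (result == null) result = new SortedSet<int>(ids);
			else result.IntersectWith(ids);
		}
		result = result ?? new SortedSet<int>();

		// Minus: remove the ids of every '-' term.
		foreach (var term in excluded)
			result.ExceptWith(Lookup(postings, term, fields));
		return result;
	}

	// Union of the ids for one word across the requested SuperColumns.
	private static IEnumerable<int> Lookup(
		Dictionary<string, Dictionary<string, HashSet<int>>> postings, string word, string[] fields)
	{
		if (!postings.TryGetValue(word, out var byField))
			return Enumerable.Empty<int>();
		return fields.Where(f => byField.ContainsKey(f)).SelectMany(f => byField[f]).Distinct();
	}
}
```

Take the index from the Create Index example and search Description for "Product -useful". Only document 2 remains: "product" appears in documents 1 and 2, while "useful" appears only in document 1.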

Delete Index

Call CassandraContext.DeleteFullTextIndexOnSubmit and then SubmitChanges to delete an object's full-text index entries. DeleteFullTextIndexOnSubmit takes the same three arguments as InsertFullTextIndexOnSubmit.

using(var context = new CassandraContext("localhost", 9160, "KeySpace1"))
{
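	// product: the object that was indexed in the Create Index example
	var product = new Product { ID = 1, Name = "First Product", Description = "This Product is useful product" };
	context.DeleteFullTextIndexOnSubmit(product, new NormalAnalyzer(), "FullTextIndex");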
	context.SubmitChanges();
}

Update Index

An update is simply a delete followed by an insert. Note that you should call SubmitChanges after the delete: if a delete and an insert are submitted together, the batch_mutate API behaves unexpectedly.

using(var context = new CassandraContext("localhost", 9160, "KeySpace1"))
{
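	// oldProduct holds the values that were indexed; product holds the new values
	var oldProduct = new Product { ID = 1, Name = "First Product", Description = "This Product is useful product" };
	var product = new Product { ID = 1, Name = "First Product", Description = "This Product is handy product" };

	context.DeleteFullTextIndexOnSubmit(oldProduct, new NormalAnalyzer(), "FullTextIndex");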
	context.SubmitChanges();
	context.InsertFullTextIndexOnSubmit(product, new NormalAnalyzer(), "FullTextIndex");
	context.SubmitChanges();
}
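An in-memory model can illustrate why the update deletes the old object before inserting the new one. The sketch assumes that deletion re-analyzes the text it is given and removes that document's column from each resulting key. That assumption is inferred from DeleteFullTextIndexOnSubmit taking the same arguments as insert; it is not documented behavior. UpdatableIndexSketch is a hypothetical class, and its removal of empty rows is only a convenience of the model.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// word -> field (SuperColumn) -> document id (column) -> positions
public class UpdatableIndexSketch
{
	// Assumed NormalAnalyzer-like splitting: runs of letters/digits, lower-cased.
	private static readonly Regex WordPattern = new Regex("[A-Za-z0-9']+");

	public Dictionary<string, Dictionary<string, Dictionary<int, List<int>>>> Rows { get; } =
		new Dictionary<string, Dictionary<string, Dictionary<int, List<int>>>>();

	private static List<string> Words(string text) =>
		WordPattern.Matches(text).Cast<Match>().Select(m => m.Value.ToLowerInvariant()).ToList();

	public void Insert(int id, string field, string text)
	{
		var words = Words(text);
		for (int pos = 0; pos < words.Count; pos++)
		{
			if (!Rows.TryGetValue(words[pos], out var fields))
				Rows[words[pos]] = fields = new Dictionary<string, Dictionary<int, List<int>>>();
			if (!fields.TryGetValue(field, out var columns))
				fields[field] = columns = new Dictionary<int, List<int>>();
			if (!columns.TryGetValue(id, out var positions))
				columns[id] = positions = new List<int>();
			positions.Add(pos);
		}
	}

	// The keys to touch are recomputed from the text passed in, so the OLD text
	// is needed; words found only in the new text would never reach stale entries.
	public void Delete(int id, string field, string text)
	{
		foreach (var word in Words(text).Distinct())
		{
			if (Rows.TryGetValue(word, out var fields) && fields.TryGetValue(field, out var columns))
			{
				columns.Remove(id);
				if (columns.Count == 0) fields.Remove(field);
				if (fields.Count == 0) Rows.Remove(word);
			}
		}
	}

	// Update = delete the old entries, then insert the new ones, as two separate steps.
	public void Update(int id, string field, string oldText, string newText)
	{
		Delete(id, field, oldText);
		Insert(id, field, newText);
	}
}
```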

Original Analyzer

By creating a class that implements Cassandraemon.FullText.IAnalyzer, you can split words in your own way. IAnalyzer has two methods to implement.

Dictionary<string, List<int>> MakeRegistData(string text)

MakeRegistData creates the data to be registered in the index. It returns a Dictionary whose keys are words and whose values are the lists of positions at which each word appears.

List<string> SplitQueryPhrase(string phrase)

SplitQueryPhrase splits a query phrase. For example, for the query (one -two "three four"), SplitQueryPhrase is called three times, with the arguments "one", "two", and "three four". The method is expected to split each of these phrases into the final search words. This is useful for more complex searches such as n-gram search.
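A custom analyzer for n-gram style search might look like the sketch below. To keep the block self-contained, it declares IAnalyzerSketch, a hypothetical local mirror of the two methods described above, instead of referencing Cassandraemon.FullText.IAnalyzer. With the library referenced, the class would implement IAnalyzer instead, assuming the signatures shown above. BigramAnalyzer, its word pattern, and its use of a running bigram offset as the "position" are all assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical local mirror of Cassandraemon.FullText.IAnalyzer, declared here
// only so the sketch compiles without the library.
public interface IAnalyzerSketch
{
	Dictionary<string, List<int>> MakeRegistData(string text);
	List<string> SplitQueryPhrase(string phrase);
}

// Indexes character bigrams of each word and splits every query phrase into
// the same bigrams.
public class BigramAnalyzer : IAnalyzerSketch
{
	private static readonly Regex WordPattern = new Regex("[A-Za-z0-9']+");

	// Word -> positions, where a position is the running bigram offset in the text.
	public Dictionary<string, List<int>> MakeRegistData(string text)
	{
		var data = new Dictionary<string, List<int>>();
		int position = 0;
		foreach (Match m in WordPattern.Matches(text))
		{
			foreach (var gram in Bigrams(m.Value.ToLowerInvariant()))
			{
				if (!data.TryGetValue(gram, out var positions))
					data[gram] = positions = new List<int>();
				positions.Add(position++);
			}
		}
		return data;
	}

	// Turns one query phrase into the distinct bigrams to look up.
	public List<string> SplitQueryPhrase(string phrase)
	{
		var grams = new List<string>();
		foreach (Match m in WordPattern.Matches(phrase))
			grams.AddRange(Bigrams(m.Value.ToLowerInvariant()));
		return grams.Distinct().ToList();
	}

	// A one-character word is kept whole so that it remains searchable.
	private static IEnumerable<string> Bigrams(string word)
	{
		if (word.Length < 2) { yield return word; yield break; }
		for (int i = 0; i + 2 <= word.Length; i++)
			yield return word.Substring(i, 2);
	}
}
```

Here SplitQueryPhrase turns the query word "useful" into [ us, se, ef, fu, ul ]. If the split words are combined the way AND terms are, a document matches only when all five bigrams were registered for it.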

Last edited Mar 1, 2011 at 6:00 PM by sabro, version 11
