Timestamp on batch_mutate

Jul 17, 2011 at 3:28 PM

I was reading your blog article about simultaneous insert and delete, and I wonder whether the problem occurs because all mutations in the batch have the same timestamp.

I recently read Cassandra: The Definitive Guide and according to the book, Cassandra takes data with the most recent timestamp as the latest.
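To make that rule concrete, here is a minimal sketch of how two versions of the same column could be reconciled. `Cell` and `reconcile` are hypothetical names for illustration, not Cassandra's or Cassandraemon's API, and the tie-breaking rule (a deletion beats a live value when timestamps are equal) is my understanding of Cassandra's behaviour rather than something taken from the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One version of a column. Hypothetical type for illustration only."""
    value: Optional[bytes]  # None marks a deletion (tombstone)
    timestamp: int


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the version that survives when two versions of a column meet."""
    # The version with the more recent timestamp wins.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumption: on equal timestamps a deletion wins over a live value.
    if a.value is None:
        return a
    if b.value is None:
        return b
    # Both live with equal timestamps: pick deterministically by value.
    return a if a.value >= b.value else b
```

If that tie-breaking assumption holds, an insert and a delete that carry the same timestamp always end with the column deleted, regardless of the order in which they were queued.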

Maybe we should set the timestamp on InsertOnSubmit/DeleteOnSubmit rather than on SubmitChange; that might fix the problem.

It may make performance slightly worse, as Cassandraemon would need to calculate a timestamp on each InsertOnSubmit/DeleteOnSubmit, but I hope it won't be too bad.

Any opinions?

Jul 17, 2011 at 11:46 PM

Cassandra doesn't have transactions, so we can't know which data was stored in the same connection.

If I set the same timestamp, I can more easily search for the data that was stored in the same connection, and I can delete invalid data when I find faulty logic.

That is the reason for using the same timestamp. But your suggestion sounds good. I am not sure which way to take.

Jul 19, 2011 at 10:03 AM

I created a unit test (UT) to verify this. The test does:

1. Delete, insert, and submit. See if the data exists.

2. Insert, delete, and submit. See if the column is deleted.

With the current build of Cassandraemon, the test fails. The column is always deleted.

If I modify Cassandraemon to set timestamp on InsertOnSubmit/DeleteOnSubmit, the test passes.
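The two outcomes can be reproduced with a small simulation. This is only a sketch and not the actual unit test: `apply_batch`, `stamp_on_submit` and `stamp_per_operation` are hypothetical helpers, and the model assumes that the newest timestamp wins and that a delete wins when timestamps are equal.

```python
import itertools


def apply_batch(ops):
    """ops: list of (kind, timestamp), kind is "insert" or "delete".

    Returns True if the column exists after all mutations are reconciled.
    """
    winner = None
    for kind, ts in ops:
        newer = winner is None or ts > winner[1]
        # Assumption: on equal timestamps the delete wins.
        tie_delete = winner is not None and ts == winner[1] and kind == "delete"
        if newer or tie_delete:
            winner = (kind, ts)
    return winner is not None and winner[0] == "insert"


def stamp_on_submit(kinds, now):
    """Current behaviour: every queued mutation gets the same timestamp."""
    return [(kind, now) for kind in kinds]


def stamp_per_operation(kinds, start):
    """Proposed behaviour: each queued mutation gets its own, later timestamp."""
    counter = itertools.count(start)
    return [(kind, next(counter)) for kind in kinds]


for kinds in (["delete", "insert"], ["insert", "delete"]):
    shared = apply_batch(stamp_on_submit(kinds, 1000))
    per_op = apply_batch(stamp_per_operation(kinds, 1000))
    print(kinds, "shared timestamp -> exists:", shared,
          "| per-operation -> exists:", per_op)
```

In this model the shared timestamp leaves the column deleted in both cases, while per-operation timestamps keep the data in case 1 and delete it in case 2.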

Sometimes it still fails, but when I look at the timestamps in those cases, they are identical, so it is an issue with the resolution of TimeGenerator.GetUnixTime(). We could possibly solve this by using a high-resolution timer (Stopwatch.GetTimestamp()), but that's a separate issue.
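As an alternative to a higher-resolution timer, the generator could guarantee that no two calls return the same value. The sketch below shows that idea only; it is not how TimeGenerator is implemented. `MonotonicTimestamps` is a hypothetical name, and the microsecond clock is an assumption.

```python
import threading
import time


class MonotonicTimestamps:
    """Hands out strictly increasing timestamps (hypothetical helper)."""

    def __init__(self, clock=lambda: time.time_ns() // 1000):
        # Assumption: timestamps are microseconds since the Unix epoch.
        self._clock = clock
        self._last = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            now = self._clock()
            # If the clock has not advanced since the last call,
            # bump the previous value by one instead of repeating it.
            self._last = now if now > self._last else self._last + 1
            return self._last


gen = MonotonicTimestamps()
first, second = gen.next(), gen.next()
print(second > first)
```

The trade-off is that under very rapid calls the values can run slightly ahead of the real clock, but the order of InsertOnSubmit/DeleteOnSubmit calls within one process is always preserved.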

Jul 19, 2011 at 10:04 AM
This discussion has been copied to a work item.