I'm currently trying to split a string in C# (latest .NET and Visual Studio 2008), in order to retrieve everything that's inside square brackets and discard the remaining text.
E.g.: "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]"
In this case, I'm interested in getting "HSA:3269" and "PATH:hsa04080(3269)" into an array of strings.
How can this be achieved?
From stackoverflow
-
Split won't help you here; you need to use regular expressions:

    // using System.Text.RegularExpressions;

    // pattern = any number of arbitrary characters between square brackets.
    var pattern = @"\[(.*?)\]";
    var query = "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]";
    var matches = Regex.Matches(query, pattern);

    foreach (Match m in matches) {
        Console.WriteLine(m.Groups[1]);
    }

Yields your results.
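
Since the question asks for an array of strings rather than console output, the matches can be collected instead of printed. A minimal sketch, assuming .NET 3.5 or later (for LINQ); the names `BracketExtractor` and `Extract` are made up for illustration and are not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here for illustration only.
public static class BracketExtractor
{
    // Returns the text inside each [...] pair, without the brackets,
    // using the lazy pattern from the answer above.
    public static string[] Extract(string input)
    {
        return Extract(input, @"\[(.*?)\]");
    }

    // Overload taking the pattern as a parameter; group 1 is expected
    // to capture the bracket content.
    public static string[] Extract(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()                   // MatchCollection is non-generic, so cast each item
                    .Select(m => m.Groups[1].Value)  // keep only the captured content
                    .ToArray();
    }
}
```

Calling `BracketExtractor.Extract(query)` on the sample string should give a two-element array holding "HSA:3269" and "PATH:hsa04080(3269)". Passing a negated character class such as `\[([^\]]*)\]` as the second argument gives the same result for this input.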
chakrit : Do you find it awkward in 3.5 that the MatchCollection enumerator still returns Match as Object?
chakrit : anyway... a better regex match might be \[([^\]]*)\] so as to be on the safe side :-)
Konrad Rudolph : @chakrit: 1. Yes, but this cannot be changed for backwards compatibility reasons. Really a shame though. Microsoft should have the balls to do like Python 3: throw everything pre-2.0 out for good and introduce a breaking change. But this won't happen …
Hal : Perfect! Thanks man, really appreciate it :)
Konrad Rudolph : @chakrit: 2. This was indeed my first version (I usually use explicit groups) but I reconsidered because that's wordier to express exactly the same pattern (for all practical purposes). There's really no risk here in using the more implicit character class along with a nongreedy quantifier.
-
Err, how about regex split then?! Untested:

    string input = "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]";
    string pattern = @"([)|(])";

    foreach (string result in Regex.Split(input, pattern)) {
        Console.WriteLine("'{0}'", result);
    }

Alan Moore : You should have tested it. "([)|(])" matches ')', '|', or '('. You probably meant "(\[|\])", but that's wrong too; if you use capturing groups in the regex, the captured text is returned along with the other tokens, for a total of eight tokens. Try it here: http://www.myregextester.com/in
Daz : Since the question was actually to use split, I thought I'd demonstrate a better solution with a link and a quick, untested sample, from where the user can use their initiative and solve the problem!
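
For completeness, `Regex.Split` can be made to work if the delimiter pattern has no capturing group and the input is well formed (every `[` has a matching `]`, with no nesting): the bracketed pieces then sit at the odd indices of the result. A rough sketch under those assumptions; `SplitExtractor` and its method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here for illustration only.
public static class SplitExtractor
{
    // Splits on either bracket and keeps the pieces that were inside brackets.
    // Assumes balanced, non-nested brackets.
    public static string[] ExtractBySplit(string input)
    {
        // Character class matching '[' or ']'; no capturing group,
        // so the delimiters themselves are not returned.
        string[] tokens = Regex.Split(input, @"[\[\]]");

        var inside = new List<string>();
        // Tokens alternate outside, inside, outside, ...,
        // so the bracket contents are at the odd indices.
        for (int i = 1; i < tokens.Length; i += 2)
        {
            inside.Add(tokens[i]);
        }
        return inside.ToArray();
    }

    // The same split with a capturing group, to show the effect described
    // in the comment above: the brackets come back as tokens of their own.
    public static string[] SplitWithCapture(string input)
    {
        return Regex.Split(input, @"(\[|\])");
    }
}
```

The `Regex.Matches` approach from the first answer is probably the safer choice, since it does not depend on the position of each token and simply ignores an unmatched bracket.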