While looking for something quick to do during a brief internet outage, I wrote some code to convert to and from Roman numerals. Once things were back up I searched for equivalent code, but only found stuff that was multiple pages long, limited the range of what it could convert, or both. I figured I might as well share what I came up with:
function romanize (num) { if (!+num) return false; var digits = String(+num).split(''); var key = ['','C','CC','CCC','CD','D','DC','DCC','DCCC','CM', '','X','XX','XXX','XL','L','LX','LXX','LXXX','XC', '','I','II','III','IV','V','VI','VII','VIII','IX']; var roman = '', i = 3; while (i--) roman = (key[+digits.pop() + (i * 10)] || '') + roman; return Array(+digits.join('') + 1).join('M') + roman; } function deromanize (str) { var str = str.toUpperCase(); var validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/; var token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g; var key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1}; var num = 0, m; if (!(str && validator.test(str))) return false; while (m = token.exec(str)) num += key[m[0]]; return num; }
How would you rewrite this code? Can you create a shorter version?
This is a pretty common solution to the problem (just to romanize). I haven’t tested if it actually works but overall the algorithm is pretty simple and easy to port to any language.
By the way, your ‘deromanize’ is actually very easy to follow. Congrats.
function romanize(num) { IX:9,V:5,IV:4,I:1},
var lookup = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,
roman = '',
i;
for ( i in lookup ) {
while ( num >= lookup[i] ) {
roman += i;
num -= lookup[i];
}
}
return roman;
}
Another solution for
romanize
function can befunction rome(N,s,b,a,o,t){
t=N/1e3|0;N%=1e3;
for(s=b='',a=5;N;b++,a^=7)
for(o=N%a,N=N/a^0;o--;)
s='IVXLCDM'.charAt(o>2?b+N-(N&=~1)+(o=1):b)+s;
return Array(t+1).join('M')+s;
}
The above algorithm is based on one found on polish javascript group ( http://groups.google.com/group/pl.comp.lang.javascript/msg/c8991a94f77c98ce ). In the original function ‘N’ can be 0-3999. I just modified it to handle bigger numbers.
Both algorithms are great, for opposite reasons… Iván’s for its simplicity, and Rafael’s for its dizzying complexity. (Obviously, both are nice and short.)
When I posted this I assumed
romanize
was the weaker of the two functions. Anyone have another take on thederomanize
function?Well, without trying to validate the input string I guess the following is quite easy to understand deromanize algorithm. It will traverse the roman literal from right to left, adding by default and subtracting the letter equivalent value if it’s lower than its immediate sibling.
function deromanize( roman ) {
var roman = roman.toUpperCase(),
lookup = {I:1,V:5,X:10,L:50,C:100,D:500,M:1000},
arabic = 0,
i = roman.length;
while (i--) {
if ( lookup[roman[i]] < lookup[roman[i+1]] )
arabic -= lookup[roman[i]];
else
arabic += lookup[roman[i]];
}
return arabic;
}
Nice. It’s always interesting to see alternative approaches.
One thing though… To make it work cross-browser you’ll need to either convert
roman
to an array (roman = roman.toUpperCase().split("")
) or use thecharAt
method (roman.charAt(i)
instead ofroman[i]
). Treating strings as arrays like that is not part of the ES3 spec (although I believe it’s proposed for inclusion in ES4) and doesn’t work in IE, unfortunately.Kris Kowal emailed me the following two versions of
deromanize
, both of which use an approach similar to Iván’s but take advantage of Kris’s Chiron JavaScript library and a heavy dose of functional programming.include('boost.js');
var romans = dict({M:1000, D:500, C:100, L:50, X:10, V:5, I:1});
this.decode = function (roman) {
return (
add(
eachIter(roman.toUpperCase(), romans.get),
[0]
)
.to(edgesIter)
.eachArgsIter(function (x, y) {
return x < y ? -x : x;
})
.sum()
)
};
A variation on the same thing:
include('boost.js');
var romans = dict({M:1000, D:500, C:100, L:50, X:10, V:5, I:1});
this.decode = function (roman) {
roman = each(roman, romans.get);
return (
range(getLength(roman))
.eachIter(function (i) {
return [roman.get(i), roman.get(i + 1, 0)];
})
.eachArgsIter(function (x, y) {
return x < y ? -x : x;
})
.sum()
)
};
In both cases I edited the inner
return
to use the ternary?:
operator, because at least here I find that easier to read thanif
/else
. From my perspective, these are probably the most interesting (but least practical, if for no other reason than the added function call overhead) of the implementations so far.If the idioms are foreign to you, check out the Chiron source to figure out how it works.
This is a variation on Ivan’s but I think splitting into an array can actually simplify the code:
function deromanize( roman ) {
var roman = roman.toUpperCase().split(''),
lookup = {I:1,V:5,X:10,L:50,C:100,D:500,M:1000},
num = 0, val = 0;
while (roman.length) {
val = lookup[roman.shift()];
num += val * (val < lookup[roman[0]] ? -1:1);
}
return num;
}
There’s not the same level of validation as the original but I like it for its simplicity.
Very nice post, although I can’t imagine where I would need roman numbers anymore. Anyone wonder how the Romans actually read those numbers? I mean, we read every digit and say something about it. How did they do it?
BTW, I didn’t know about the +num test for numerical values. Always used parseInt. Thanks!
@Snook, IMO that’s a fairly significant improvement. Since the regex-based validation can easily be added in, it looks like your
deromanize
is the one to beat.@Siderite, you might have already realized this, but the unary
+
operator andparseInt
are not equivalent.+
can convert strings to numbers, and returnsNaN
if the element cannot be converted.parseInt
(which takes an optional second argument for the radix) does the same thing, but also extracts leading numbers from strings. E.g.,parseInt("12x")
returns12
, while+"12x"
returnsNaN
. Additionally,parseInt
and+
make different assumptions about the radix when there’s a leading zero.+"012"
returns12
, butparseInt("012")
returns10
. The leading zero causesparseInt
to treat it as an octal number in probably all browsers, despite octals being summarily deprecated in ES3. Of course, you can useparseInt("012",10)
to get around that.Incidentally, that explains why I’m using
String(+num)
. The plus sign is there to remove leading zeros if the number is received as a string. I probably wouldn’t bother accounting for that case here if it took more than an extra character.This blog does in fact appear to be badass.
IMO, the while loop can be easily avoided
function derome(N){ I:1}[i];
if(!/^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/.test(N))
throw new Error('Incorrect roman number');
var r=0;
N.replace(/[MDLV]|C[MD]?|X[CL]?|I[XV]?/g,function(i){
r+={M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,
});
return r;
}
PS. When validating the string, IMO, the function should throw an Error, not return false. Making the function return false, makes a PHP-style function out of it, rather than “classic”-JavaScript one.
Or, we can return Number.NaN
if(!/^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/.test(N))
return Number.NaN; // return +'a' if we want to make it shorter ;-)
@Rafael, I agree that returning
NaN
or throwing an error would be better than returningfalse
.However, I don’t really see the point of replacing a
while
loop with areplace
loop. You’re essentially replacing this (stripped down) piece of my function:var r=0,m,k={...},t=/regex/g;
while(m=t.exec(N))r+=k[m[0]];
…with this:
var r=0;
N.replace(/regex/g,function(i){r+={...}[i];});
That saves four characters. Granted, I specifically asked for shorter versions, but IMO it’s the same thing in less lucid form. Additionally, recreating the lookup object during each iteration is going to add a little bit of overhead. The other changes you made cause the function to return
0
when passed an empty string, and not handle lowercase Roman numeral characters.If the whole point of the solution was its brevity, never mind all that 🙂 (but then you might as well remove the validation too).
BTW,
return NaN
is shorter thanreturn +'a'
. 😉I know, that my solution is only a bit shorter, but not very efficient. I also know, that recreating the object isn’t the best solution. But we can define (once again) a local variable in the scope of deromanize function. The main point of my algorithm was to show, that we can avoid using the while-loop. Only this. I just like to “play” with code.
PS. I didn’t know, that NaN is also a global object/value… I thought it’s a property of Number object. Guess, I have to dive more deeply into ECMAScript Spec.
Oh, and the problem while validating, it’s my omission, sorry.
if(!N||!/^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/i .test(N=N.toUpperCase()))
return NaN;
@Rafael:
“I just like to ‘play’ with code.”
Me too. 🙂
Although
while
/exec()
andreplace()
can often be used interchangeably, I tend to usewhile
/exec()
unless I’m actually replacing something in the string. It makes more sense to me and it’s more flexible.Regarding the empty string handling, that was easy to miss given that I didn’t comment my code or explicitly test for that case. You could also deal with it by simply adding
(?=.)
at the front of the regex.“Very nice post, although I can’t imagine where I would need roman numbers anymore. Anyone wonder how the Romans actually read those numbers? I mean, we read every digit and say something about it. How did they do it?”
Dear Siderite One might need to read them, e.g. on old buildings, in old books. You obviously do not. Er…they used Latin to read their numbers; this, being logical, is the reason they can be easily reduced to code.
Steve
Just to say thank you for the explaination of adding leading + to input strings to stop parseInt from treating numbers (parsed in as strings) with leading zeros as Octal – solved a tricky problem for me with just one character added to my code 🙂
Stuart Mail
Thanks you guys, I was looking for just that!
Ok – now I’m hooked on these teasers! : )
Quick variation on the 2 line makeChange algorithm to encode:
(does multi million conversions – though I can see how they’d read that fast enough to be of any use – can you picture the bailout being accounted for in roman numerals? How much graft do you think they’d be able to get away with using accounting sheets with:
all numbers in Ms: mdccccvMMDCCCLIX * mmmdccvMDCLIX ! : )
var n = [
// using lowercase to represent Unicode lined Roman Numerals
[1000000, “m”],
[ 500000, “d”],
[ 100000, “c”],
[ 90000, “cm”],
[ 50000, “l”],
[ 10000, “x”],
[ 9000, “xc”],
[ 5000, “v”],
[ 4000, “iv”],
[ 1000, “M”],
[ 900, “CM”],
[ 500, “D”],
[ 100, “C”],
[ 90, “XC”],
[ 50, “L”],
[ 40, “XL”],
[ 10, “X”],
[ 9, “IX”],
[ 5, “V”],
[ 4, “IV”],
[ 1, “I”],
[ 0]
];
var output = [];
var i = 0, x = 0;
function makeRoman(money) {
for (x = Math.floor(money / n[i][0]); x; x–)
output.push(n[i][1]);
return isNaN(n[i + 1]) ? makeRoman(money % n[i++][0]) : output;
}
makeRoman(17996).join(“”);
… and using the same table here’s the recursive decode:
var n = [
// using lowercase to represent Unicode lined Roman Numerals
[1000000, “m”],
[ 500000, “d”],
[ 100000, “c”],
[ 90000, “cm”],
[ 50000, “l”],
[ 10000, “x”],
[ 9000, “xc”],
[ 5000, “v”],
[ 4000, “iv”],
[ 1000, “M”],
[ 900, “CM”],
[ 500, “D”],
[ 100, “C”],
[ 90, “XC”],
[ 50, “L”],
[ 40, “XL”],
[ 10, “X”],
[ 9, “IX”],
[ 5, “V”],
[ 4, “IV”],
[ 1, “I”],
[ 0]
];
var output = [];
var i = 0, x = 0;
var arabic = 0;
function makeArabic(numbers) {
for (i = 0; numbers.length && n[i][0]; i++)
if (numbers.substring(0,n[i][1].length) == n[i][1]) {
arabic += n[i][0];
return makeArabic(numbers.substring(n[i][1].length));
}
return arabic;
}
makeArabic(“MCMXLIX”);
Adding this functionality for all Numbers.
Number.prototype.romanize = function()
{
var digits = String(this).split(“”),
key = [“”,”C”,”CC”,”CCC”,”CD”,”D”,”DC”,”DCC”,”DCCC”,”CM”,
“”,”X”,”XX”,”XXX”,”XL”,”L”,”LX”,”LXX”,”LXXX”,”XC”,
“”,”I”,”II”,”III”,”IV”,”V”,”VI”,”VII”,”VIII”,”IX”],
roman = “”,
i = 3;
while (i–)
roman = (key[+digits.pop() + (i * 10)] || “”) + roman;
return Array(+digits.join(“”) + 1).join(“M”) + roman;
}
I was just searching for a romanize/deromanize script, to be used into http://it.wikisource.org (just to avoid to rediscover the wheel…).
Can I use you base code in the top of the page? Obviously I’ll link author & source. I see that page is covered by a copyright, so asking your permission is needed.
Thanks for your attention!
Alex
@Alex Brollo, yes, that’s fine. Any of my own code on this blog is released under the MIT License, if not explicitly stated otherwise.
I got a shorter code here.
romanize = function(a,b,c,d,e){for(b=c=”,d=5;a;c++,d^=7)for(e=a%d,a=a/d^0;e–;)b=’ IVXLCDM'[(e>2?c+a-(a&=~1)+(e=1):c)+1]+b;return b}
document.write(romanize(2524))
this is still “polish algorithm” 😉
Thanks for all this, guys. It was a joy to read, plus exactly what I was looking for. I’m writing a game (just a home project) and wanted Roman numerals in it… and couldn’t be bothered to write a function to do this. Cheers.
Functional style, almost even readable… ? ; )
Please dear author could you kindly work me through your code, am a bit lost as to the logic from the while loop.
Thanks
I look forward to hearing from you.
I think your code is pretty and clean. Making it more shorter will only affect the readability.
As you know, the roman numerals are limited to 3999 but I have seen people going above that, do you think that approach is correct?